BCBSM, Inc. v. Walgreen Co.
2023 WL 6852533 (N.D. Ill. 2023)
July 21, 2023

Grossman, Maura R., Special Master

Sampling
Technology Assisted Review
Special Master
Proportionality
Attorney Work-Product
ESI Protocol
Summary
The Special Master ordered the Initial Plaintiffs to carry out a Preliminary Diagnostic Protocol to assess the adequacy of their ESI productions. The protocol requires a prevalence exercise, a stratified sample of 6,000 documents to be reviewed and coded blind by one or more subject matter experts (SMEs), and a declaration. The Special Master will use the results to estimate Recall and Precision for the productions.
BCBSM, INC., et al., Plaintiffs/Counter-Defendants,
v.
WALGREEN CO. & Walgreens Boots Alliance, Inc., Defendants/Counter-Plaintiffs.
Walgreen Co. & Walgreens Boots Alliance, Inc., Third-Party Plaintiffs,
v.
Prime Therapeutics, LLC, Third-Party Defendant
No. 20 C 01853, Related Cases: Case No. 1:20-cv-04738, Case No. 1:20-cv-03332, Case No. 1:20-cv-01929, Case No. 1:20-cv-04940, Case No. 1:20-cv-01362
United States District Court, N.D. Illinois, Eastern Division
Filed July 21, 2023
Grossman, Maura R., Special Master

SPECIAL MASTER’S ORDER ON PRELIMINARY DIAGNOSTIC PROTOCOL TO ASSESS THE ADEQUACY OF INITIAL PLAINTIFFS’ ORIGINAL AND SUPPLEMENTAL PRODUCTIONS

I. PREAMBLE AND PROCEDURAL HISTORY

Pursuant to discovery disputes that arose between certain of the Parties concerning the adequacy of the 28 Initial Plaintiffs’[1] document collections and productions, on May 2, 2023, the Court appointed a Special Master “to address and resolve all issues raised in Defendants’ [referring to Walgreen Co. and Walgreens Boots Alliance, Inc.’s] Motion to Compel Document Discovery and Related Metrics [Dkt. 374], including asserted issues related to (A) the Initial Plaintiffs’ use of technology assisted review (TAR) and (B) the Initial Plaintiffs’ collection and production of documents from additional custodians.” Order Appointing Special Master [Dkt. 450], at 2.

The Special Master first met with the Parties on May 15, 2023, by videoconference, for an initial introductory session. After hearing from the Parties, and at the Initial Plaintiffs’ request, the Special Master agreed that after she finished reviewing the Parties’ briefs and the accompanying declarations and exhibits that were filed with the Court, she would prepare a list of questions as to which the Initial Plaintiffs would consider responding in a live session (as opposed to in writing). After reviewing the briefs and related materials,[2] however, the Special Master emailed the parties on May 22, 2023 to suggest what she thought would be a better (i.e., more effective and efficient) way to proceed. Rather than starting the process with a series of questions about the Initial Plaintiffs’ TAR process, the Special Master proposed that the Parties consider beginning with “a fresh validation protocol that [she] would develop (in consultation with the Parties) which would be designed to identify a problem [with the Initial Plaintiffs’ production(s)], if there is one, and to show where it is if it exists.” The Special Master attached to her email two sample validation protocols that she had developed and used in prior cases, one from Rio Tinto plc v. Vale S.A. et al., Case No. 1:14-cv-03042-RMB-AJP [Dkt. 338] (S.D.N.Y. Sept. 8, 2015) (“Rio Tinto”) (attached hereto as Ex. 1), and the other from In re Broiler Chicken Antitrust Litigation, Case No. 1:16-cv-08637 [Dkt. 586] (N.D. Ill. Jan. 3, 2018) (“Broilers”) (attached hereto as Ex. 2). She requested that the parties review the two protocols and advise her as to (i) whether they agreed that the suggested approach of gathering some empirical data first, would be a better way to proceed for now, and (ii) if so, whether they preferred the Rio Tinto Protocol, the Broilers Protocol, or some combination of both.

The Initial Plaintiffs responded by email on May 26, 2023, stating that they fully agreed that “a validation process is an efficient and appropriate way to proceed,” and that they favored the adoption of certain aspects (although not all) of the Broilers Protocol. Defendants also responded on May 26. They also agreed that “a renewed validation should be conducted,” but indicated that there were still certain “gating questions” that the Initial Plaintiffs should be required to answer before proceeding to such a validation protocol. Defendants expressed a preference for certain elements of the Rio Tinto Protocol.

The Special Master met with the Parties again by videoconference on June 13, 2023. The Special Master explained the reasoning behind her proposed approach and responded to issues raised by the Parties in their May 26 emails. Both Parties argued their positions at length and the Special Master presented her preliminary view on how things should proceed, all of which were captured in detailed Minutes of the June 13, 2023 Conference with Special Master Maura R. Grossman (attached hereto as Ex. 3). In brief, the Special Master expressed concern that Defendants’ “gating questions” implicated work product, which the Special Master was loath to pierce based on the present factual record. The Special Master expressed her preference to proceed with certain questions that relate solely to the Initial Plaintiffs’ prevalence estimate(s) and a validation exercise that was similar to the one undertaken in Broilers because it would help to diagnose the presumed problem (assuming there is one) and provide a path forward. In response to this, Defendants requested and were granted the opportunity to present an alternative, slightly modified validation protocol, which would not invade the Initial Plaintiffs’ work product, before the Special Master made her final ruling. Defendants’ [Proposed] Draft Stipulated Interim Diagnostic Protocol Regarding Initial Plaintiffs’ Collection and Use of TAR (“Defendants’ Draft Protocol”) was submitted to the Initial Plaintiffs and the Special Master on June 20, 2023, along with a letter brief addressing the Defendants’ Draft Protocol and several other issues. A copy of the Defendants’ Draft Protocol (but not the letter brief and the extensive exhibits accompanying it) is attached hereto as Ex. 4.

On June 21, 2023, the Initial Plaintiffs requested and were granted leave to respond to the Defendants’ Draft Protocol on June 27, 2023, and the other issues raised in Defendants’ June 20 letter brief on July 7. In their June 27 letter response regarding the Defendants’ Draft Protocol (at 1-2), Plaintiffs agree to provide certain unspecified “relevant prevalence metrics,” and to undertake the Broilers Protocol, as proposed by the Special Master, but otherwise reject Defendants’ Draft Protocol in its entirety as “a protracted and unreasonably extensive exercise that would require [the Initial] Plaintiffs to provide unnecessary (and in some cases privileged) information about collection and other aspects of the process not at issue, and then review and catalogue tens of thousands of documents, as an ‘initial diagnostic protocol’ – a predicate to validation – with numerous meet and confers and consultations with the special master between steps,” essentially designed to “generate unnecessary disputes and wasteful make-work until at least next year.” They decry the Defendants’ Draft Protocol (at 2) as “extraordinarily complicated” and “not only impos[ing] unnecessary and significant cost and burden on [the initial] Plaintiffs, but [also delaying] any meaningful step toward resolution of this dispute for many months because it will not generate information relevant to the ultimate issue.” (emphasis in original).

In addition to the briefing and accompanying materials, as well as the correspondence and oral arguments in connection with this dispute, the Special Master carefully reviewed the Parties’ submissions related to the Defendants’ Draft Protocol. She agrees with the Initial Plaintiffs that what Defendants are seeking is unduly burdensome, overly complex, continues to unnecessarily invade the Initial Plaintiffs’ work product, and will not facilitate the timely resolution of this dispute. By way of example of burden only, if the Defendants’ Draft Protocol were to be completed twice for each Initial Plaintiff, as Defendants contemplate, at the level of confidence requested by them, that, alone, would appear to require the review of 86,072 documents, simply for the purpose of confirming that there is a problem with the TAR process and trying to determine where it might be. There is no need to do that much work, or to invade the Initial Plaintiffs’ work product—for that matter—in order to diagnose what appears to be a problem with Initial Plaintiffs’ production(s) and to devise a remediation plan. While it is conceivable that the Special Master may ultimately end up requiring responses to some of Defendants’ “gating questions” (and/or other questions), we are simply not there yet, and Defendants are putting the cart before the horse. Therefore, without prejudice to Defendants’ right to request (and the Initial Plaintiffs’ right to challenge the need for) additional inquiry into Plaintiffs’ TAR and related identification, collection, search, and/or review processes, at a later date, the Special Master has developed a more reasonable and proportionate initial diagnostic process that does not invade work product, and that will assist the Parties and the Special Master in moving forward. 
Accordingly, the Initial Plaintiffs will undertake the Special Master’s Preliminary Diagnostic Protocol to Assess the Adequacy of the Initial Plaintiffs’ Original and Supplemental Productions (the “Preliminary Diagnostic Protocol” or “PDP”) to gather meaningful, objective information that will allow an overall assessment of the quality of the Initial Plaintiffs’ productions to date, and if there are issues with the Initial Plaintiffs’ productions, should help to determine what and where those issues are, allowing for targeted remediation efforts.

After emailing the parties a near-final draft of the Preliminary Diagnostic Protocol on July 10, 2023, the Special Master met a third time with the parties, by videoconference, on July 13, 2023, to review the PDP, including the logistics for implementing it and the rationale behind the various parts. The Special Master also addressed several questions raised by the Initial Plaintiffs seeking clarification about various requirements in the PDP and took several others under advisement.

The following Preliminary Diagnostic Protocol resolves the Initial Plaintiffs’ questions, as well as certain concerns expressed by Defendants. It consists of four parts, three of which will be completed in full by the Initial Plaintiffs: (i) the Prevalence Exercise described in §II below, (ii) the Stratified Sampling (a.k.a., “Broilers”) Exercise described in §III below, and (iii) the Declaration described in §IV below. The deliverables set forth in these three sections shall be provided to the Defendants and the Special Master no later than five weeks from the date of entry of this Order, unless additional time is granted by the Special Master for good cause shown. Thereafter, the Special Master will perform the calculations set forth in the Appendix to this Order and will supply the results of those calculations to the Parties for their review and comment.

II. THE PREVALENCE EXERCISE

1. Indicate the deduplicated number of documents contained in the original, pre-August 22, 2022 collection that was subject to technology-assisted review (“TAR”) (the “Original TAR Collection”).

2. Indicate the deduplicated number of documents in the Original TAR Collection provided by each of the 13 Initial Plaintiff Groups,[3] listed separately by name.

3. Indicate the size of the sample that was drawn from the Original TAR Collection by the Initial Plaintiffs to derive their initial prevalence estimate.

4. Indicate how the sample used to derive the Initial Plaintiffs’ initial prevalence estimate was drawn (e.g., by random sampling or otherwise).

5. Indicate the initial prevalence estimate that was derived by the Initial Plaintiffs based on this sample, expressed in terms of the number of documents that were estimated to be responsive in the Original TAR Collection.

6. Indicate how many documents in the sample used to derive the initial prevalence estimate by Initial Plaintiffs came from each Initial Plaintiff Group, listed separately by name.

7. Prepare a table (in the form of a spreadsheet) like the example below, indicating the following information for each document contained in the sample used to derive the initial prevalence estimate by the Initial Plaintiffs:

A. The Bates Number (if the document was produced) or a Unique Control/Document Identifier (if the document was not produced).

B. The name of the Initial Plaintiff Group that was the source of the document.

C. Whether the document was coded responsive and not privileged (“R&NP”).

D. Whether the document was coded responsive and privileged (“R&P”).

E. Whether the document was coded non-responsive (“NR”).


8. Indicate the total deduplicated number of documents contained in the original, pre-August 22, 2022 collection that were excluded from the Original TAR Collection (if any) (in other words, the total deduplicated number of documents that were not subject to TAR).

9. Indicate the deduplicated number of documents contained in the original, pre-August 22, 2022 collection that were excluded from the Original TAR Collection (if any), for each Initial Plaintiff Group, listed separately by name.

10. Indicate how these documents not subject to TAR (if any) were reviewed for responsiveness and privilege.

11. Indicate the total deduplicated number of documents from the Original TAR Collection that were produced by the Initial Plaintiffs pre-August 22, 2022.

12. Indicate for each Initial Plaintiff Group, listed separately by name, the deduplicated number of documents from the Original TAR Collection that were reviewed by a human and coded responsive and not privileged.

13. Indicate for each Initial Plaintiff Group, listed separately by name, the deduplicated number of documents from the Original TAR Collection that were reviewed by a human and coded responsive and privileged.

14. Indicate for each Initial Plaintiff Group, listed separately by name, the deduplicated number of documents from the Original TAR Collection that were reviewed by a human and coded non-responsive.

15. Indicate the total deduplicated number of documents that were added to the Original TAR Collection post-August 22, 2022, not including the CVS Documents[4] (the “Supplemental TAR Collection”).

16. Indicate the deduplicated number of documents that were added to the Original TAR Collection post-August 22, 2022, for each Initial Plaintiff Group, listed separately by name, not including the CVS Documents.

17. Indicate whether a new prevalence estimate was derived for the Supplemental TAR Collection and/or the Original TAR Collection plus the Supplemental TAR Collection.

18. If a new prevalence estimate was undertaken, indicate:

A. A description of the collection from which the new prevalence estimate was derived (e.g., the Supplemental TAR Collection Only, the Original TAR Collection plus the Supplemental TAR Collection, or otherwise).

B. The size of the collection from which the new sample was drawn.

C. A description of how the new sample was drawn (e.g., by random sampling or otherwise).

D. The size of the new sample that was drawn.

E. The new prevalence estimate that was derived from this new sample, expressed in terms of the number of documents that were estimated to be responsive in the Supplemental TAR Collection, the Original TAR Collection plus the Supplemental TAR Collection, or otherwise.

F. The number of documents in the new sample that came from each Initial Plaintiff Group, listed separately by name.

G. Prepare a table (in spreadsheet form) like the one described and shown in ¶7 above, indicating the same information for each document contained in the new sample.

19. Indicate the total deduplicated number of documents either collected or added post-August 22, 2022, that were not included in the Supplemental TAR Collection (in other words, the total deduplicated number of documents that were not subject to TAR). Do not count or include the CVS Documents for this purpose.

20. Indicate the number of documents collected or added post-August 22, 2022 from each Initial Plaintiff Group, listed separately by name, and not included in the Supplemental TAR Collection (if any). Do not count or include the CVS Documents for this purpose.

21. Indicate how these documents not subject to TAR (if any) were reviewed for responsiveness and privilege.

22. Indicate the total number of CVS Documents produced to Defendants that were previously produced to Defendants as a result of the Initial Plaintiffs’ TAR Process.

23. Indicate the total number of CVS Documents produced to Defendants that were not previously produced to Defendants as a result of the Initial Plaintiffs’ TAR process.

24. Indicate for each Initial Plaintiff Group, listed separately by name, the number of CVS Documents that were previously produced by the Initial Plaintiffs as a result of their TAR process.

25. Indicate for each Initial Plaintiff Group, listed separately by name, the number of CVS Documents that were not previously produced by the Initial Plaintiffs as a result of their TAR Process.

26. Indicate the total deduplicated number of documents from the Original TAR Collection that were produced to Defendants post-August 22, 2022.

27. Indicate the total deduplicated number of documents from the Supplemental TAR Collection that were produced to Defendants post-August 22, 2022.

28. Indicate for each Initial Plaintiff Group, listed separately by name, the deduplicated number of documents from the Original TAR Collection that were produced post-August 22, 2022.

29. Indicate for each Initial Plaintiff Group, listed separately by name, the deduplicated number of documents from the Supplemental TAR Collection that were produced post-August 22, 2022.

30. Indicate the total number of documents from the Original TAR Collection that were reviewed by a human post-August 22, 2022, and coded responsive and not privileged.

31. Indicate for each Initial Plaintiff Group, listed separately by name, the number of documents from the Original TAR Collection that were reviewed by a human post-August 22, 2022, and coded responsive and privileged.

32. Indicate for each Initial Plaintiff Group, listed separately by name, the number of documents from the Original TAR Collection that were reviewed by a human post-August 22, 2022, and coded non-responsive.

33. Indicate the total number of documents from the Supplemental TAR Collection that were reviewed by a human post-August 22, 2022, and coded responsive and not privileged.

34. Indicate for each Initial Plaintiff Group, listed separately by name, the number of documents from the Supplemental TAR Collection that were reviewed by a human post-August 22, 2022, and coded responsive and privileged.

35. Indicate for each Initial Plaintiff Group, listed separately by name, the number of documents from the Supplemental TAR Collection that were reviewed by a human post-August 22, 2022, and coded non-responsive.
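
By way of illustration, the arithmetic behind a prevalence estimate of the kind requested in ¶¶3-5 above can be sketched as follows. The sketch assumes a simple random sample and a normal-approximation interval at roughly 95% confidence. The function name `estimate_prevalence` and all figures are hypothetical and do not come from the record; the Initial Plaintiffs' actual method may have differed.

```python
import math


def estimate_prevalence(sample_size, responsive_in_sample, collection_size, z=1.96):
    """Estimate how many documents in a collection are responsive.

    Assumes the sample was drawn by simple random sampling (an assumption;
    paragraph 4 asks how the sample was actually drawn).
    """
    if sample_size <= 0 or not 0 <= responsive_in_sample <= sample_size:
        raise ValueError("invalid sample counts")
    # Proportion of the sample coded responsive.
    p = responsive_in_sample / sample_size
    # Normal-approximation margin of error (z=1.96 is roughly 95% confidence).
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - margin)
    high = min(1.0, p + margin)
    return {
        "prevalence": p,
        # Paragraph 5 asks for the estimate as a number of documents.
        "estimated_responsive": round(p * collection_size),
        "interval": (round(low * collection_size), round(high * collection_size)),
    }


# Hypothetical figures: 150 of 1,500 sampled documents coded responsive,
# drawn from a collection of 1,000,000 deduplicated documents.
result = estimate_prevalence(
    sample_size=1500, responsive_in_sample=150, collection_size=1_000_000
)
print(result)
```

The width of the interval depends on the sample size reported under ¶3, which is one reason the sample size and the sampling method are requested alongside the estimate itself.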

III. THE STRATIFIED SAMPLING (A.K.A., “BROILERS”) EXERCISE

1. The following eight subsamples shall be randomly drawn from the eight document populations described below. Only one copy of each document should be included in the population; in other words, do not sample duplicates.

A. A random sample of 500 documents from the Original TAR Collection, subject to TAR and coding before August 22, 2022, and coded Responsive by a human reviewer (regardless of whether coded privileged or not). This subsample or stratum shall be referred to as “OBR” (referring to original, before, and responsive).

B. A random sample of 500 documents from the Original TAR Collection, subject to TAR and coding before August 22, 2022, and coded Non-Responsive by a human reviewer. This subsample or stratum shall be referred to as “OBN” (referring to original, before, and non-responsive).

C. A random sample of 500 documents from the Original TAR Collection, subject to TAR and coding after August 22, 2022, and coded Responsive by a human reviewer (regardless of whether coded privileged or not). This subsample or stratum shall be referred to as “OAR” (referring to original, after, and responsive).

D. A random sample of 500 documents from the Original TAR Collection, subject to TAR and coding after August 22, 2022, and coded Non-Responsive by a human reviewer. This subsample or stratum shall be referred to as “OAN” (referring to original, after, and non-responsive).

E. A random sample of 500 documents from the Supplemental TAR Collection, subject to TAR and coding after August 22, 2022, and coded Responsive by a human reviewer (regardless of whether coded privileged or not). This subsample or stratum shall be referred to as “SAR” (referring to supplemental, after, and responsive).

F. A random sample of 500 documents from the Supplemental TAR Collection, subject to TAR and coding after August 22, 2022, and coded Non-Responsive by a human reviewer. This subsample or stratum shall be referred to as “SAN” (referring to supplemental, after, and non-responsive).

G. A random sample of 1,500 documents from the Original TAR Collection, subject to TAR before August 22, 2022, and deemed non-responsive by TAR and therefore never reviewed by a human reviewer. This sample or stratum shall be referred to as “OX” (referring to Original TAR Collection null set only).

H. A random sample of 1,500 documents from the Supplemental TAR Collection, subject to TAR after August 22, 2022, and deemed non-responsive by TAR and therefore never reviewed by a human reviewer. This sample or stratum shall be referred to as “SX” (referring to the Supplemental TAR Collection null set only). SX should not include any documents in OX.

2. These eight random subsamples shall be combined into a single random sample containing 6,000 documents. The 6,000 documents shall be arranged in random order, with no information indicating what subsample or stratum any document belonged to, how it was previously coded, and whether or not it was previously produced or withheld as privileged or non-responsive.

3. The single, combined random sample containing 6,000 documents shall be reviewed and coded for responsiveness and privilege by one or more subject matter experts (“SMEs”), referring to attorneys who are very knowledgeable about the subject matter of the litigation, and who are very familiar with the RFPs and the issues in the case. During the course of the review of this single, combined random sample, the SME(s) shall not be provided with any information concerning the subsample or stratum from which any document was derived, or the prior coding or production status of any document. The intent of this requirement is to ensure that the review of the combined random sample is blind; it does not preclude the Initial Plaintiffs from selecting as SME(s) one or more attorneys who may have had prior involvement in the original review or QC processes.

4. The Initial Plaintiffs have inquired whether they must review the single random sample of 6,000 documents for both responsiveness and privilege simultaneously, or whether they may conduct these reviews seriatim, and refer to the document’s family member(s) in making the privilege determinations in connection with the responsive documents during a second-pass review. While it is preferable that the reviews be conducted simultaneously and without reference to family members, the Initial Plaintiffs may review the 6,000-document sample first for responsiveness, which must be completely blind, and then perform a second-pass review for privilege, reviewing only those documents that the SME(s) coded responsive, referring to their family member(s) if necessary to make the privilege determination. During the second-pass privilege review, the SME(s) shall not be provided with any information concerning the prior coding or production status of any document or any of its family members. If the Initial Plaintiffs do their SME review in this two-pass fashion, they must indicate in their table whether each responsive document has previously been produced (including its Bates Number) and, if it has been withheld, where it appears on the Initial Plaintiffs’ privilege log (or where it is to be found on any privilege log that the Initial Plaintiffs will subsequently produce). The Initial Plaintiffs are not relieved of their obligation to correct any privilege determinations they discover that were made in error and that resulted in the withholding of a document that is not, in fact, privileged.

5. Once the coding of the single, combined random sample has been completed, the Initial Plaintiffs shall prepare a table (in spreadsheet form) listing each of the 6,000 documents in the sample. For each document, the table shall include the following information:

A. The Bates Number of the document (for documents that were previously produced), or a Unique Control/Identification Number (for documents that were not previously produced).

B. The name of the Initial Plaintiff Group from which the document came.

C. The subsample or stratum from which the document was derived (i.e., OBR, OBN, OAR, OAN, SAR, SAN, OX, or SX).

D. The SME’s responsiveness coding for the document (i.e., responsive or non-responsive).

E. The SME’s privilege coding for the document (i.e., privileged or not privileged). If the document is coded as non-responsive, a privilege determination need not be made. All documents in the single, combined random sample that are coded as privileged shall be included on the Initial Plaintiffs’ privilege log. If a document has previously been included on a privilege log, where it may be found should be indicated.


6. The following items shall be provided to the Defendants and the Special Master:

A. A list showing the number of documents contained in each stratum (i.e., OBR, OBN, OAR, OAN, SAR, SAN, OX, and SX), and how many of them were contributed by each of the 13 Initial Plaintiff Groups, listed separately by name.

B. The table described in ¶5 above, in spreadsheet form.

C. A copy of each responsive, non-privileged document in the single, combined random sample that was not previously produced to Defendants.

7. NOTE: If any of the eight subsamples turns out to have an insufficient number of examples from any Initial Plaintiff Group (i.e., if any of the 13 Initial Plaintiff Groups are underrepresented in one or more subsamples or strata), additional sampling from that Initial Plaintiff Group (or other actions or analyses) may be necessary at a later date. The Parties may suggest, but the Special Master will make any and all such determinations based on a review of the results of the Stratified Sampling (a.k.a., “Broilers”) Exercise. It would have been unduly complicated to require a specific number of documents (e.g., 10-12) from each Initial Plaintiff Group to be included in each subsample or stratum, and it would be too burdensome and disproportionate to repeat the Stratified Sampling (a.k.a., “Broilers”) Exercise for each of the 13 Initial Plaintiff Groups.

8. Once the Special Master and Defendants have received and have had an opportunity to review the items described in ¶5 above, as well as the results of the Prevalence Exercise set forth in §II above, and the Special Master has calculated and provided the Parties with the information set forth in the Appendix, the Parties shall meet with the Special Master to discuss the results of the Preliminary Diagnostic Protocol and whether the results suggest that the Initial Plaintiffs’ Original and Supplemental Productions are substantially complete (modulo any additional discovery that may be granted with respect to the additional custodians Defendants have sought in their Motion to Compel), or otherwise. From the information collected through this diagnostic process, the Parties and the Special Master will learn (i) the number of responsive documents produced and the number of responsive documents missed in both the Original and Supplemental TAR Productions, as well as (ii) estimates of summary statistics (i.e., Recall and Precision) for the Original and Supplemental TAR Productions, both in total and for each Initial Plaintiff Group. The information derived from the diagnostic process will help to show whether there has been “concept drift” in the Initial Plaintiffs’ conception of relevance, the quality of the TAR process as compared to the human review component, the consistency between the responsiveness determinations made by the original reviewers versus the Preliminary Diagnostic Protocol SMEs, and other useful information. Any substantial inconsistencies or irregularities that may be found may require further investigation and remediation, but the overall results may also demonstrate that the Original and Supplemental Productions are substantially complete and that no further investigation or remediation is warranted. 
No absolute bright-line test can be prescribed in advance that will dictate success or failure; rather, reasonableness and proportionality dictate that the results of the diagnostic process must be considered in toto, and in terms of both the quantity and quality (i.e., materiality) of the documents produced or missed. Nonetheless, the diagnostic process set forth in this Order should provide the Parties, the Special Master, and the Court with meaningful information that will permit an overall assessment of the quality of the Initial Plaintiffs’ productions to date, without invading work product, and, if there are issues with either the Original or Supplemental Productions (or any particular Initial Plaintiff Group), should help to diagnose what and where those issues are, allowing for targeted remediation efforts. If the Parties are unable to agree on whether the Initial Plaintiffs’ productions to date are substantially complete, or whether additional analysis or remediation efforts are warranted, the Special Master shall render a decision, subject to the Parties’ rights to petition the Court for review of or relief from any decision rendered by the Special Master.
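
The mechanics of ¶¶1-2 above (drawing the eight subsamples and combining them in random order for blind review) can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed implementation: the function name `build_blind_sample`, the document identifiers, and the demonstration populations are hypothetical, and each population is assumed to have been assembled according to the definitions in ¶1.

```python
import random

# Subsample sizes taken from paragraph 1 of Section III.
STRATA_SIZES = {
    "OBR": 500, "OBN": 500, "OAR": 500, "OAN": 500,
    "SAR": 500, "SAN": 500, "OX": 1500, "SX": 1500,
}


def build_blind_sample(populations, seed=None):
    """Draw each subsample at random and return a shuffled review queue.

    populations maps a stratum name to a list of document identifiers.
    Returns (review_queue, key): the queue carries identifiers only, while
    the key (identifier -> stratum) is withheld from the reviewers.
    """
    rng = random.Random(seed)
    key = {}
    for stratum, size in STRATA_SIZES.items():
        # Only one copy of each document may be in the population.
        unique_docs = sorted(set(populations[stratum]))
        if len(unique_docs) < size:
            raise ValueError(f"stratum {stratum} has fewer than {size} documents")
        for doc_id in rng.sample(unique_docs, size):
            # A drawn document must not already sit in another subsample.
            if doc_id in key:
                raise ValueError(f"{doc_id} appears in more than one stratum")
            key[doc_id] = stratum
    review_queue = list(key)
    rng.shuffle(review_queue)  # random order, no stratum information
    return review_queue, key


# Hypothetical populations with neutral identifiers, 2,000 documents each.
demo = {
    name: [f"DOC-{n * 10000 + i:06d}" for i in range(2000)]
    for n, name in enumerate(STRATA_SIZES)
}
queue, key = build_blind_sample(demo, seed=7)
print(len(queue))
```

Keeping the stratum key separate from the review queue is what allows the stratum column required by ¶5(C) to be added to the table only after the SME coding is complete.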

IV. THE DECLARATION

1. Upon completion of the Prevalence Exercise set forth above in §II and the Stratified Sampling (a.k.a., “Broilers”) Exercise, described above in §III, the Initial Plaintiffs shall prepare and submit to Defendants and the Special Master, one or more declaration(s) or affidavit(s) (as appropriate), attesting, under penalty of perjury, that (i) because of the business structures of the 28 Initial Plaintiffs, they have no reason or basis to believe that any of the Initial Plaintiffs would have any custodians, responses, and/or productions that differ from the Initial Plaintiff Group in which such Initial Plaintiff was placed, (ii) the Stratified Sampling (a.k.a., “Broilers”) Exercise was performed blindly, meaning that the SMEs who coded the documents, pursuant to the Special Master’s instructions, had no knowledge of how the documents in the sample were previously coded, what subsample, stratum, or document population they were drawn from, whether they were previously produced or withheld and on what basis (modulo the single exception provided for in §III, ¶4 for privileged documents, if necessary), and (iii) the information provided in response to the Preliminary Diagnostic Protocol (including the Prevalence and the Stratified Sampling (a.k.a., “Broilers”) Exercises) is accurate and complete to the best of the declarant’s (or affiant’s) knowledge, information, ability, and belief.

SO ORDERED.


APPENDIX
Validation Worksheet

The purpose of validation sampling is to obtain an estimate of the quantity and quality (i.e., materiality) of responsive documents identified and missed by a particular review effort, and to suggest the source of any inadequacy that may be indicated by such estimate.

The review effort that is the subject of this validation effort was conducted in at least two phases and was applied to a combined collection of disparate documents from 13 Initial Plaintiff Groups. To account for the phases, it is necessary to divide the overall collection into eight strata, and to have subject matter experts (“SMEs”) conduct a blind review of the eight samples combined, as detailed in §III, ¶¶1-3, above. To account for the 13 Initial Plaintiff Groups, it is further necessary to capture, for each stratum and sample, which documents are attributable to each group, as detailed in §III, ¶5 above.

In the science of information retrieval, it is well understood that the notion of “relevance” or “responsiveness”[5] is subjective, and that equally competent, well-informed reviewers are likely to agree perhaps 70% of the time; that is, if one reviewer codes a document responsive, a second reviewer is likely to code the same document non-responsive about 30% of the time. It is often pointless to argue about which reviewer is “right” and which one is “wrong”—there will always be a substantial number of documents about which reasonable minds can differ. At the same time, such documents are frequently at the margins—rather than critical or “hot” documents—offering relatively little useful information beyond the documents about which the reviewers generally can agree.

As a consequence, it is generally understood that if a second independent review(er) agrees with about 70% of the “responsive” coding decisions of a first review(er), and the first review(er) agrees with about 70% of the “responsive” coding decisions of the second review(er), both review(er)s are of reasonably high quality. In information retrieval, the fraction of the time that the second review(er) agrees with the first is called “precision,” and the fraction of the time that the first review(er) agrees with the second is called “recall.”[6] It is important to note that this observation is valid only if the review(er)s are independent. The purpose of stratified sampling and blind review by one or more SMEs is to provide such an independent assessment.
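By way of illustration only, the agreement-based notions of precision and recall described above can be sketched as follows. The function name and the document identifiers are hypothetical and form no part of this Order; the sketch assumes each review is represented simply as the set of documents it coded responsive.

```python
def agreement(first_responsive, second_responsive):
    """Return (precision, recall) of the first review measured against the second.

    precision: fraction of the first review's "responsive" decisions
               with which the second review agrees.
    recall:    fraction of the second review's "responsive" decisions
               with which the first review agrees.
    """
    first = set(first_responsive)
    second = set(second_responsive)
    both = first & second
    precision = len(both) / len(first) if first else None
    recall = len(both) / len(second) if second else None
    return precision, recall


# Hypothetical document identifiers, for illustration only.
first_review = {"D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8", "D9", "D10"}
second_review = {"D4", "D5", "D6", "D7", "D8", "D9", "D10", "D11", "D12", "D13"}

precision, recall = agreement(first_review, second_review)
print(precision, recall)
```

In this hypothetical, the two reviews share seven of the ten documents each coded responsive, so both measures come to 70%, the level of agreement described above as typical of two reasonably high-quality independent reviews.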

While precision and recall (when estimated through independent review(er)s) may be useful summary measures, they tend to obscure the key issue, which is: According to the independent review(er), how many responsive documents were identified, compared to how many responsive documents were missed? To this end, the Special Master shall compute estimates of these quantities for each stratum, and for each Initial Plaintiff Group within each stratum. These statistics will be combined in various ways; for example, to compute Recall and Precision Estimates for the various phases of the review process, and for the various Initial Plaintiff Groups.

I. Method for Estimating the Number of Responsive Documents in Each Stratum According to the SME (“SME-Responsive Documents”):

1. The Special Master will compute an estimate of the total number of SME-Responsive documents in each stratum (i.e., OBR, OBN, OAR, OAN, SAR, SAN, OX, and SX).

  • The total number of SME-Responsive documents in a particular stratum (one of r-OBR, r-OBN, r-OAR, r-OAN, r-SAR, r-SAN, r-OX, or r-SX, respectively) ≈ the number of documents in the particular stratum × the number of documents coded responsive by the SME in that stratum subsample ÷ the number of documents in that stratum subsample.

2. The Special Master will compute an estimate of the number of SME-Responsive documents in each stratum for each Initial Plaintiff Group.

  • The number of SME-Responsive documents in a particular stratum from a particular Initial Plaintiff Group (Name of Initial Plaintiff Group 1 r-OBR, Name of Initial Plaintiff Group 2 r-OBR, etc.) ≈ the number of documents in the particular stratum from the particular group × the number of documents from the particular group coded responsive by the SME in that stratum subsample ÷ the number of documents from the particular group in that stratum subsample. NOTE: If the number of documents from the particular Initial Plaintiff Group in the sample is zero, no estimate is possible.

Each specific estimate above will be computed and reported for the total collection, as well as for each of the 13 Initial Plaintiff Groups. The results will be provided in a table with eight columns (one for each particular stratum) and 14 rows (one for each particular Initial Plaintiff Group, and one for the Initial Plaintiffs overall).
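As a minimal sketch of the extrapolation described in ¶¶1-2 of this section, and not a prescribed implementation, the estimate for a stratum (or for one Initial Plaintiff Group within a stratum) could be computed as shown below. The function name and all counts are hypothetical.

```python
def estimate_sme_responsive(stratum_size, sample_size, sample_responsive):
    """Extrapolate the SME's sample coding to the whole stratum.

    stratum_size:      number of documents in the stratum (or in the
                       group's portion of the stratum)
    sample_size:       number of those documents in the stratum subsample
    sample_responsive: number of sampled documents the SME coded responsive

    Returns None when the sample contains no documents, because no
    estimate is possible in that case.
    """
    if sample_size == 0:
        return None
    return stratum_size * sample_responsive / sample_size


# Hypothetical counts, for illustration only.
r_obr = estimate_sme_responsive(40_000, 400, 300)   # whole-stratum estimate
r_obr_group = estimate_sme_responsive(2_500, 0, 0)  # group absent from sample

print(r_obr, r_obr_group)
```

The same function serves both the overall and the per-group estimates; only the counts supplied to it change.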

II. Method for Estimating Summary Measures: Recall and Precision

1. From the list and table the Initial Plaintiffs provide from the Stratified Sampling (a.k.a., “Broilers”) Exercise, estimates of each of the following will be used by the Special Master to calculate summary measures for Recall and Precision:

A. r-OBR ≈ the estimated total number of SME-Responsive documents collected, added, and coded responsive pre-August 22, 2022.

B. r-OAR ≈ the estimated total number of SME-Responsive documents collected and added pre-August 22, 2022 and coded responsive post-August 22, 2022.

C. r-SAR ≈ the estimated total number of SME-Responsive documents collected, added, and coded responsive post-August 22, 2022.

D. r-OBN ≈ the estimated total number of SME-Responsive documents collected, added, and coded non-responsive pre-August 22, 2022.

E. r-OAN ≈ the estimated total number of SME-Responsive documents collected and added pre-August 22, 2022 and coded non-responsive post-August 22, 2022.

F. r-SAN ≈ the estimated total number of SME-Responsive documents collected or added and coded non-responsive post-August 22, 2022.

G. r-OX ≈ the estimated total number of SME-Responsive documents collected and added pre-August 22, 2022 but never reviewed by a human (i.e., the number of documents excluded by TAR).

H. r-SX ≈ the estimated total number of SME-Responsive documents collected or added post-August 22, 2022 but never reviewed by a human (i.e., the number of documents excluded by TAR).

2. The Special Master will use the numbers above (and others the Initial Plaintiffs will have reported from the Stratified Sampling (a.k.a., “Broilers”) Exercise) to calculate the following Recall Estimates:

  • August 22, 2022 Recall Estimate for the Original TAR Collection ≈ r-OBR ÷ (r-OBR + r-OBN + r-OAR + r-OAN + r-OX)
  • Current Recall Estimate for the Original TAR Collection ≈ (r-OBR + r-OAR) ÷ (r-OBR + r-OBN + r-OAR + r-OAN + r-OX)
  • Current Recall Estimate for the Supplemental TAR Collection ≈ r-SAR ÷ (r-SAR + r-SAN + r-SX)
  • Current Recall Estimate for the combined Original and Supplemental TAR Collections ≈ (r-OBR + r-OAR + r-SAR) ÷ (r-OBR + r-OBN + r-OAR + r-OAN + r-SAR + r-SAN + r-OX + r-SX)
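The four Recall Estimates above can be sketched as follows, assuming the eight stratum estimates are held in a dictionary keyed by stratum name. The function names, the dictionary layout, and every number are hypothetical and are offered for illustration only.

```python
def ratio(numerator, denominator):
    """Divide, returning None when the denominator is zero."""
    return numerator / denominator if denominator else None


def recall_estimates(r):
    """Compute the four Recall Estimates from stratum estimates r[...]."""
    original_all = r["OBR"] + r["OBN"] + r["OAR"] + r["OAN"] + r["OX"]
    supplemental_all = r["SAR"] + r["SAN"] + r["SX"]
    return {
        "aug22_original": ratio(r["OBR"], original_all),
        "current_original": ratio(r["OBR"] + r["OAR"], original_all),
        "current_supplemental": ratio(r["SAR"], supplemental_all),
        "current_combined": ratio(
            r["OBR"] + r["OAR"] + r["SAR"],
            original_all + supplemental_all,
        ),
    }


# Hypothetical estimates of SME-Responsive documents per stratum.
r = {"OBR": 6000, "OBN": 500, "OAR": 2000, "OAN": 300, "OX": 1200,
     "SAR": 1500, "SAN": 100, "SX": 400}

recalls = recall_estimates(r)
for name, value in recalls.items():
    print(name, value)
```

With these hypothetical inputs, the denominators are 10,000 SME-Responsive documents for the Original TAR Collection and 2,000 for the Supplemental TAR Collection, yielding Recall Estimates of 60%, 80%, 75%, and roughly 79%, respectively.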

3. The Special Master will use the numbers above (and others the Initial Plaintiffs will have reported from the Stratified Sampling (a.k.a., “Broilers”) Exercise) to calculate the following Precision Estimates:

  • August 22, 2022 Precision Estimate for the Original TAR Collection ≈ r-OBR ÷ number of documents in OBR stratum
  • Current Precision Estimate for the Original TAR Collection ≈ (r-OBR + r-OAR) ÷ number of documents in OBR plus OAR strata
  • Current Precision Estimate for the Supplemental TAR Collection ≈ r-SAR ÷ number of documents in SAR stratum
  • Current Precision Estimate for the combined Original and Supplemental TAR Collections ≈ (r-OBR + r-OAR + r-SAR) ÷ number of documents in OBR plus OAR plus SAR strata
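The four Precision Estimates above can be sketched in the same manner. The sketch assumes two dictionaries keyed by stratum name: one holding the estimated SME-Responsive counts and one holding the total number of documents in each stratum. All names and numbers are hypothetical.

```python
def ratio(numerator, denominator):
    """Divide, returning None when the denominator is zero."""
    return numerator / denominator if denominator else None


def precision_estimates(r, n):
    """Compute the four Precision Estimates.

    r: estimated SME-Responsive documents per stratum
    n: total number of documents per stratum
    """
    return {
        "aug22_original": ratio(r["OBR"], n["OBR"]),
        "current_original": ratio(r["OBR"] + r["OAR"], n["OBR"] + n["OAR"]),
        "current_supplemental": ratio(r["SAR"], n["SAR"]),
        "current_combined": ratio(
            r["OBR"] + r["OAR"] + r["SAR"],
            n["OBR"] + n["OAR"] + n["SAR"],
        ),
    }


# Hypothetical values, for illustration only.
r = {"OBR": 6000, "OAR": 2000, "SAR": 1500}
n = {"OBR": 8000, "OAR": 2500, "SAR": 2000}

precisions = precision_estimates(r, n)
for name, value in precisions.items():
    print(name, value)
```

Only the three strata of documents coded responsive (OBR, OAR, and SAR) enter these calculations, because precision concerns solely the documents that the review identified as responsive.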

In addition to the four overall Recall and the four overall Precision Estimates described above, the same eight summary statistics will be computed for each of the 13 Initial Plaintiff Groups to the degree possible (see NOTE in §I, ¶2 above). The results will be provided as a table with eight columns (one for each summary statistic above) and 14 rows, one for each particular Initial Plaintiff Group (to the degree possible), and one for the Initial Plaintiffs overall.

III. Comparative Statistics

The following comparison statistics will provide some additional insight into the agreement between the prevalence-sample reviewers, the validation SMEs, and, indirectly, the review for production. Substantially divergent estimates would suggest a divergence of the relevance criteria over time (i.e., “concept drift”) and/or reviewer error. It will not be necessary to compute these statistics separately for the 13 Initial Plaintiff Groups.

  • Estimated number of responsive documents in the Original TAR Collection (according to the Initial Plaintiffs’ initial prevalence estimate) ≈ number of documents in Original TAR Collection × number of responsive documents in prevalence sample ÷ number of documents in prevalence sample.
  • Estimated number of responsive documents in the Original TAR Collection (according to the SME) ≈ r-OBR + r-OBN + r-OAR + r-OAN + r-OX
  • Estimated number of responsive documents in the Supplemental TAR Collection (according to the Initial Plaintiffs’ supplemental prevalence estimate, if conducted) ≈ number of documents in Supplemental TAR Collection × number of responsive documents in supplemental prevalence sample ÷ number of documents in supplemental prevalence sample.
  • Estimated number of responsive documents in the Supplemental TAR Collection (according to the SME) ≈ r-SAR + r-SAN + r-SX
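For illustration only, the comparison for the Original TAR Collection might be sketched as follows. The function names and all counts are hypothetical, and the relative gap shown is merely one possible way to express divergence; this Order prescribes no particular divergence measure or threshold.

```python
def prevalence_estimate(collection_size, sample_size, sample_responsive):
    """Extrapolate a prevalence sample to the whole collection."""
    if sample_size == 0:
        return None
    return collection_size * sample_responsive / sample_size


def sme_estimate(r, strata):
    """Sum the SME-based stratum estimates for the named strata."""
    return sum(r[s] for s in strata)


# Hypothetical values, for illustration only.
r = {"OBR": 6000, "OBN": 500, "OAR": 2000, "OAN": 300, "OX": 1200}

prevalence_based = prevalence_estimate(500_000, 2_000, 44)
sme_based = sme_estimate(r, ["OBR", "OBN", "OAR", "OAN", "OX"])

# One possible (assumed) way to express how far the two estimates diverge.
relative_gap = abs(prevalence_based - sme_based) / sme_based
print(prevalence_based, sme_based, relative_gap)
```

The same two functions would apply to the Supplemental TAR Collection by substituting the supplemental prevalence sample, if one was conducted, and the SAR, SAN, and SX strata.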

Footnotes

[1] The Initial Plaintiffs consist of 28 entities from whom collections and productions were sought. They are BCBSM, Inc. (d/b/a Blue Cross and Blue Shield of Minnesota); HMO Minnesota (d/b/a Blue Plus); Health Options, Inc. (d/b/a Florida Blue HMO); Blue Cross and Blue Shield of North Carolina; Blue Cross Blue Shield of North Dakota; Blue Cross and Blue Shield of Florida, Inc. (d/b/a Florida Blue); Blue Cross and Blue Shield of Alabama; Blue Cross and Blue Shield of Kansas, Inc.; Blue Cross and Blue Shield of Massachusetts, Inc.; Blue Cross and Blue Shield of Massachusetts HMO Blue, Inc.; Wellmark, Inc. (d/b/a Wellmark Blue Cross and Blue Shield and d/b/a Wellmark Blue Cross and Blue Shield of Iowa, Inc.); Wellmark of South Dakota, Inc. (d/b/a Wellmark Blue Cross and Blue Shield of South Dakota); Wellmark Health Plan of Iowa, Inc.; Wellmark Synergy Health, Inc.; Wellmark Value Health Plan, Inc.; Blue Cross and Blue Shield of Arizona, Inc. (d/b/a Blue Cross Blue Shield of Arizona and d/b/a AZBLUE); Asuris Northwest Health; Blue Cross and Blue Shield of Kansas City, Inc.; Cambia Health Solutions, Inc.; Regence BlueShield of Idaho, Inc.; Regence BlueCross BlueShield of Oregon; Regence BlueCross BlueShield of Utah; Regence BlueShield; HealthNow New York, Inc.; Highmark Western New York, Inc. (f/k/a Blue Cross of Western New York); Northeastern New York (f/k/a BlueShield of Northeastern New York); Horizon Healthcare Services, Inc. (d/b/a Horizon Blue Cross Blue Shield of New Jersey); and Horizon Healthcare of New Jersey, Inc. (d/b/a Horizon NJ Health).

[2] These filings included Defendants’ Motion to Compel Plaintiffs’ Document Discovery and Related Metrics [Dkt. 374 (Unsealed)]; Memorandum in Support of Defendants’ Motion to Compel Plaintiffs’ Document Discovery and Related Metrics [Dkt. 376 (Unsealed) and Dkt. 378 (Sealed)]; Declaration of Charles D. Zagnoli in Support of Defendants’ Motion to Compel Plaintiffs’ Document Discovery and Related Metrics [Dkt. 377 (Unsealed) and Dkt. 379 (Sealed)]; Plaintiffs’ Opposition to Defendants’ Motion to Compel Plaintiffs’ Document Discovery and Related Metrics [Dkt. 400 (Unsealed) and Dkt. 404 (Sealed)]; Declaration of Cristhian Cabezas in Support of Plaintiffs’ Opposition to Defendants’ Motion to Compel Plaintiffs’ Document Discovery and Related Metrics [Dkt. 401 (Unsealed) and Dkt. 405 (Sealed)]; Declaration of Kelly H. Hibbert in Support of Plaintiffs’ Opposition to Defendants’ Motion to Compel Plaintiffs’ Document Discovery and Related Metrics [Dkt. 402 (Unsealed)]; Exhibit 9 [Dkt. 406 (Sealed)]; Reply in Support of Defendants’ Motion to Compel Plaintiffs’ Document Discovery and Related Metrics [Dkt. 413 (Sealed) and Dkt. 415 (Unsealed)]; Declaration of Shannon Capone Kirk [Dkt. 414 (Sealed)]; and Declaration of Charles D. Zagnoli in Support of Reply in Support of Defendants’ Motion to Compel Plaintiffs’ Document Discovery and Related Metrics [Dkt. 416 (Unsealed)].

[3] For purposes of collection, review, and production in this matter, the 28 Initial Plaintiffs divided themselves into 13 groups. The Initial Plaintiffs represent that because of their business structures, the members of each group have identical custodians, responses, and productions. In this Preliminary Diagnostic Protocol, we will refer to these 13 groups as the “Initial Plaintiff Groups.” The 13 Initial Plaintiff Groups are: (1) BCBSM, Inc. (d/b/a Blue Cross and Blue Shield of Minnesota) and HMO Minnesota (d/b/a Blue Plus) (the “BCBS Minnesota Plaintiffs”); (2) Blue Cross and Blue Shield of Florida, Inc. (d/b/a Florida Blue) and Health Options, Inc. (d/b/a Florida Blue HMO) (the “Florida Blue Plaintiffs”); (3) Blue Cross and Blue Shield of North Carolina; (4) Blue Cross Blue Shield of North Dakota; (5) Blue Cross and Blue Shield of Alabama; (6) Blue Cross and Blue Shield of Kansas, Inc.; (7) Blue Cross and Blue Shield of Massachusetts, Inc. and Blue Cross and Blue Shield of Massachusetts HMO Blue, Inc. (the “BCBS Massachusetts Plaintiffs”); (8) Wellmark, Inc. (d/b/a Wellmark Blue Cross and Blue Shield and d/b/a Wellmark Blue Cross and Blue Shield of Iowa, Inc.), Wellmark of South Dakota, Inc. (d/b/a Wellmark Blue Cross and Blue Shield of South Dakota), Wellmark Health Plan of Iowa, Inc., Wellmark Synergy Health, Inc., and Wellmark Value Health Plan, Inc. (the “Wellmark Plaintiffs”); (9) Blue Cross and Blue Shield of Arizona, Inc. (d/b/a Blue Cross Blue Shield of Arizona and d/b/a AZBLUE); (10) Asuris Northwest Health; Cambia Health Solutions, Inc.; Regence BlueShield of Idaho, Inc.; Regence BlueCross BlueShield of Oregon; Regence BlueCross BlueShield of Utah; Regence BlueShield (the “Cambia Plaintiffs”); (11) HealthNow New York, Inc.; Highmark Western New York, Inc. (f/k/a Blue Cross of Western New York); Northeastern New York (f/k/a BlueShield of Northeastern New York) (the “HealthNow Plaintiffs”); (12) Horizon Healthcare Services, Inc. (d/b/a Horizon Blue Cross Blue Shield of New Jersey); and Horizon Healthcare of New Jersey, Inc. (d/b/a Horizon NJ Health) (the “Horizon Plaintiffs”); and (13) Blue Cross and Blue Shield of Kansas City, Inc.

[4] The “CVS Documents” refer to the production of the documents of the overlapping Initial Plaintiffs in this action that were produced in BCBS of Alabama, et al. v. CVS Health Corp., et al., 1:20-cv-236 (D.R.I.).
[5] In information retrieval, the term-of-art “relevance” is used to describe the same property that is known in eDiscovery as “responsiveness.” This Order follows eDiscovery practice and uses the term “responsiveness.”

[6] To avoid any confusion, it is noted that the definitions of “recall” and “precision” used here are not different from the definitions of these terms typically understood in the eDiscovery industry, i.e., that “recall” is the proportion of responsive documents that have been identified, and that “precision” is the proportion of identified documents that are truly responsive. This is simply a different way of describing the same thing, which does not assume that the SME is infallible.