In re Diisocyanates Antitrust Litig.
2022 WL 17668470 (W.D. Pa. 2022)
October 19, 2022
Francis IV, James C. (Ret.), Special Master
Summary
The plaintiffs challenged the defendants' search terms and TAR process, arguing that the search terms were too limiting, that thousands of responsive documents were missed, and that certain defendants halted their TAR review too early. The defendants maintained that their search terms and TAR process were adequate. The Special Master found that the defendants' search terms achieved an acceptable recall rate and that the null-set documents cited by the plaintiffs largely duplicated documents already produced, but that the quantitative measures did not support the point at which TAR review was terminated. He recommended that the plaintiffs' motion to compel be granted in part and denied in part.
IN RE: DIISOCYANATES ANTITRUST LITIGATION
This Document Relates to: All Cases
Master Docket Misc. No. 18-1001 | MDL No. 2862
United States District Court, W.D. Pennsylvania
Signed October 19, 2022
Francis IV, James C. (Ret.), Special Master
Report and Recommendation
*1 This multidistrict litigation involves allegations that the defendants conspired to reduce supply and increase prices for methylene diphenyl diisocyanate (“MDI”) and toluene diisocyanate (“TDI”), chemicals used in the manufacture of polyurethane foam and thermoplastic polyurethanes. Using search terms and technology assisted review (“TAR”), certain defendants – BASF Corporation (“BASF”), Covestro LLC (“Covestro”), the Dow Chemical Company (“Dow”), Huntsman Corporation (“Huntsman”), and Wanhua Chemical (America) Co., Ltd. (“WCA”) (collectively, the “defendants”) – have produced documents responsive to the plaintiffs’ requests. The plaintiffs, however, contend that the productions are inadequate and have moved under Rule 37 of the Federal Rules of Civil Procedure to compel the defendants to utilize additional search terms and to continue the TAR process. The Court referred this motion to me for a report and recommendation. (ECF 713). For the reasons that follow, I recommend that the motion be granted in part and denied in part.
Background
There is a lengthy history leading up to the current motion. After substantial negotiations over search terms and TAR, the plaintiffs moved in March 2021 for an order compelling the defendants to use certain search terms and TAR methodologies in their productions. (ECF 455, 460). The defendants cross-moved for a protective order permitting them to proceed with the search terms and TAR protocols they had selected. (ECF 470, 471). Thereafter, the Court appointed me as E-Discovery Special Master with the task of providing a recommended disposition of the motions. (ECF 504).
While the motions were pending, I attempted to assist the parties in reaching agreement with respect to the outstanding disputes. These efforts were unsuccessful, and in a Report and Recommendation dated August 23, 2021, In re Diisocyanates Antitrust Litigation, 2021 WL 4295729 (W.D. Pa. Aug. 23, 2021) (“First R&R”) (ECF 529), I recommended that the defendants’ motion for a protective order be denied. (First R&R at *12).[1] I found that their proposed TAR methodology was unreasonable, first, because the validation procedure tested recall only for the search conducted by TAR, ignoring the documents eliminated by search terms, and, second, because it failed to take advantage of the capability of the defendants’ continuous active learning (“CAL”) tools to analyze the marginal value of conducting additional search iterations beyond a putative stopping point. (First R&R at *8-10). Next, I declined to endorse the defendants’ search terms, concluding that their exclusive focus on hit rates was inappropriate and that they had failed to conduct systematic testing of their proposed terms. (First R&R at *10-11). Furthermore, I found that the defendants had not sufficiently documented their assertion that use of the plaintiffs’ proposed search terms would impose a disproportionate burden. (First R&R at *11-12).
*2 At the same time, I recommended that the plaintiffs’ motion to compel be denied. (First R&R at *13). I found that their proposed validation criteria exceeded what the law requires by, among other things, incorporating an analysis of human review error. Since other reasonable methodologies remained available to the defendants, I declined to recommend that use of the plaintiffs’ protocol be mandated. (First R&R at *12-13). Similarly, I determined that the defendants should be permitted to identify reasonable search terms and not be required to adopt the plaintiffs’. (First R&R at *13).
The defendants did not object to the Report and Recommendation and instead announced their intention to modify their TAR protocols to address the flaws I had identified in the First R&R. (Defendants’ Response to the Special Master's Report and Recommendation (“Def. Resp. to First R&R”) (ECF 532) at 7-11). The defendants represented that in the event the parties did not agree on search terms, they would apply their chosen terms but would perform a validation analysis that covered the search term phase as well as the TAR phase of the process. (Def. Resp. to First R&R at 9-11). The defendants further committed to a process by which they would share with the plaintiffs the number and content of responsive documents identified in the last two batches of documents processed by TAR at the point where the defendants proposed to conclude their search. The plaintiffs would then be able to seek Court review if they disagreed with the defendants’ assessment that the search was reasonably complete. (Def. Resp. to First R&R at 11-12).
The plaintiffs objected to the Report and Recommendation and asked the Court to adopt one of three alternative solutions: (1) grant the plaintiffs’ motion, requiring the defendants to implement the plaintiffs’ TAR procedures and run the plaintiffs’ proposed search terms (or forgo search terms altogether and utilize TAR exclusively); (2) adjudicate the search terms then in dispute and remand the parties back to me to fashion a TAR methodology; or (3) simply order the defendants to produce all non-privileged documents in their collections that hit on the plaintiffs’ search terms. (Memorandum of Law in Support of Plaintiffs’ Objections to the Special Master's Report and Recommendation of August 23, 2021 (“Pl. Obj. to First R&R”) (ECF 535) at 2-3). The plaintiffs argued that “[t]he Court can, and should, overrule the Special Master's choice to defer to Defendants’ ability to choose new, reasonable search methodologies, and instead enter an Order prescribing Plaintiffs’ search terms and TAR protocols now.” (Pl. Obj. to First R&R at 6).
The Court overruled the plaintiffs’ objections, holding that “Defendants are not compelled to adopt the Plaintiffs’ search terms or TAR methodologies.” (Opinion and Order of Court, In re Diisocyanates Antitrust Litigation, 2021 WL 4295719, at *1 (W.D. Pa. Sept. 21, 2021)) (“First Opinion”) (ECF 549). The Court further ordered:
Defendants are to proceed, forthwith, as they have outlined in their submissions. Importantly, once Defendants reach a point where they believe their search is complete, they shall provide to Plaintiffs the following: (a) the Bates number of all relevant documents obtained from the last two batches searched, identify which of the batches these documents were found in, and identify the number of relevant but privileged documents withheld with respect to each of the two batches; and (b) the recall rate and all calculations used to derive that rate. If Plaintiffs agree, then Defendants may conclude their search. If Plaintiffs do not agree, the parties shall, after meeting and conferring, present their dispute to the Court for resolution by the Special Master.
*3 (First Opinion at *2).
After further negotiations, the parties could still not agree on search terms, and the defendants again moved for a protective order permitting them to use their selected terms (ECF 582, 583), while the plaintiffs moved to compel the defendants to use the plaintiffs’ terms (ECF 589, 592). Again, the Court referred this dispute to me. (ECF 596). In a Report and Recommendation dated January 7, 2022, In re Diisocyanates Antitrust Litigation, 2022 WL 173678 (W.D. Pa. Jan. 7, 2022) (“Second R&R”) (ECF 611), I found that when the Court directed the defendants to “proceed, forthwith, as they have outlined in their submissions” (First Opinion at *2), it foreclosed further adjudication of the search term dispute at that juncture. (Second R&R at *3). At the same time, I pointed out that this “does not immunize the defendants’ production from challenge at the end of the day.” (Second R&R at *4). Rather, the plaintiffs would have an opportunity to challenge both the search terms selected by the defendants and the point at which the defendants chose to halt the TAR process based on the validation information that the defendants had committed to providing. (Second R&R at *4). Accordingly, I recommended denying the plaintiffs’ motion to compel on the merits and denying the defendants’ motion for a protective order as moot. (Second R&R at *5). On January 19, 2022, the Court adopted this Report and Recommendation in full. (Opinion and Order of Court, In re Diisocyanates Antitrust Litigation, 2022 WL 170036 (W.D. Pa. Jan. 19, 2022)) (“Second Opinion”) (ECF 619). In that ruling, the Court observed that “at the appropriate time, Plaintiffs will be permitted to challenge the search term inadequacies they believe rendered the search unreasonable.” (Second Opinion at *2).
Each of the defendants substantially completed their productions and, between March and June 2022, provided plaintiffs with information regarding the validation of their TAR methodologies and search terms (collectively, the “Validation Letters”). (Declaration of Sarah R. LaFreniere dated Aug. 9, 2022 (“LaFreniere Decl.”), ¶¶ 22-23 & Exhs. 3-8). The Validation Letters provided: (1) the Bates numbers of the responsive documents identified in the penultimate and final batches of TAR documents reviewed; (2) the results of an elusion test, including the number of responsive documents identified and the Bates numbers of those documents; and (3) the estimated recall for each defendant's search. (LaFreniere Decl. ¶ 26). The numbers of responsive documents identified in the penultimate and final iterations of each defendant's TAR review are as follows:

(LaFreniere Decl. ¶ 27).
*4 The defendants also provided the plaintiffs with the results of an elusion test that each performed. In executing this test, they each sampled 2,400 documents drawn from two populations. The first population consisted of documents excluded from the TAR review set because they did not hit on the defendants’ search terms. This was deemed the “null set.” The second comprised documents that were included in the TAR review set but were not subjected to human review because the TAR algorithms did not give them a high enough classification score. This group was designated the “unreviewed TAR population.” The defendants identified by Bates number the documents from the null set and the unreviewed TAR population they had identified as responsive when conducting the elusion test. (LaFreniere Decl. ¶ 28). The results of the elusion tests are as follows:

(LaFreniere Decl. ¶ 28).
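The arithmetic behind an elusion test of the kind described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the defendants' actual method: the function name, the single-population treatment, the normal-approximation confidence interval, and the example figures other than the 2,400-document sample size are all hypothetical.

```python
import math


def estimate_missed(sample_size, responsive_in_sample, population_size, z=1.96):
    """Extrapolate an elusion sample to the unreviewed population it was drawn from.

    Assumes a simple random sample from a single population and a
    normal-approximation interval (z=1.96 for roughly 95% confidence).
    """
    if sample_size <= 0 or not 0 <= responsive_in_sample <= sample_size:
        raise ValueError("sample counts are inconsistent")
    # Elusion rate: share of sampled, unreviewed documents found responsive.
    rate = responsive_in_sample / sample_size
    margin = z * math.sqrt(rate * (1 - rate) / sample_size)
    low = max(0.0, rate - margin)
    high = min(1.0, rate + margin)
    return {
        "elusion_rate": rate,
        "estimated_missed": rate * population_size,
        "missed_low": low * population_size,
        "missed_high": high * population_size,
    }


# Hypothetical figures: 24 responsive documents in a 2,400-document sample
# drawn from an unreviewed population of 1,000,000 documents.
result = estimate_missed(2400, 24, 1_000_000)
```

The point estimate is simply the sample's responsive share multiplied by the size of the population sampled; the interval shows how much that estimate could move with a sample of this size.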
Finally, the defendants provided the plaintiffs with the estimated recall rate for the entire review. They did this by comparing the predicted number of responsive documents remaining in the unreviewed document population to the number of responsive documents identified in the review. The information provided by WCA is illustrative of the calculations:

(Letter of Alden L. Atkins dated April 1, 2022 (“Atkins 4/1/22 Letter”), attached as Exh. 7 to LaFreniere Decl., at 2).
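For illustration only, and not as a formula taken from the Validation Letters, an end-to-end recall estimate of the kind described above can be written as follows. The notation is assumed here: F is the number of responsive documents identified in the review, N and U are the null set and the unreviewed TAR population, and the two elusion rates are the responsive shares of the samples drawn from each. Treating the two populations as separate strata is likewise an assumption.

```latex
\[
\widehat{R} \;=\; \frac{F}{F + \widehat{M}},
\qquad
\widehat{M} \;=\; \hat{e}_{N}\,\lvert N\rvert \;+\; \hat{e}_{U}\,\lvert U\rvert
\]
```

On hypothetical figures, a review that identified 70,000 responsive documents and left an estimated 30,000 behind would have an estimated recall of 70,000 / (70,000 + 30,000) = 70%.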
The summary of the defendants’ recall estimates and the predicted number of responsive but unreviewed documents is as follows:

(LaFreniere Decl. ¶ 29).
In the Validation Letters, each defendant stated that it had concluded that, based on its review of the last two batches and the recall metrics, it was appropriate to end the TAR review. (Letter of Daniel T. Fenske dated April 4, 2022, attached as Exh. 3 to LaFreniere Decl., at 2 (BASF); Letter of Avia Gridi dated May 11, 2022, attached as Exh. 4 to LaFreniere Decl., at 2 (Covestro); Letter of Vanessa Barsanti dated May 13, 2022, attached as Exh. 5 to LaFreniere Decl., at 2 (Dow); Letter of Zachary K. Warren dated April 22, 2022, attached as Exh. 6 to LaFreniere Decl., at 2 (Huntsman); Atkins 4/1/22 Letter at 2-3 (WCA)). In each instance, the plaintiffs sought additional information, and some defendants provided further explanation and data. (E.g., Letter of Avia Gridi dated June 21, 2022, attached as Exh. 8 to LaFreniere Decl.).
The plaintiffs dispute that production is reasonably complete, and they presented their objections to each defendant. (LaFreniere Decl. ¶ 33 & Exhs. 9-14). When they failed to receive what they considered satisfactory responses, the plaintiffs filed the instant motion. They challenge two aspects of the defendants’ search. First, they contend that the search terms utilized by the defendants were too limiting, as a result of which thousands of responsive documents were never put through the TAR review process. (Memorandum of Law in Support of Plaintiffs’ Motion to Compel Search Terms and Further TAR Review (“Pl. Memo.”) (ECF 706) at 7-10). In support of this argument, they provide a chart containing search terms that plaintiffs unsuccessfully requested that the defendants run. (Pl. Memo., Exh. A; LaFreniere Decl., ¶ 39). This chart divides these disputed search terms into three categories. Group 1 consists of search terms that the defendants declined to run altogether. (Pl. Memo., Exh. A at 1-3; LaFreniere Decl., ¶ 40). Group 2 comprises search terms where the defendants ran a narrower version of the string that the plaintiffs requested. (Pl. Memo., Exh. A at 5-14; LaFreniere Decl., ¶ 41). And Group 3 includes terms designed to identify meetings and communications between defendants. The plaintiffs offer to forgo the search terms in Group 3 if the defendants will produce their full calendars.[2] Since Dow has already done so, this category is in dispute only with respect to BASF, Covestro, Huntsman, and WCA. (Pl. Memo., Exh. A at 15-17; LaFreniere Decl., ¶ 42). The chart also includes the results of sampling conducted by BASF, Dow, and WCA, showing the hit rate for the plaintiffs’ requested search terms across these defendants’ collections. Huntsman and Covestro did not share with the plaintiffs the results of any such sampling they conducted. (Pl. Memo., Exh. A; LaFreniere Decl., ¶¶ 43-44). 
The plaintiffs provide examples of documents that were not identified by the defendants’ search terms that the plaintiffs maintain are central to the case. (Pl. Memo. at 8-10).
*5 The plaintiffs further challenge the decision of defendants Covestro, Huntsman, and WCA to halt their TAR review at the point that they did. (Pl. Memo. at 11-12). The plaintiffs contend that the percentage of responsive documents identified in the last two TAR batches reviewed by these defendants – 19% for Covestro, 18% for Huntsman, and 15% for WCA – was too high to justify stopping the review. (Pl. Memo. at 11). Beyond these statistics, the plaintiffs argue that documents located both in the last two TAR batches and in the unreviewed TAR populations of these defendants included novel and important ones, including documents reflecting communications between competitors, customer complaints, and documents about pricing, swaps, and supply. (Pl. Memo. at 11-12). According to the plaintiffs, “[i]t is striking ... that by their own count Defendants admit they are withholding 180,000 responsive documents from production to Plaintiffs.” (Pl. Memo. at 1 (emphasis and footnote omitted)).
Finally, the plaintiffs present a unique argument with respect to BASF, asserting that “the quantity and quality of BASF's validation documents do not suggest that the review terminated too early, but rather that certain categories of documents were missed.” (Pl. Memo. at 12). The plaintiffs contend that BASF's validation documents included communications with other defendants and emails about pricing and reduced output for MDI and TDI, and BASF should therefore be required to conduct targeted searches for documents in these categories. (Pl. Memo. at 12-13).
The defendants contest each of the plaintiffs’ arguments. First, they maintain that the Court has already decided that the defendants may use the search terms that they developed. (Defendants’ Joint Memorandum of Law in Opposition to Plaintiffs’ Motion to Compel Search Terms and Further TAR Review (“Def. Memo.”) (ECF 732) at 6-8). Next, they argue that they have demonstrated the reasonableness of their searches in terms of both quantity and quality. (Def. Memo. at 8). With respect to quantitative analysis, the defendants point out that they had stated that they would “[r]eview documents until each Defendant reasonably believes it has achieved a targeted recall rate of approximately 70% based on an ‘end-to-end’ Recall rate (including both the TAR Review Set plus the Null Set.” (Def. Memo. at 8). They say they have achieved this, since the defendants’ recall rates range from 73.81% to 88.97%. (Def. Memo. at 9-10). The defendants further argue that the large number of presumed responsive documents that have not been identified is not meaningful, since this merely reflects the enormous volume of documents collected in the first instance. What is important, according to the defendants, is not the absolute number of documents left behind, but the proportion of responsive documents identified and produced. (Def. Memo. at 10-11). The defendants also contend that the decision of each defendant to halt its TAR review is supported by the statistics, since the percentage of responsive documents located in the last two batches reviewed ranged from 1.3% to 19.58%. According to the defendants, “[t]his means that, on average, approximately 80%-90% of the documents being reviewed at this point in Defendants’ reviews were non-responsive.” (Def. Memo. at 12) (emphasis omitted).
Previously, the defendants had also performed quantitative analyses of the plaintiffs’ proposed search terms. Each defendant selected a random sample of 2,400 documents from the documents excluded by that defendant's search terms. It then applied the plaintiffs’ requested search terms to that sample. “[N]o Defendant found more than 5% of the documents in the ‘Null Set’ of documents excluded by search terms to be responsive.” (Def. Memo. at 13 (emphasis omitted)).
As to the quality of their productions, the defendants contend that the responsive documents found either in the last two TAR batches or in random samples from the null set are not more than marginally relevant. (Def. Memo. at 16). Many of them, according to the defendants, are identical to or substantially duplicative of other documents produced to the plaintiffs, including documents that had been collected for disclosure to the U.S. Department of Justice and that were subsequently produced to the plaintiffs. (Def. Memo. at 16-18). Other documents are not strictly duplicative but are of little import to the outcome of the case because they are cumulative of information captured in other documents already produced. (Def. Memo. at 18-19).
*6 The defendants also respond to the assertion that BASF in particular omitted entire categories of responsive documents. They take each document that plaintiffs cite and attempt to show that the document is in fact consistent with other documents that BASF previously produced. (Def. Memo. at 19-20).
Finally, the defendants contend that requiring them to utilize the disputed search terms or continue their TAR review would impose disproportionate cost and burden. The defendants note that they disbanded their review teams when they completed their productions and would now need to re-staff and restart their reviews. (Def. Memo. at 27).
Additional facts will be discussed in connection with the analysis below.
Discussion
A. Legal Standards
It is common ground among the parties that a producing party must take reasonable steps to identify and produce relevant documents and that perfection is not required. (Pl. Memo. at 16; Def. Memo. at 6). See Lawson v. Spirit AeroSystems, Inc., No. 18-1100, 2020 WL 1813395, at *7 (D. Kan. April 9, 2020); Winfield v. City of New York, No. 15-CV-05236, 2017 WL 5664852, at *9 (S.D.N.Y. Nov. 27, 2017); Hyles v. New York City, No. 10 Civ. 3119, 2016 WL 4077114, at *3 (S.D.N.Y. Aug. 1, 2016); Enslin v. Coca-Cola Co., No. 2:14-cv-06476, 2016 WL 7042206, at *3 (E.D. Pa. June 8, 2016). But “reasonable” is an elastic term, and the parties disagree about its application to the defendants’ search and production in this case.
Whether a search may be deemed reasonable depends in part on the substantive law governing the case, because that law informs the types of documents that are relevant and important, which, in turn, dictates the search methodologies that may be appropriate. In an antitrust conspiracy case like this, relevant evidence falls into two broad categories. First, “a plaintiff may, of course, assert direct evidence that the defendants entered into an agreement in violation of the antitrust laws. Such evidence would consist, for example, of a recorded phone call in which two competitors agreed to fix prices at a certain level.” Mayor and City Council of Baltimore v. Citigroup, Inc., 709 F.3d 129, 136 (2d Cir. 2013). Thus, in a case alleging a conspiracy to fix the price of bonds in the secondary market, compelling evidence included chats in which traders, acting on behalf of certain defendants, agreed “to fix prices at a specific level before bringing the bonds to the secondary market.” In re GSE Bonds Antitrust Litigation, 396 F. Supp. 3d 354, 361 (S.D.N.Y. 2019). Direct evidence may also consist of “an explicit admission from a participant that an antitrust conspiracy existed.” In re Chocolate Confectionary Antitrust Litigation, 801 F.3d 383, 396 (3d Cir. 2015). For example, in In re High Fructose Corn Syrup Antitrust Litigation, 295 F.3d 651, 662 (7th Cir. 2002), the evidence of conspiracy included a statement of one of the defendant's plant managers that “[w]e have an understanding within the industry not to undercut each other's prices.”
Of course, such “ ‘smoking guns’ are rare in antitrust conspiracy cases.” In re Flat Glass Antitrust Litigation (II), 2012 WL 5383346, at *4 n.3 (W.D. Pa. Nov. 1, 2012). And, indeed, they would presumably be rare in any set of review documents collected by antitrust defendants. Consequently, it is critical not to exclude the possibility of locating such evidence by, for example, utilizing search terms that do not include words that might reflect a conspiratorial communication. The challenge is to do so without making the terms so broad that they encompass vast numbers of irrelevant documents. At the same time, broad validation statistics such as recall, standing alone, are of limited utility in ascertaining whether a party has done a reasonable job of searching for such rare documents.
*7 “Because direct evidence, the proverbial ‘smoking gun,’ is difficult to come by, plaintiffs have been permitted to rely solely on circumstantial evidence (and the reasonable inferences that may be drawn therefrom) to prove a conspiracy.” Intervest, Inc. v. Bloomberg, L.P., 340 F.3d 144, 159 (3d Cir. 2003) (internal quotation marks and citation omitted). Thus, “plaintiffs basing a claim of collusion on inferences from consciously parallel behavior [may] show that certain ‘plus factors’ also exist.” In re Flat Glass Antitrust Litigation, 385 F.3d 350, 360 (3d Cir. 2004). Such factors include evidence that competitors knew about the timing and amount of competitor price increases before the market did and “believed the market would not support a price increase” but nonetheless raised prices within eight days of each other, id. at 366; evidence that competitors “talk every week” and “had developed a good working relationship,” In re EPDM, 681 F. Supp. 2d 141, 173 (D. Conn. 2009); evidence that “the defendants’ executives engaged in friendly and frequent communications with each other during which they discussed issues such as ... price ... and market shares,” id. at 177; evidence of discussions of price and the timing of price increases, including “(1) who communicated with whom; (2) when those communications occurred and the type of information exchanged; (3) how the exchanges occurred; and (4) the legitimate reasons that may be offered for exchanging the information.” In re Polyurethane Foam Antitrust Litigation, 152 F. Supp. 3d 968, 983, 989 (N.D. Ohio 2015) (emphasis omitted); and evidence that frequent phone calls after negotiations “represented a departure from the ordinary pattern,” United States v. Apple Inc., 952 F. Supp. 2d 638, 655 n.14, 658 (S.D.N.Y. 2013).
In contrast to such smoking gun evidence, the concern with respect to circumstantial evidence is not that it will be missed altogether, but that an insufficient quantum will be uncovered to warrant the inference that an antitrust conspiracy exists. Even the most rudimentary search is likely to identify evidence of some communications among competitors, some complaints by customers about pricing, some evidence of meetings at trade shows, and so on. The issue is whether this evidence will amount to the critical mass necessary for the antitrust plaintiff to carry its burden. The plaintiff is, in effect, creating a mosaic, drawing on disparate pieces of evidence, no one of which is conclusive. Accordingly, the more circumstantial evidence it can present, the more complete the picture becomes. Here, volume matters, and statistics such as recall are more useful in determining the reasonableness of the search protocol. Those statistics will not be determinative, however, if the requesting party can demonstrate that certain types of circumstantial evidence have been systematically omitted.
As has been made clear in my prior reports and recommendations and in the Court's previous decisions, the adequacy of a search must be assessed in terms of both quantity – how the search scores with respect to metrics such as recall – and quality – the importance to the litigation of the documents that have been left behind. Here, as will be seen, the documents excluded by the defendants’ search terms and the responsive documents identified at the tail end of the TAR reviews are comparable in their significance: they are relevant to important issues in the litigation but similar to other documents already produced. On the other hand, the pertinent quantitative measures show that while the application of search terms has generally been reasonable, the termination of TAR review has not.
B. Search Terms
Insofar as the defendants contend that the Court has already determined that their search terms are reasonable (Def. Memo. at 6), they are incorrect. To be sure, in adjudicating the parties’ responses to the First R&R, the Court stated that “Defendants are not compelled to adopt the Plaintiffs’ search terms,” and directed the defendants to “proceed, forthwith, as they have outlined in their submissions.” (First Opinion at *1-2). But this did not immunize the defendants’ search terms from further review after the terms had been applied and validation analyses had been performed. The Court made this explicit in the Second Opinion, stating that “at the appropriate time, Plaintiffs will be permitted to challenge the search term inadequacies they believe rendered the search unreasonable.” (Second Opinion at *2). That time has come.
*8 The plaintiffs mount a three-pronged attack on the defendants’ chosen search terms. First, they argue that the terms have resulted in the exclusion of an unreasonable number of responsive documents. Next, the plaintiffs advance the related argument that the defendants’ validation statistics are invalid. Finally, they identify specific categories of documents that they say were omitted by the defendants’ search terms.
The plaintiffs assert that “by their own count Defendants admit they are withholding 180,000 responsive documents.” (Pl. Memo. at 1; LaFreniere Decl. ¶ 29). They go on to argue that, of this number, the defendants’ “unduly narrow selection of search terms ... resulted in 146,000 responsive documents being left behind.” (Pl. Memo. at 13; LaFreniere Decl. ¶ 56). This argument is unpersuasive. The characterization of the defendants as withholding documents is disingenuous. It would be an apt description if the defendants had identified specific responsive documents and failed to turn them over, but that is not the case. Rather, it is estimated that approximately 180,000 documents that have not been identified are likely responsive, of which 146,000 were excluded at the search term phase. Semantics aside, the absolute number of unidentified responsive documents is not particularly meaningful, since the larger the total collection, the larger the number of responsive documents that will be left behind even after a reasonable search.
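The point that an absolute count of unidentified documents says little by itself can be illustrated with a short calculation. This is a sketch on hypothetical figures, not numbers drawn from the record: the function name and both example collections are assumptions.

```python
def unidentified_responsive(total_responsive, recall):
    """Number of responsive documents left behind at a given recall rate."""
    if not 0 < recall <= 1:
        raise ValueError("recall must be in (0, 1]")
    return round(total_responsive * (1 - recall))


# Two hypothetical collections searched at the same 80% recall.
smaller = unidentified_responsive(50_000, 0.80)
larger = unidentified_responsive(500_000, 0.80)
```

Both hypothetical searches perform identically in proportional terms, yet the larger collection leaves ten times as many responsive documents unidentified.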
In their moving brief, the plaintiffs acknowledge the defendants’ “purportedly high recall (ranging from 74%-89%)” (Pl. Memo. at 13), and they then go on to engage in a qualitative analysis of the types of documents that they contend the defendants failed to retrieve (Pl. Memo. at 16-21). In their reply memorandum, however, the plaintiffs argue that the defendants’ recall statistics are overstated. (Plaintiffs’ Reply Memorandum of Law in Further Support of Plaintiffs’ Motion to Compel Search Terms and Further TAR Review (“Pl. Reply”) at 9-10). They present a declaration from their expert, Dr. Maura R. Grossman, who opines that the defendants’ recall estimates are “overly rosy” because they were not calculated using blind, stratified samples. (Third Supplemental Declaration of Maura R. Grossman in Support of Plaintiffs’ Motion to Compel Search Terms and Further TAR Review (“Grossman 3rd Decl.”) (ECF 774) ¶¶ 29, 40).[3] According to Dr. Grossman, the problem arises because qualified independent reviewers disagree about the proper coding of a document about 30% of the time. (Grossman 3rd Decl. ¶ 29). She maintains that “Defendants’ method of estimating recall uses Reviewer A's coding to estimate Reviewer A's recall,” thus inflating the recall statistic. (Grossman 3rd Decl. ¶¶ 31-32). Dr. Grossman also contends that bias was introduced by having reviewers evaluate an elusion sample that they would know consisted of documents already designated by other reviewers as non-responsive. (Grossman 3rd Decl. ¶ 33).
However accurate these criticisms may be, they do not establish a basis for rejecting the defendants’ search terms for failure to meet quantitative standards. As the defendants calculated, they achieved between approximately 74% and 89% recall. As the plaintiffs themselves wrote, “70-80% recall is often considered reasonable by courts.” (Letter of Sarah R. LaFreniere dated March 4, 2021 (ECF 458-8), attached as Exh. G to Declaration of Sarah R. LaFreniere dated March 25, 2021 (ECF 458) at 1). There is no suggestion, either in the plaintiffs’ letter or in the cases to which the parties allude, that the range of 70%-80% recall will only be acceptable if the producing party calculates it using the demanding standards that the plaintiffs advocate here. The defendants’ methodology may be imperfect, and it may result in higher estimated recall figures than if the plaintiffs’ approach were used, but it is not unreasonable, particularly given the extent by which the defendants exceeded the lower end of the acceptable range.
*9 Recall, however, is only one indicator of the adequacy of a search, and it is therefore important to turn to the plaintiffs’ assertion that the defendants’ search terms broadly excluded consequential documents. The plaintiffs approach this issue in two ways. First, they identify what they consider important documents that were not retrieved from the defendants’ searches but were instead located in their null sets. Second, the plaintiffs describe search terms that the defendants refused to run and suggest hypothetical documents that such terms would have located.
I will address first the documents that the defendants’ search terms did not retrieve but which were located during the validation process. I will focus on the examples provided by the plaintiffs in their briefing because these are the documents the plaintiffs presumably consider most consequential. Which defendant possessed the document that has now been discovered in the null set is not significant, since the same search terms were generally applied across each defendant's collection, and the plaintiffs seek to have their suggested terms applied by all defendants.
The plaintiffs specifically identify the following documents as having been excluded by the defendants’ search terms: COV000665154 (Declaration of Camila Ringeling dated Aug. 9, 2022 (“Ringeling Decl.”) (ECF 708), Exh. 3); COV000665093 (Ringeling Decl., Exh. 4); DOW-MDI-01869824 (Ringeling Decl., Exh. 5); WCA_CIV-000930942 (Ringeling Decl., Exh. 6); and BC-1666694 (Ringeling Decl., Exh. 7). (Pl. Memo. at 8-9).
The first of these documents, COV000665154, is a Covestro price list for a specific customer. (Pl. Memo. at 8-9; Declaration of John Terzaken dated Aug. 30, 2022 (“Terzaken Decl.”) (ECF 736) ¶ 25(a)). It is plainly relevant, but an identical copy was previously produced by Covestro. (Terzaken Decl. ¶ 25(a) & Exh. K). In addition, Covestro produced many other price lists in the relevant time period for this customer. (Terzaken Decl. ¶ 25(a)).
Document COV000665093 is a pricing proposal for a different Covestro customer. (Pl. Memo. at 9; Terzaken Decl. ¶ 25(b)). Again, it is relevant, but again Covestro produced it previously, this time as part of the production of documents collected for the Department of Justice. (Terzaken Decl. ¶ 25(b) & Exh. L).
According to the plaintiffs, DOW-MDI-01869824 is “an internal email string showing a defendant cannot supply product to a customer due to a force majeure, demonstrating the direct impact of the shutdown on customers and the market.” (Pl. Memo. at 9). On its face, this document relates to a potential new customer rather than an existing one. Be that as it may, Dow previously produced other documents addressing the supply constraints created by the same force majeure event. (Declaration of Vanessa Barsanti dated Aug. 30, 2022 ¶ 20 & Exhs. D, E).
Document WCA_CIV-000930942 is an email string in which a WCA employee discusses with a customer the tight supply of its MDI product. (Pl. Memo. at 9). WCA produced the same emails in a less inclusive email string. (Declaration of Alden R. Atkins dated Aug. 30, 2022 ¶ 31 & Exh. 9).
Finally, document BC-1666694 is an email in which a BASF employee asks, “Are we telling customers we are taking a TAR in June?”, where “TAR” is, in this instance, shorthand for a “turnaround,” in which a company takes a factory or supply line offline. (Pl. Memo. at 9). The plaintiffs contend that “Defendants failed to include commonly used acronyms or codewords like ‘TAR’, despite Plaintiffs’ requests.” (Pl. Memo. at 9). But the plaintiffs never asked BASF to run “TAR” as a search term. (Declaration of Rachel J. LaMorte dated Aug. 30, 2022 (“LaMorte Decl.”) ¶ 28). Nevertheless, it ran search strings that included “turnaround” and other related words and produced multiple documents related both to the same turnaround event and to other similar events. (LaMorte Decl., ¶¶ 27, 29, 30 & Exhs. 39-54).
*10 The plaintiffs identify as critical three other documents that the defendants initially failed to produce, but as to which it is ambiguous whether the plaintiffs located them in the defendants’ null sets or in the unreviewed TAR populations. These are BC-1566238 (Ringeling Decl., Exh. 23); BC-1527606 (Ringeling Decl., Exh. 24); and WCA_CIV-000930776 (Supplemental Declaration of Sarah R. LaFreniere dated September 20, 2022 (“LaFreniere Supp. Decl.”), Exh. 8). (Pl. Memo. at 17, 20; Pl. Reply at 7). Because the plaintiffs do not seek to have BASF resume its TAR search, the BASF documents would only be relevant to the plaintiffs’ search term arguments and therefore presumably came from its null set. I will likewise assume that the WCA document was found in its null set and relates to the search term dispute as well.
Document BC-1566238 is an internal BASF email instructing employees to update certain orders to reflect higher prices “due to allocation issues and price increases.” While this email may have been filtered out by defendants’ search terms, those terms did identify multiple documents related to pricing. (LaMorte Decl., ¶ 40 & Exhs. 16-38).
The plaintiffs argue that BC-1527606 is highly probative because it reflects a collegial relationship between BASF and Dow personnel who joke about election results and because it includes reference to an upcoming discussion about pricing. (Pl. Memo. at 20; Pl. Reply at 5). But Dow is a customer of BASF, and BASF produced a subsequent email concerning the anticipated discussion, which included as an attachment a spreadsheet with the historical pricing between BASF and Dow. (LaMorte Decl., ¶ 43 & Exhs. 11, 12).
The plaintiffs contend that WCA_CIV-000930776, which is an email from a customer to WCA attaching Huntsman's price list, illustrates a “[c]ustomer's willingness to purchase competing Defendants’ products, and documents regarding swaps demonstrate that these products are interchangeable (a disputed fact).” (Pl. Reply at 7). WCA has, in any event, produced other examples of customers sharing one defendant's prices with a different defendant, and the plaintiffs themselves have produced similar documents, since, as customers, they have used one defendant's pricing as leverage to negotiate with another. (Atkins Decl., ¶ 34 & Exhs. 11, 12).
On balance, the documents that the plaintiffs selected from the null set to demonstrate flaws in the defendants’ search terms do not make the case. Some are duplicates or near-duplicates of documents otherwise produced by the defendants. Of course, this does not demonstrate the accuracy of the defendants’ search terms: those terms still missed documents that were located solely, for example, in the Department of Justice production. More importantly, though, none of the documents that the plaintiffs identify are individually consequential. While they may be germane to important issues in the case, they are not unique. Rather, they are, like similar documents already produced on the basis of the defendants’ search terms, part of the mosaic that the plaintiffs are assembling. They are not significant enough to warrant re-starting the review process to incorporate the disputed search terms.
I will now turn to the plaintiffs’ argument that, apart from what the documents from the null set show, the defendants’ search terms necessarily excluded important categories of documents that would be captured by the plaintiffs’ terms. The plaintiffs provide three charts to illustrate their argument. Exhibit A lists all of the disputed search terms and, for each defendant, identifies the number of hits each search term had within the null set, how many hits it had within the sample derived from the null set, and how many of the hits from the null set sample were actually responsive. (Pl. Memo., Exh. A). Exhibit B is a chart of “phrases” that the plaintiffs derived from the disputed terms to illustrate the types of communications that they contend were excluded by the failure to use these terms. (Pl. Memo., Exh. B). For example, disputed search string 57.a is: (market* OR Mkt OR industry OR player* OR maker* OR supplier) /10 (add* OR adjust* OR assess OR balance* OR behave* OR chang* OR challeng* OR concern* OR condition* OR confirm* OR coordin* OR disrupt* OR disturb* OR declin* OR decreas* OR destroy* OR discuss* OR discliplin* OR disrupt* OR divid*).[4] According to the plaintiffs, the failure to accept this term meant that the defendants would not search for phrases reflecting an antitrust conspiracy such as “divide the market” or “disruptive player,” among others. (Pl. Memo., Exh. A at 1). By the plaintiffs’ reckoning, “Exhibit B tells the tale: Defendants did not run these search terms; they did not feed documents that hit on these terms into their TAR tools; and they left documents with these highly relevant search terms out of their review process entirely.” (Pl. Memo. at 8).
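For readers unfamiliar with proximity strings, the sketch below illustrates how a term such as 57.a might operate. It rests on assumptions not stated in the record: that “/10” means the two terms appear within ten words of one another in either direction, that a trailing asterisk is a prefix wildcard, and that matching is case-insensitive. The function names are hypothetical, and the term lists are abbreviated from the full string.

```python
import re


def term_matches(pattern, word):
    """True if word matches a search term; a trailing * is a prefix wildcard."""
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def proximity_hit(text, left_terms, right_terms, distance=10):
    """True if any left term occurs within `distance` words of any right term."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    left = [i for i, w in enumerate(words)
            if any(term_matches(t, w) for t in left_terms)]
    right = [i for i, w in enumerate(words)
             if any(term_matches(t, w) for t in right_terms)]
    return any(abs(i - j) <= distance for i in left for j in right)


# Abbreviated from disputed search string 57.a and lower-cased for matching.
LEFT = ["market*", "mkt", "industry", "player*", "maker*", "supplier"]
RIGHT = ["chang*", "disrupt*", "divid*"]

print(proximity_hit("We should divide the market between us.", LEFT, RIGHT))
print(proximity_hit("They are a disruptive player.", LEFT, RIGHT))
print(proximity_hit("Please send the invoice.", LEFT, RIGHT))
```

Under these assumptions the first two sentences hit and the third does not, which corresponds to the plaintiffs' examples of “divide the market” and “disruptive player.”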
*11 In my view, it is Exhibit C that tells the tale. That exhibit lists documents from the defendants’ production that the disputed terms hit. In other words, these documents contain language that was hit by both the disputed terms and the terms that the defendants actually used. (Pl. Memo. at 9-10). The plaintiffs reason that this demonstrates “simply that Defendants omitted search terms that are not only likely to uncover responsive documents – but in fact do so.” (Pl. Memo. at 10). The better conclusion is that the defendants’ search terms were robust enough to capture many of the same documents that the disputed terms would have, without bringing in volumes of additional non-responsive documents. Indeed, a review of the description of the documents identified in Exhibit C shows that the defendants’ terms did not exclude key concepts in this antitrust case such as “communications and evidence of meetings with one another,” “trade association events,” “pricing strategies,” “customer price lists,” “strategies concerning inventory and supply,” “[t]erms related to market behavior, pricing and supply,” and “[t]erms related to pricing, costs, and customers.” (Pl. Memo. at 2, 18, 21). This is reinforced, for example, by WCA's analysis showing the numerous search terms that it used which encompassed each of these key concepts. (Atkins Decl., Exh. 1).
This does not mean that the disputed search terms would not yield many additional responsive documents; they would. It does not mean that the defendants’ search terms are perfect; they are not. But the parties agreed that the defendants could use search terms to narrow the population of documents to be presented to the TAR tools (Declaration of Zachary K. Warren dated April 9, 2021 (ECF 472), Exh. 1), and the search terms utilized by the defendants have not been demonstrated to be unreasonable. The evidence shows that the documents that they missed were probably not of high value, and the search terms adequately captured the critical issues that the plaintiffs have identified. I therefore recommend that the plaintiffs’ motion be denied insofar as it seeks to require the defendants to adopt the disputed search terms.
C. TAR
The plaintiffs also challenge the decisions of WCA, Huntsman, and Covestro to cease their TAR review procedures when they did. As discussed in the previous reports and recommendations, each defendant used continuous active learning, or CAL. With this methodology, reviewers code documents presented by the TAR tool as either responsive or non-responsive and feed them back to the classifier, providing the tool with information that it then uses to assign a score to each document within the review set reflecting a prediction of the likelihood that the document is responsive. “This means that once the TAR classifier has learned enough to score documents, most documents reviewed in the early stage of the review are ranked highly responsive, and there are more of them, and as the review continues, fewer and fewer responsive documents are indicated, and their ranking is lower and lower.” (First R&R at 3 (quoting Declaration of Daniel L. Regard II dated April 9, 2021 (ECF 471-2), ¶ 26(b)(i), (ii))). See In re Valsartan, Losartan, and Irbesartan Products Liability Litigation, 337 F.R.D. 610, 614 (2020) (describing continuous active learning methodology).
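The feedback loop described above can be sketched in simplified form. This is a toy model and not any defendant's TAR tool: the classifier is a bare word-weight count, the corpus and the coding rule are invented, and all names are hypothetical.

```python
from collections import Counter


def score(doc, weights):
    """Sum of learned word weights; higher means more likely responsive."""
    return sum(weights[w] for w in set(doc.split()))


def cal_review(docs, is_responsive, seed_ids, batch_size):
    """Toy continuous active learning loop.

    docs: dict of id -> text; is_responsive: stands in for the human reviewer.
    Returns the fraction of responsive documents in each batch, in review order.
    """
    weights = Counter()
    reviewed = set()
    rates = []
    batch = list(seed_ids)
    while batch:
        hits = 0
        for doc_id in batch:
            label = is_responsive(docs[doc_id])
            hits += label
            # Feed the coding decision back to the classifier.
            for w in set(docs[doc_id].split()):
                weights[w] += 1 if label else -1
            reviewed.add(doc_id)
        rates.append(hits / len(batch))
        # Re-rank everything not yet reviewed and pull the next top batch.
        remaining = [d for d in docs if d not in reviewed]
        remaining.sort(key=lambda d: score(docs[d], weights), reverse=True)
        batch = remaining[:batch_size]
    return rates


# Invented corpus; a document is treated as responsive if it mentions "price".
docs = {
    1: "price increase MDI customers",
    2: "price list TDI customer",
    3: "supply allocation MDI price",
    4: "lunch menu friday",
    5: "holiday party friday",
    6: "MDI supply tight price",
    7: "parking garage notice",
    8: "TDI price increase announcement",
}
rates = cal_review(docs, lambda text: "price" in text.split(),
                   seed_ids=[1, 4], batch_size=2)
print(rates)
```

In this toy corpus the batches that follow the seed set consist entirely of responsive documents, and the final batch contains none, mirroring on a small scale the declining yield that the quoted passage describes.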
Initially, the defendants proposed to stop their TAR review when the number of documents they had identified as responsive surpassed 70% estimated recall. (First R&R at *2-3). The plaintiffs objected and instead proposed that the defendants’ review continue until “the last two batches of documents identified by TAR and reviewed by humans contains no more than five to ten percent (5%-10%) responsive documents, and none of the responsive documents is novel and/or more than marginally relevant.” (First R&R at *4 (quoting Plaintiffs’ Proposed Stopping Criterion and Validation Process for Defendants’ Application of Technology Assisted Review (ECF 460-3) at 2)). After I indicated in the First R&R that the defendants’ proposed stopping criterion was unreasonable because it failed to consider the quantity and quality of the documents being reviewed at the margin (First R&R at *9), the defendants agreed to modify their procedures by identifying for the plaintiffs the responsive documents retrieved from the last two batches of the CAL review. (Def. Resp. to First R&R at 11). This proposal did not include specific metrics for deciding when the review had been adequate, but the Court directed the defendants to proceed as they suggested.
*12 The small sample of the responsive documents from the last two batches that the parties have proffered does not appear to be materially different in kind from the responsive documents located in the null set. They are plainly relevant to the claims asserted but are not unique. Rather, they are pertinent to issues that other documents produced by the defendants also relate to. For example, in document WCA_CIV-000930650 (Ringeling Decl., Exh. 20), a customer complains that “the current pricing we are receiving [for MDI] is not realistic in the market and we are unable to work with them.” Similarly, in the email exchange in WCA_CIV-000930721 (Ringeling Decl., Exh. 19), a customer inquires about supply of MDI in the United States and is told that WCA is sold out and will have limited supply for the next few months. In HTN-00520415 (Ringeling Decl., Exh. 16), employees of Huntsman and WCA exchange emails concerning a swap transaction for MDI. Document HTN-00549393 (Ringeling Decl., Exh. 17) references a meeting between employees of Huntsman and Dow concerning MDI purchases. Document COV000599124 (Ringeling Decl., Exh. 16) is a series of emails between Covestro and Dow concerning payment issues, presumably related to swap transactions. Finally, document COV000664772 (Ringeling Decl., Exh. 18) is a group of emails relating to a force majeure event at a Covestro plant in Europe and its impact (or lack of impact) on the supply of MDI in the United States.
The sample documents pulled from the last two batches are thus not dramatic enough to warrant further review solely based on the qualitative prong of the analysis. However, at the time that WCA stopped its review, 15% of the documents being reviewed were identified as responsive. (Figure 1). That is approximately one out of every 6.7 documents. Huntsman stopped its review when its TAR tool was still returning batches in which 18% or approximately one out of every 5.5 was responsive. (Figure 1). And, when Covestro stopped reviewing, the last batches contained 19% responsive documents, or about one out of every 5.3. (Figure 1). On its face, it seems surprising to halt a review when responsive documents are being returned with such frequency. Indeed, stopping when they did likely contributed to the fact that the overall estimated recall for these three defendants is lower than that for BASF or Dow. (Figure 4). Furthermore, because large swaths of documents had already been excluded by search terms, it is particularly important not to stop the review of the remaining documents prematurely.
While the documents retrieved from the last two TAR batches are not entirely novel and would not justify implementing a new set of search terms, they are sufficiently important to require WCA, Huntsman, and Covestro to continue their review of documents already identified as potentially responsive based on their own search terms. The volume of communications reflecting certain subjects – customer complaints about price and supply, communications between competitors, supply disruptions – provides circumstantial evidence that permits the plaintiffs to fill in their mosaic and is significant even if individual examples of such documents appear trivial. Even if near duplicates of some of these documents have already been produced (Declaration of Zachary K. Warren dated Aug. 30, 2022, ¶¶ 20, 21), the fact that these documents were identified at this point in the TAR process suggests that other similar documents, some of which may be unique, remain in the unreviewed TAR population. Similarly, the fact that a document may refer to a force majeure event having no impact on supply and therefore has no intrinsic relevance to the plaintiffs’ claims (Terzaken Decl. ¶ 26(b)) also means that a continued review could well locate documents in the next few iterations that discuss an event that does constrict supply of MDI and therefore is consequential.
In their brief, the defendants argue generally that granting the relief requested by the plaintiffs would impose a disproportionate burden on them. (Def. Memo. at 27-29). They note that since they completed their production and disbanded their review teams, they would face re-staffing costs. Overall, they assert that the additional costs would include vendor processing, loading, hosting, and searching the relevant data; hiring, running conflicts checks for, and training a new slate of contract document reviewers (who would be less efficient at the start as they would be new to the case); review of hundreds of thousands of additional documents; renewed prevalence testing, subject matter expert second-level review and quality control testing; and additional validation efforts, including elusion testing and reporting of TAR metrics. (Def. Memo. at 27-28). Many of these costs relate primarily to the specter of having to run additional search terms and then incorporate additional documents into the TAR process. Those costs would not be incurred in connection with restarting the TAR review alone. Furthermore, it is not appropriate to consider costs such as retraining a review team, since these defendants could have avoided those costs had they not aborted the review process too soon.
*13 The only defendant to have estimated its costs specifically in relation to continuing the TAR review process is Huntsman. (Warren Decl., ¶¶ 27-30). It estimates that the cost of assembling and training a new team of contract attorneys to perform the review would be $15,000. (Warren Decl., ¶ 29). Even if this were an expense appropriately taken into account, it is de minimis in relation to other costs of discovery in this case. More significant is Huntsman's estimate that it would cost between $50,000 and $100,000 per week to continue the TAR review once the team is up and running. (Warren Decl., ¶ 30). While little detail is offered to support these figures, I will accept them for purposes of a proportionality analysis.
Such an analysis must consider all of the factors set forth in Rule 26(b)(1) of the Federal Rules of Civil Procedure that may be relevant to the dispute at hand: (1) the importance of the issues at stake in the action, (2) the amount in controversy, (3) the parties’ relative access to relevant information, (4) the parties’ resources, (5) the importance of the discovery in resolving the issues, and (6) whether the burden or expense of the proposed discovery outweighs its likely benefit. See In re Diisocyanates Antitrust Litigation, 2020 WL 7427040, at *5 (W.D. Pa. 2020) (applying all factors); Guadalupe v. City of New York, 2016 WL 3570540, at *2 (S.D.N.Y. June 24, 2016) (holding that “a party seeking to avoid discovery on grounds of burden at least has the obligation to address those proportionality factors pertinent to that case and provide specific evidence and argument about them to the extent possible”). Here, the issues at stake warrant broad discovery: the antitrust laws were enacted by Congress to protect the integrity of the markets. See Northern Pacific Railway Co. v. United States, 356 U.S. 1, 4 (1958). The amount in controversy is in the tens if not hundreds of millions of dollars. The defendants have exclusive possession of the overwhelming bulk of relevant information and so may be expected to bear the greater burden in discovery. All parties to this case have substantial resources to devote to discovery. The types of information the plaintiffs seek bear directly on issues central to the case. Finally, while the additional discovery sought from these defendants does not consist of information that is likely to differ in kind from what the plaintiffs already have, it is potentially important in allowing them to build their case, and the benefit therefore outweighs the burden.
On balance, then, the proportionality factors favor requiring WCA, Huntsman, and Covestro to continue their TAR review. To reduce the probability of further dispute and to provide the parties with a benchmark, I recommend that each review may presumptively be terminated when the last two batches reviewed contain no more than 10% responsive documents. Since the quality as well as the quantity of the documents retrieved remains important, this does not preclude either party from arguing that the process should halt earlier or continue longer based on the nature of the documents in those batches. It does, however, establish a hurdle that the party seeking to require a different stopping point must overcome. As was done previously, these defendants should provide the plaintiffs with the Bates numbers of the responsive documents found in the last two batches.
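The quantitative part of this benchmark can be expressed as a short check. The sketch is illustrative only: the function name and batch sizes are hypothetical, and it assumes that each of the last two batches, taken separately, must be at or below the 10% threshold. It does not capture the qualitative prong, under which either party may argue for an earlier or later stopping point.

```python
def may_presumptively_stop(batches, threshold_pct=10):
    """Apply the 10% benchmark to the last two reviewed batches.

    batches: list of (documents_reviewed, documents_responsive), in review order.
    Assumption: each of the last two batches must individually meet the threshold.
    """
    if len(batches) < 2:
        return False
    # Integer comparison avoids floating-point rounding at exactly 10%.
    return all(100 * responsive <= threshold_pct * reviewed
               for reviewed, responsive in batches[-2:])


# Hypothetical batches of 200 documents at a 15% responsive rate,
# the rate at which WCA stopped its review.
at_fifteen_percent = [(200, 30), (200, 30)]
print(may_presumptively_stop(at_fifteen_percent))

# The same review continued until two batches fall to 10% and 9%.
continued = at_fifteen_percent + [(200, 20), (200, 18)]
print(may_presumptively_stop(continued))
```

On these assumed figures the check fails at a 15% responsive rate and is satisfied only once two consecutive batches come in at or below 10%.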
D. BASF
The plaintiffs present a unique argument with respect to BASF. They acknowledge that “the quantity and quality of BASF's validation documents do not suggest that the review terminated too early.” (Pl. Memo. at 12). Nevertheless, they cite examples of specific documents from the last two batches reviewed that relate to important issues like the monitoring of competitor pricing and supply constraints created by a force majeure event. (Pl. Memo. at 13). Consequently, the plaintiffs ask that BASF be required to conduct targeted searches to identify documents in these categories. There is no basis for this relief. As noted above, the specific BASF documents identified by the plaintiffs are not in themselves remarkable. Furthermore, BASF has demonstrated that it has produced substantial numbers of documents in the very categories for which the plaintiffs now seek a “targeted search.” (LaMorte Decl., ¶¶ 48-49 & App. A, B, C, D). No unique remedy is warranted with respect to BASF.
Conclusion
*14 For the reasons set forth above, I recommend that the plaintiffs’ motion to compel be granted to the extent that WCA, Huntsman, and Covestro be required to resume their TAR review, presumptively continuing until the last two batches reviewed contain no more than 10% responsive documents, with the parties permitted to argue that the review should be stopped sooner than that or continued longer on the basis of the significance of the documents obtained from those batches. In all other respects, I recommend that the plaintiffs’ motion be denied.
Footnotes
[1] Citations to my reports and recommendations and to the Court's opinions use the pagination from the versions published in Westlaw.
[2] I note that independent of the dispute over search terms, the plaintiffs have moved for the production of the calendars. (ECF 788, 789). I offer no view on the merits of that motion as it has not been referred to me.
[3] Because this declaration and the associated arguments were not presented until the plaintiffs submitted their reply papers, it would be within my discretion to discount them altogether. See Meditz v. City of Newark, 658 F.3d 364, 367 n.1 (3d Cir. 2011) (upholding court's discretion not to strike certification provided for first time on reply). I will nevertheless address them on the merits.
[4] The term “OR disrupt*” appears twice in this search string, but this is inconsequential, since both instances are within a segment where all terms are separated by disjunctive connectors. In other words, the search string “A OR B” would return the same documents as “A OR B OR A.”
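The equivalence of “A OR B” and “A OR B OR A” can be confirmed with a minimal sketch. The three documents and the helper name are hypothetical, and prefix matching stands in for the wildcard terms.

```python
# Hypothetical three-document corpus.
docs = {
    1: "supply disruption at the plant",
    2: "market discussion",
    3: "lunch menu",
}


def hits(prefixes):
    """IDs of documents containing a word that starts with any of the prefixes."""
    return {doc_id for doc_id, text in docs.items()
            if any(word.startswith(p) for word in text.split() for p in prefixes)}


once = hits(["disrupt", "discuss"])              # A OR B
twice = hits(["disrupt", "discuss", "disrupt"])  # A OR B OR A
print(once == twice)
```

Because a document qualifies if it matches any one of the listed terms, repeating a term cannot add or remove a hit.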