Demarchi, Virginia K., United States Magistrate Judge
ORDER RE MARCH 1, 2022 DISCOVERY DISPUTE RE GOOGLE'S ESI REVIEW
Re: Dkt. No. 157
The parties ask the Court to resolve several disputes regarding the search terms, custodians, and non-custodial sources to be used in collection and review of Google’s electronically stored information (“ESI”). Dkt. No. 157. The Court held a hearing on the dispute on March 8, 2022. See Dkt. Nos. 164, 174 (public transcript), 175 (sealed transcript). At the Court’s direction, the parties submitted additional briefing regarding search terms and proposed custodians on March 15 and 18, 2022. See Dkt. Nos. 165, 173, 176.
Having considered the parties’ written submissions and the arguments presented at the hearing, the Court resolves the parties’ disputes as follows:
I. SEARCH TERMS
A. Proximity limiters
The parties dispute which proximity limiters should be used in search strings. Plaintiffs propose to use /50, or in some cases /35, while Google proposes to use /25, or in some cases /15. Selecting an appropriate proximity limiter is a context-specific exercise, dependent on the nature of the document to be searched and its subject matter. So far as the Court is aware, there is no generally accepted rule of thumb or default, although each side cites authority in support of its preferred proximity limit. See, e.g., Dkt. No. 173-1 (term 2). If one assumes that the average English sentence is approximately 15-20 words (and the average email sentence may be even fewer words), then a proximity limiter of /35 will generally capture words used within two sentences of each other. For most of the parties’ disputes, /35 will be an appropriate default; that is, the Court expects it will produce a collection of documents for review that is proportional to the needs of the case. However, for some search strings a narrower or broader proximity limiter may be appropriate.
Please see Exhibit A for the Court’s resolution of the parties’ disputes regarding proximity limiters. If, for a given term, a party believes that the selected proximity limiter produces a review set with too many false positives or fails to adequately identify responsive documents, the party should raise its specific concerns with the other party, and the parties must confer regarding whether a further adjustment to the search string is warranted.
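For illustration only, the effect of a proximity limiter can be sketched in Python. The sketch assumes that "/N" means two terms appear within N word positions of each other; actual ESI review platforms may tokenize text and count words differently. The function names and the sample text are hypothetical and are not drawn from the parties' submissions or Exhibit A.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within(text, term_a, term_b, n):
    """Return True if term_a and term_b occur within n word positions
    of each other anywhere in the text (assumed meaning of '/n')."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


# Hypothetical document: the two terms are separated by 30 filler words,
# i.e. roughly two sentences at 15 words per sentence.
sample = "consent " + "filler " * 30 + "rtb"

for limit in (15, 25, 35, 50):
    print(f"/{limit}: {within(sample, 'consent', 'rtb', limit)}")
```

Under these assumptions, the sample document is a hit at /35 and /50 but not at /15 or /25, which is the kind of difference in review-set size that the choice of limiter controls.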
B. Code names[1]
Zwieback/Zweiback. Google says that “Zwieback” refers to cookies used in Google.com pages, whereas RTB operates only on third-party websites. Plaintiffs provide no support for their assertion that “Zwieback” is relevant to any issue in dispute. The Court adopts Google’s proposal to omit this term.
[REDACTED] Plaintiffs argue that [REDACTED] and [REDACTED] refer to a project to unify data collection about Google account holders, including for purposes of RTB. Google says that [REDACTED] and [REDACTED] refer to distinct projects or consent flows, neither of which concerns RTB; however, Google has agreed to include in some search strings. The Court is persuaded that [REDACTED] in close proximity with RTB and like terms will not produce an unreasonable number of additional false positive hits, and so adopts plaintiffs’ proposal.
[REDACTED] Plaintiffs argue that [REDACTED] refers to a component of Google’s data flow system that bundles information received from account holders’ devices and their online activities to create a bid request. Google objects to including [REDACTED] in a search string because it is not specific to RTB. Assuming plaintiffs are correct about what [REDACTED] is, the Court is persuaded that [REDACTED] in close proximity with RTB and like terms will not produce an unreasonable number of false positive hits, and so adopts plaintiffs’ proposal.
The Court’s decisions on the search strings that include these code names are indicated in Exhibit A.
C. Other terms
The Court agrees with Google that the generic term “category” is unlikely to generate hits on relevant documents, even when combined with other terms, and particularly where other terms are more likely to be used when referring to relevant information, as indicated in Exhibit A.
Conversely, the Court agrees with plaintiffs that the term “consent” is likely to generate hits on relevant documents, but only when used with a proximity limiter in combination with other search terms, as indicated in Exhibit A.
Plaintiffs ask Google to apply a number of standalone terms. For many of these, Google objects that the terms are too generic or ubiquitous to serve any useful function as search terms for identifying potentially responsive documents. The Court agrees with Google that these standalone terms appear likely to generate a disproportionately large number of false positive hits. Google also objects that other search strings are unnecessarily duplicative of search strings that the parties have agreed to apply. The Court agrees with Google that many search strings are duplicative of agreed search strings or search strings that the Court has decided should be applied, and it would be unnecessarily burdensome to require Google to apply these duplicative search strings. The Court has indicated its decision regarding these proposed search strings in Exhibit A.
II. CUSTODIANS
The parties have agreed that Google will search the files of 21 custodians. See Dkt. No. 176-1 at 1-2. The parties dispute whether Google should also be required to search the files of an additional 21 custodians. See id. at 3-16. The Court has considered the parties’ supplemental submission about the disputed custodians as well as the discussion at the hearing, and concludes that Google should search the files of the following disputed custodians:
- Greg Fair
- Osvaldo Doederlein
The Court finds that the remaining disputed custodians are unlikely to have unique, nonduplicative materials or that their involvement in the subject matter at issue in this case is limited, such that requiring Google to search their ESI files by applying search terms would not be proportional to the needs of the case. However, it may be necessary and appropriate for Google to conduct targeted searches in a particular custodian’s files if he or she is known to have responsive documents. See, e.g., Dkt. No. 184 at 2, 6 (discussion of RFPs 1, 10-11). Nothing in this order relieves Google of that obligation.
III. NON-CUSTODIAL SOURCES
Plaintiffs complain that Google has not adequately identified its sources of potentially responsive non-custodial ESI. Dkt. No. 157 at 5. Google responds that it has provided a disclosure of 23 categories of documents it expects to produce from non-custodial sources. Id. at 8. The Court has reviewed Exhibit C to the parties’ joint submission. While it appears that Google has not disclosed the specific names of the internal repositories, drives, web pages, or systems it intends to search, it has disclosed what kinds of information those sources contain. Plaintiffs do not explain why this disclosure is insufficient or why they require the internal names of these non-custodial sources in order to evaluate whether Google’s identification of such sources is complete. For this reason, the Court will not require Google to make a further disclosure regarding these non-custodial sources at this time.
IT IS SO ORDERED.
Dated: April 29, 2022