This case illustrates some of the practical difficulties in implementing three of the principles intended to reduce the burden of discovery generally, and electronic discovery in particular: phasing, sampling, and proportionality. In this putative class action, the plaintiffs allege that their employers, Goldman, Sachs & Co. and The Goldman Sachs Group, Inc. (collectively, “Goldman Sachs”), engaged in a pattern of gender discrimination against female professional employees in violation of Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e et seq., and the New York City Human Rights Law, N.Y.C. Admin. Code § 8–107 et seq. Specifically, the plaintiffs contend that they have been discriminated against in evaluation, compensation, and promotion. The plaintiffs seek to represent “a Class of all female financial-services employees who are at the Associate, Vice President, and Managing Director corporate level” at Goldman Sachs. (First Amended Class Action Complaint (“Am.Compl.”), 58). They now move pursuant to Rule 26(b) of the Federal Rules of Civil Procedure for an order compelling Goldman Sachs to produce (1) computerized compensation, promotion, and performance evaluation data from 2002 to the present which are contained in databases, and (2) non-database materials including policy and complaint documents from July 2000 to the present.
The plaintiffs are three women who worked for Goldman Sachs between 1997 and 2008. (Am.Compl., 12–18). Goldman Sachs hired H. Cristina Chen–Oster in March 1997 as a salesperson in the Convertible Bonds Department, a unit of the Securities Division, and she was promoted to Vice President the following June. (Am.Compl., 67; Letter of Theodore O. Rogers, Jr. dated Sept. 1, 2005 (“Def. 9/1/05 EEOC Letter”), attached as Exh. B to Declaration of Theodore O. Rogers, Jr. dated July 25, 2011 (“Rogers 7/25/11 Decl.”), at 2). She transferred to the Synthetics Convertibles group in 2002 and ultimately resigned from the firm in 2005, having remained in the position of Vice President. (Am.Compl., 67, 89, 99; Def. 9/1/05 EEOC Letter at 3–4). Shanna Orlich began work as a Summer Associate at Goldman Sachs in 2006 and became a full-time Associate in the Capital Structure Franchise Trading Group, another unit of the Securities Division, in July 2007. (Am.Compl., 112). She was terminated from that position in November 2008. (Am.Compl., 131). Goldman Sachs hired Lisa Parisi as a Vice President in the Asset Management Division in August 2001. (Am.Compl., 101). She was promoted to the position of Managing Director in 2003 and continued in that capacity until Goldman Sachs terminated her employment in November 2008. (Am.Compl., 101, 110).
Goldman Sachs maintains information relevant to the plaintiffs' requests for database information in four different systems.
PeopleSoft is Goldman Sachs' primary Human Resources database. (Affidavit of Cathy Obradovich (“Obradovich Aff.”), excerpts attached as part of Exh. 3 to Declaration of Barbara Brown dated May 30, 2012 (“Brown Decl.”), 2). In effect, it is two separate databases. The first contains data collected in and before September 2004. (Obradovich Aff., 2). Beginning in that month, Hewitt Associates LLC (“Aon Hewitt”) assumed responsibility for hosting Goldman Sachs' information. (Declaration of Vishali Chandramouli dated May 30, 2012 (“Chandramouli Decl.”), attached as Exh. 4 to Brown Decl., 2). The new database retained more organizational information about employees than the older version had, and complete data were not migrated from the original database to the new one. (Obradovich Aff., 8). As a result, although data on the new PeopleSoft program for a current employee may include information about employment at Goldman Sachs prior to 2004, that information may be incomplete. (Deposition of Cathy Obradovich dated April 18, 2012 (“Obradovich Dep.”), excerpts attached as Exh. G to Declaration of Anne B. Shaver dated May 2, 2012 (“Shaver Decl.”), and as Exh. 2 to Brown Decl., at 42–43).
In addition, different protocols are used to extract information from the two systems. Reports can be obtained from the current PeopleSoft database using a tool known as Query Studio. (Obradovich Dep. at 48). This tool allows the user to choose relevant fields of information and use a “drag and drop” feature to incorporate the information into a report. (Obradovich Dep. at 53–55). By contrast, no such tool is associated with the pre-September 2004 database. As a consequence, any report derived from that database would have to be created by developing inquiries from scratch using SQL, a programming language. (Obradovich Dep. at 72–74).
The second relevant database is the Compensation Recommendation System (“CRS”), which tracks the results of the annual year-end compensation review process. (Obradovich Aff., 17). This database is used to generate Total Cost Reconciliation (“TCR”) files, which include various elements of employee compensation that add up to the employee's annual total compensation. (Obradovich Dep. at 116–17). According to Goldman Sachs, TCR reports were not generated prior to December 2005. (Def. Memo. at 10). Data can be extracted from the CRS system independent of the TCR reports, but, according to Goldman Sachs, any such project would require significant quality control efforts because the data is not maintained in a “user-friendly” format. (Obradovich Aff., 17–18).
The Firmwide Review System (“FRS”) database contains ratings, comments, and results from Goldman Sachs' employee performance evaluation program, known as the 360 review process. (Affidavit of Ankur Pathak dated March 30, 2012 (“Pathak Aff.”), attached as part of Exh. 3 to Brown Decl., 2; Deposition of Ankur Pathak dated April 19, 2012 (“Pathak Dep.”), portions attached as Exh. H to Shaver Decl., at 24). The FRS database contains no data for the period prior to January 1, 2003. (Pathak Aff., 2; Pathak Dep. at 30–31). The Talent Assessment Group (“TAG”) at Goldman Sachs does have electronic data concerning performance reviews in 2002, but that information is limited to the Equities section of the Securities Division and may not be complete. (Pathak Aff., 1, 3). In addition, TAG does not possess the criteria or rating scales used to generate the 2002 evaluations. (Pathak Aff., 3).
Finally, Goldman Sachs utilizes something known as the MD Selection Database to track information pertinent to the consideration and selection of Vice Presidents for promotion to the position of Extended Managing Director (“EMD”). (Pathak Aff., 9). This database contains information from January 1, 2002, forward. (Pathak Aff., 9).
I will address additional facts as they are pertinent to the legal analysis.
From time to time, discovery in this case has been stayed by agreement of the parties or by court order pending decision on motions that would likely determine the scope of relevant information. For example, the parties agreed to hold in abeyance discovery on the claims of plaintiff Parisi while they litigated Goldman Sachs' motion to compel arbitration of her claims. (Letter of Adam T. Klein dated June 21, 2011 (“Klein 6/21/11 Letter”), attached as Exh. B to Shaver Decl., at 1 n. 1). That motion has since been denied. (Chen–Oster v. Goldman, Sachs & Co., 785 F.Supp.2d 394 (S.D.N.Y.2011) (denying motion); Chen–Oster v. Goldman, Sachs & Co., No. 10 Civ. 6950, 2011 WL 2671813 (S.D.N.Y. July 7, 2011) (denying reconsideration); Memorandum Endorsement dated Nov. 14, 2011 (denying appeal of denial of motion)). Subsequently, Goldman Sachs moved to strike the class claims of plaintiff Chen–Oster on the ground that she had not exhausted them in the preceding administrative proceedings. In response to that motion, I issued a partial stay of class discovery, holding that “[u]ntil the ability of plaintiffs Parisi and Chen–Oster to assert class claims is finally determined, class discovery shall be limited to the period after January 1, 2007, to the Securities Division, and to the positions of Associate and Vice President.” (Memorandum Endorsement dated Oct. 4, 2011). Thereafter, the motion to strike Ms. Chen–Oster's class claims was denied. (Chen–Oster v. Goldman, Sachs & Co., No. 10 Civ. 6950, 2011 WL 6372786 (S.D.N.Y. Sept. 29, 2011) (recommending denial of motion); Chen–Oster v. Goldman, Sachs & Co., No. 10 Civ. 6950, 2012 WL 76915 (S.D.N.Y. Jan. 10, 2012) (affirming report and recommendation)). Finally, Goldman Sachs moved to strike all class allegations on the ground that they are barred by the Supreme Court's determination in Wal–Mart Stores, Inc. v. Dukes, ––– U.S. ––––, 131 S.Ct. 2541, 180 L.Ed.2d 374 (2011). The Court recently denied that motion in substantial part. (Chen–Oster v. Goldman, Sachs & Co., No. 10 Civ. 6950, 2012 WL 205875 (S.D.N.Y. Jan. 19, 2012) (recommending denial of motion); Chen–Oster v. Goldman, Sachs & Co., 877 F.Supp.2d 113, 2012 WL 2912741 (S.D.N.Y.2012) (affirming report and recommendation except dismissing claims for injunctive relief)). No pending motions now impede proceeding with discovery.
The plaintiffs seek Goldman Sachs' compensation, promotion, and performance evaluation database information for the period from 2002 to the resolution of this lawsuit for its revenue-generating units, which are the Securities, Investment Banking, Investment Management, and Merchant Banking Divisions. They also request other documents responsive to their Requests for Production for the period from July 7, 2000 until this case is resolved. (Memorandum of Law in Support of Plaintiffs' Motion to Compel Discovery of Data and Documents (“Pl. Memo.”) at 24). They argue that this information is relevant both to class certification and to the merits, that it is necessary for any statistical analysis of Goldman Sachs' employment practices, that it is reasonably accessible, and that its disclosure involves no undue burden. (Pl. Memo. at 6–20). With respect to the information sought which is not contained in databases, the plaintiffs seek data going back to the year 2000, two years prior to the beginning of the class period, in order to analyze the development of the challenged policies. (Pl. Memo. at 20–21). They contend that the defendants must search for this information systematically, and not merely produce that which they find in the course of seeking more recent materials. (Pl. Memo. at 21–23).
For its part, Goldman Sachs presents a two-tier argument. First, it maintains that, in light of Dukes, discovery of the statistical information in the databases should be deferred altogether until after deposition and other discovery is taken with respect to its policies concerning employee evaluation, compensation, and promotion. (Def. Memo. at 5–8). According to Goldman Sachs, if the non-statistical discovery were to reveal information insufficient to support certification, there would be no need to take discovery of the databases. (Def. Memo. at 8).
Second, Goldman Sachs argues that even if discovery of the databases is not deferred altogether, it should be limited. The defendants maintain that certain of the data sources are not reasonably accessible, and that searching them would involve undue burden and expense. (Def. Memo. at 11–13). With these considerations in mind, Goldman Sachs proposes to produce from the PeopleSoft database information from January 1, 2007 to December 2011 for Associates and Vice Presidents in the Securities and Investment Management Divisions, as well as information back to 2002 for the Convertible Sales and Equities Research groups within the Securities Division. (Def. Memo. at 10). With respect to the Compensation Recommendation System, Goldman Sachs would produce Total Cost Reconciliation reports for Associates and Vice Presidents in the Securities and Investment Management Divisions from December 2005 through December 2011. (Def. Memo. at 10). It would likewise produce information from the FRS database for Associates and Vice Presidents in the Securities and Investment Management Divisions from 2003 through 2011. (Def. Memo. at 11). And, it would disclose data from the MD Selection database for the Securities and Investment Management Divisions for some unspecified period. (Def. Memo. at 11).
Finally, Goldman Sachs does not object to producing policy-related documents dating back to 2000 as long as they are obtained from custodians whose files are otherwise being reviewed. It does object to conducting a systematic search for documents pre-dating 2005. (Def. Memo. at 21–22).
There is no doubt that Dukes has raised the bar that plaintiffs must clear in order to qualify for class certification. However, it does not, as Goldman Sachs suggests, militate in favor of bifurcating discovery prior to certification. On the contrary, if anything, Dukes illustrates the need to develop the record fully before a class motion is considered.
Dukes involved “one of the most expansive class actions ever.” Dukes, ––– U.S. at ––––, 131 S.Ct. at 2547. The plaintiffs, challenging gender discrimination in pay and promotion in violation of Title VII, sought certification of a class consisting of “all women employed at any Wal–Mart domestic retail store at any time since December 26, 1998, who have been or may be subject to Wal–Mart's challenged pay and management track promotions policies and practices.” Id. at ––––, 131 S.Ct. at 2549 (internal quotation marks, citation, and alteration omitted). The proposed class comprised about one and a half million current and former female employees. Id. at ––––, 131 S.Ct. at 2547.
The record showed that “[p]ay and promotion decisions at Wal–Mart are generally committed to local managers' broad discretion, which is exercised in a largely subjective manner.” Id. at ––––, 131 S.Ct. at 2547 (internal quotation marks and citation omitted). Nevertheless, the plaintiffs' “basic theory” was that “a strong and uniform ‘corporate culture’ permits bias against women to infect, perhaps subconsciously, the discretionary decisionmaking of each one of Wal–Mart's thousands of managers—thereby making every woman at the company the victim of one discriminatory practice.” Id. at ––––, 131 S.Ct. at 2548. In order to show that there were “questions of law or fact common to the class” as required for certification by Rule 23(a)(2) of the Federal Rules of Civil Procedure, the plaintiffs presented statistical evidence showing pay disparities between male and female employees, anecdotal reports of discrimination, and expert testimony from a sociologist to the effect that Wal–Mart's “culture” and personnel practices rendered it vulnerable to gender discrimination. Id. at ––––, 131 S.Ct. at 2549.
The Supreme Court overturned the order granting class certification, finding that the requirement of commonality had not been met. It held that “[c]ommonality requires the plaintiff to demonstrate that the class members ‘have suffered the same injury.’ ” Id. at ––––, 131 S.Ct. at 2551 (quoting General Telephone Co. of Southwest v. Falcon, 457 U.S. 147, 157, 102 S.Ct. 2364, 72 L.Ed.2d 740 (1982)). In particular, the plaintiffs' claims “must depend on a common contention ... of such a nature that it is capable of classwide resolution—which means that determination of its truth or falsity will resolve an issue that is central to the validity of each one of the claims in one stroke.” Id. at ––––, 131 S.Ct. at 2551. The Supreme Court directed lower courts to engage in a “rigorous analysis” to determine whether the prerequisites for certification have been satisfied, an analysis that frequently “will entail some overlap with the merits of the plaintiff's underlying claim. That cannot be helped.” Id. at ––––, 131 S.Ct. at 2551. It noted that such an overlap is inevitable in an employment case based on a theory of a pattern or practice of discrimination. Id. at ––––, 131 S.Ct. at 2552.
The Supreme Court found the plaintiffs' evidence of a company-wide practice of discrimination at Wal–Mart to be inadequate. It acknowledged that “ ‘an employer's undisciplined system of subjective decisionmaking’ ” can be the predicate for a disparate impact claim under Title VII. Id. at ––––, 131 S.Ct. at 2554 (quoting Watson v. Fort Worth Bank & Trust, 487 U.S. 977, 990, 108 S.Ct. 2777, 101 L.Ed.2d 827 (1988)). However, the Court found that the plaintiffs lacked proof to connect any Wal–Mart policy to specific instances of biased decisionmaking. In particular, the “only evidence of a ‘general policy of discrimination’ ” was the testimony of the plaintiffs' sociological expert; yet he could not estimate, in even the most general way, what percentage of the employment decisions at Wal–Mart were determined by the “stereotyped thinking” that he identified as the source of discrimination. Id. at ––––, 131 S.Ct. at 2553.
Goldman Sachs relies on a number of cases that, consistent with Dukes, deny class certification where the plaintiffs were unable to demonstrate that a common policy or practice affected the class as a whole. See In re Countrywide Financial Mortgage Lending Practices Litigation, No. 08–MD–1974, 2011 WL 4862174, at *2–4 (W.D.Ky. Oct. 13, 2011) (rejecting statistical evidence of racial disparities in mortgage financing where plaintiffs challenged only discretionary pricing and not underlying company-wide policy); In re Wells Fargo Residential Mortgage Lending Discrimination Litigation, No. 08–MD–1930, 2011 WL 3903117, at *3 (N.D.Cal. Sept. 6, 2011) (same); cf. Rodriguez v. National City Bank, 277 F.R.D. 148, 154–55 (E.D.Pa.2011) (same; rejecting class settlement). In each of those cases, however, the court made its determination on the basis of a complete record.
In two other cases cited by Goldman Sachs, the court denied class-related discovery, but in both instances the circumstances were unique. In Bell v. Lockheed Martin Corp., Civ. No. 08–6292, 2011 WL 6256978 (D.N.J. Dec. 14, 2011), the court rejected additional discovery relating to the plaintiffs' theories of employment discrimination, which it found were “substantially the same as the proofs rejected in Dukes.” Id. at *8. Yet, the plaintiffs had already conceded that discovery was “substantially complete.” Id. Similarly, in Windisch v. Hometown Health Plan, Inc., No. 3:08–cv–664, 2012 WL 115670 (D.Nev. Jan. 13, 2012), the court rejected the plaintiffs' demand for statistical data to support class claims of health care providers suing an insurer where the plaintiffs had disclaimed the need for that very information, id. at *2, and where they had failed to seek to reopen discovery to obtain it, id. at *10.
Other cases subsequent to Dukes emphasize the importance of adjudicating a class motion only after class-related discovery is complete, discovery that often overlaps substantially with the merits. In In re Federal Home Loan Mortgage Corp. (Freddie Mac) Securities Litigation, 281 F.R.D. 174, 176 (S.D.N.Y.2012), the court identified the ultimate goal: “[a] trial court must receive enough evidence to be satisfied that each Rule 23 requirement has been met.” Thus, because of the “rigorous analysis” required by Dukes, courts are reluctant to bifurcate class-related discovery from discovery on the merits. See, e.g., In re Community Bank of Northern Virginia Mortgage Lending Practices Litigation, Civ. A. Nos. 02–1201, 03–425, 05–688, 05–1386, MDL No. 1674, 2011 WL 4382942, at *3 (W.D.Pa. Sep. 20, 2011) (denying bifurcation in light of Dukes). In Burton v. District of Columbia, 277 F.R.D. 224 (D.D.C.2011), while denying an initial motion for certification of a class of employees challenging allegedly discriminatory discipline and promotion practices because the plaintiffs had failed to meet the commonality requirement, id. at 228–30, the court nevertheless permitted additional discovery to enable them to satisfy their burden, id. at 230–31. The court reasoned that “[t]he Supreme Court's ruling in [Dukes] confirms that pre-certification discovery should ordinarily be available where a plaintiff has alleged a potentially viable class claim because [Dukes] emphasizes that the district court's class certification determination must rest on a ‘rigorous analysis’ to ensure ‘[a]ctual, not presumed, conformance’ with Rule 23.” Id. (quoting Dukes, ––– U.S. at ––––, 131 S.Ct. at 2551). Similarly, in Johnson v. Flakeboard America Ltd., C/A No. 4:11–2607, 2012 WL 2237004 (D.S.C. March 26, 2012), the court denied the defendant's motion to dismiss class claims in an employment discrimination case on the basis of Dukes, finding that the plaintiffs were entitled to take discovery to support commonality where the gravamen of the complaint was that the defendant's system of promotion was infected by “excessive subjectivity.” Id. at *6. Cf. Feske v. MHC Thousand Trails Ltd. Partnership, No. 11–CV–4124, 2012 WL 1123587, at *2 (N.D.Cal. April 3, 2012) (holding that disclosure of members of putative class “even more appropriate in the wake of Dukes”).
To be sure, in this case Goldman Sachs does not argue that the plaintiffs should be foreclosed from pre-certification discovery altogether. Rather, it contends that they should be allowed to take depositions and document discovery directed toward general policies but denied, for the time being, employee data that would provide the basis for a statistical analysis. According to Goldman Sachs, if, after the initial phase, the plaintiffs are unable to show a company-wide policy with a discriminatory impact, then, under Dukes, a class motion would be futile, and there would be no need to disclose detailed data.
But the plaintiffs have already identified specific employment practices that they allege contribute to discrimination in evaluation, compensation, and promotion, in particular “the ′360–degree review' process, the forced quartile ranking of employees, and the ‘tap on the shoulder’ system for selecting employees for promotion.” Chen–Oster, 877 F.Supp.2d at 118, 2012 WL 2912741, at *2 (internal quotation marks omitted). It is highly unlikely that class certification could be finally resolved on the basis of depositions and document discovery alone. Employment policies do not operate in a vacuum. A seemingly neutral policy may have a discriminatory impact. See Watson, 487 U.S. at 986–87, 108 S.Ct. 2777. That is the case even when the employment process includes some element of subjectivity. Id. at 990–91, 108 S.Ct. 2777. In order to determine whether there is such an impact and to ascertain whether it is attributable to a common policy or practice, it is necessary to acquire data about the class and analyze it. That analysis may show no discriminatory impact; it may show bias but fail to attribute it to any particular policies; it may show that there is evidence of discrimination in some segments of the defendant organization but not others. But without such an analysis, the link that Dukes requires between an employment policy and discriminatory impact in order to support a finding of commonality can neither be proved nor disproved.
Dukes does not suggest that statistics are irrelevant to the issue of commonality. Rather, it holds that statistics about multiple employment decisions, standing alone, will not satisfy the commonality requirement. Instead, there must be some “glue,” some common practice that binds the decisions together. Dukes, ––– U.S. at ––––, 131 S.Ct. at 2552. The plaintiffs here have identified such practices. They are therefore entitled to obtain individualized personnel data without further delay.
Goldman Sachs contends that even if discovery of the databases is not deferred, it should be limited because the information requested is not reasonably accessible as defined in Rule 26(b)(2)(B). (Def. Memo. at 11–12). That rule provides:
A party need not provide discovery of electronically stored information from sources that the party identifies as not reasonably accessible because of undue burden or cost. On motion to compel discovery or for a protective order, the party from whom discovery is sought must show that the information is not reasonably accessible because of undue burden or cost. If that showing is made, the court may nonetheless order discovery from such sources if the requesting party shows good cause, considering the limitations of Rule 26(b)(2)(C). The court may specify conditions for the discovery.
Fed.R.Civ.P. 26(b)(2)(B); see Capitol Records, Inc. v. MP3tunes, LLC, 261 F.R.D. 44, 51 (S.D.N.Y.2009) (describing shifting burden under Rule 26(b)(2)(B)).
Rule 26(b)(2)(B) takes a categorical approach: it invites the classification of electronically stored information (“ESI”) as either “accessible” or “not reasonably accessible.” While cost and burden are critical elements in determining accessibility, a showing of undue burden is not sufficient by itself to trigger a finding of inaccessibility. For example, the sheer volume of data may make its production expensive, but that alone does not bring it within the scope of Rule 26(b)(2)(B). Rather, the cost or burden must be associated with some technological feature that inhibits accessibility. In Zubulake v. UBS Warburg LLC, 217 F.R.D. 309, 318–19 (S.D.N.Y.2003), a case that predated the Rule, the Honorable Shira A. Scheindlin, U.S.D.J., held that “[w]hether electronic data is accessible or inaccessible turns largely on the media on which it is stored.” Id. at 318. She then identified five categories of sources of ESI in descending order of accessibility: (1) active, on-line data; (2) near-line data; (3) data that is archived or stored off-line; (4) backup media designed for disaster recovery rather than routine use; and (5) erased, fragmented, or damaged data. Id. at 318–19. Of these, Judge Scheindlin considered the first three categories to be generally accessible and the last two to be presumptively inaccessible. Id. at 319–20.
Although the Rules Advisory Committee did not incorporate such specific classifications in Rule 26(b)(2)(B), it retained the concept that cost and burden are related to the source of the ESI. Thus, the committee notes state that “some sources of electronically stored information can be accessed only with substantial burden and cost. In a particular case, these burdens and costs may make the information on such sources not reasonably accessible.” Fed.R.Civ.P. 26 advisory committee's note (2006 Amendment) (emphases added). The committee further observed that “[i]t is not possible to define in a rule the different types of technological features that may affect the burdens of costs of accessing electronically stored information.” Id. Thus, decisions subsequent to the enactment of Rule 26(b)(2)(B) address accessibility by analyzing the interplay between any alleged technological impediment and the resulting cost and burden. For instance, in W.E. Aubuchon Co. v. BeneFirst, LLC, 245 F.R.D. 38 (D.Mass.2007), the court found data to be inaccessible because, although it was stored on a server, the method of storage and lack of indexing rendered it extremely costly to search. Id. at 42–43; see also General Electric Co. v. Wilkins, No. 1:10–cv–674, 2012 WL 570048, at *5 (E.D.Cal. Feb. 21, 2012) (holding that accessibility generally turns on format in which ESI is stored); General Steel Domestic Sales, LLC v. Chumley, No. 10–cv–1398, 2011 WL 2415715, at *2 (D.Colo. June 15, 2011) (finding ESI inaccessible because of inability to search it except manually); Johnson v. Neiman, No. 4:09CV00689, 2010 WL 4065368, at *1 (E.D.Mo. Oct. 18, 2010) (holding that accessibility depends largely on nature of media); Helmert v. Butterball, LLC, No. 4:08CV00342, 2010 WL 2179180, at *1, *8 (E.D.Ark. May 27, 2010) (same); Capitol Records, 261 F.R.D. at 51 (same); Semsroth v. City of Wichita, 239 F.R.D. 630, 637 (D.Kan.2006) (same).
The Sedona Conference, too, recognizes the technological dimension of accessibility:
Reasonable accessible sources generally include, but are not limited to, files available on or from a computer user's desktop, or on a company's network, in the ordinary course of operation.
The converse is information that is “not reasonably accessible” because of undue burden or cost. Examples of such sources may include, according to the Advisory Committee, backup tapes that are intended for disaster recovery purposes and are not indexed, organized, or susceptible to electronic searching; legacy data that remains from obsolete systems and is unintelligible on the successor systems; and data that was “deleted” but remains in fragmented form, requiring a modern version of forensics to restore and review.
The Sedona Conference, The Sedona Principles, Second Edition: Best Practices Recommendations & Principles for Addressing Electronic Document Production, Comment 2.c. at 42 (2007 Annotated Version) (hereafter, “Sedona Principles”).
Here, Goldman Sachs has not demonstrated that the cost and burden associated with extracting ESI from any of the relevant databases is a function of the means of storage. The current version of PeopleSoft, for example, is accessed in the regular course of business by Goldman Sachs employees using the Query Studio tool. (Obradovich Aff., 8; Obradovich Dep. at 22). Where a more substantial project is undertaken, Goldman Sachs relies on Aon Hewitt to create programming code to select the desired fields of information. (Chandramouli Decl., 8; Deposition of Vishala Chandramouli dated June 21, 2012 (“Chandramouli Dep.”), attached as Exh. A to Declaration of Adam T. Klein dated June 27, 2012, at 47–49). The older PeopleSoft database is also searchable, but only with some newly created program, since it is not linked to Query Studio. (Obradovich Aff., 9; Obradovich Dep. at 72). Once employee identifiers are obtained through a search of PeopleSoft, they can then be used to extract corresponding data from the CRS system. (Obradovich Aff., 17). The FRS system and MD Selection database are likewise technically accessible. (Def. Memo. at 11). Accordingly, Rule 26(b)(2)(B) presents no barrier to discovery of the databases.
A party may nevertheless resist discovery of non-computerized documents or of ESI that is reasonably accessible on the ground that the discovery sought is disproportionate. The concept of proportionality is embodied in Rule 26(b)(2)(C). Tucker, 281 F.R.D. at 91. That rule provides:
On motion or on its own, the court must limit the frequency or extent of discovery otherwise allowed by these rules or by local rule if it determines that:
(i) the discovery sought is unreasonably cumulative or duplicative, or can be obtained from some other source that is more convenient, less burdensome, or less expensive;
(ii) the party seeking discovery has had ample opportunity to obtain the information by discovery in the action; or
(iii) the burden or expense of the proposed discovery outweighs its likely benefit, considering the needs of the case, the amount in controversy, the parties' resources, the importance of the issues at stake in the action, and the importance of discovery in resolving the issues.
Fed.R.Civ.P. 26(b)(2)(C). “The ‘metrics' set forth in Rule 26(b)(2)(C)(iii) provide courts significant flexibility and discretion to assess the circumstances of the case and limit discovery accordingly to ensure that the scope and duration of discovery is reasonably proportional to the value of the requested information, the needs of the case, and the parties' resources.” The Sedona Conference, The Sedona Conference Commentary on Proportionality in Electronic Discovery, 11 Sedona Conf. J. 289, 294 (2010); accord Tamburo v. Dworkin, No. 04 C 3317, 2010 WL 4867346, at *3 (N.D.Ill. Nov. 17, 2010).
Goldman Sachs personnel estimate that it would require between 90 and 150 hours to extract the information that the plaintiffs seek from the current PeopleSoft database using the Query Studio tool. (Obradovich Aff., 5). Performing a quality check of the resulting data would then take another 40 to 80 hours. (Obradovich Aff., 6).
Because Query Studio is unavailable for extracting data from the older PeopleSoft database, Goldman Sachs projects that the burden of searching there would be more substantial. An analyst would have to write computer code to create the necessary queries before pulling the information and then checking for accuracy. (Obradovich Aff., 9; Obradovich Dep. at 72–73). Goldman Sachs maintains that programming and extraction would take 160 to 240 hours and quality control another 40 to 80 hours, for a total of 200 to 320 hours. (Obradovich Aff., 9–11).
The CRS database presents a different challenge. Goldman Sachs believes that it would require 40 hours of work to extract the requested information from this database for employees who worked as Associates or Vice Presidents in the revenue-generating divisions between January 1, 2002 and December 31, 2011. (Obradovich Aff., 17). However, Goldman Sachs suggests that quality control would be more time-consuming “because the CRS data is not maintained in a user-friendly format” and would have to be manually reviewed, requiring at least an additional 240 hours of work. (Obradovich Aff., 18).
For the FRS database, Goldman Sachs estimates that it would take five to ten work days (presumably 40 to 80 hours) to write queries for the requested information, confirm the accuracy of the resulting computer code, extract the data, and perform a sample check to verify formatting and the like. (Pathak Aff., 4). An additional five days, or 40 hours, would be devoted to quality control, comparing the extracted data to available hard copy documents. (Pathak Aff., 5). These estimates relate only to the period from 2003 forward. (Pathak Aff., 2). There is data from 2002, but it relates solely to the Equities section of the Securities Division and may not be complete in any event. (Pathak Aff., 3). Extracting that data would require an additional four days or 32 *304 hours, but no quality control could be done because Goldman Sachs does not have the hard copy documents necessary to use as reference. (Pathak Aff., 8).
Finally, obtaining the requested information from the MD Selection Database would, according to Goldman Sachs, take 32 to 40 hours. (Pathak Aff., 9). Another 16 to 24 hours would be needed for quality control. (Pathak Aff., 9).
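For illustration only, the hour estimates recited above can be tallied as follows. This is a minimal sketch: the figures are those attributed to Goldman Sachs' affiants, the names `ESTIMATES`, `tally`, and `grand_total` are hypothetical, and the combined total is a derived sum that appears nowhere in the record. The CRS quality-control figure is stated as "at least" 240 hours, so the upper bound for that database is really a floor.

```python
# Hour ranges (low, high) as recited in the affidavits cited above.
# Work days are converted at 8 hours per day, as the text presumes.
ESTIMATES = {
    "Current PeopleSoft": {"extraction": (90, 150), "quality_control": (40, 80)},
    "Older PeopleSoft": {"extraction": (160, 240), "quality_control": (40, 80)},
    # Quality control is "at least" 240 hours, so the high figure is a floor.
    "CRS": {"extraction": (40, 40), "quality_control": (240, 240)},
    "FRS (2003 forward)": {"extraction": (40, 80), "quality_control": (40, 40)},
    # No quality control is possible for the 2002 FRS data.
    "FRS (2002 only)": {"extraction": (32, 32), "quality_control": (0, 0)},
    "MD Selection": {"extraction": (32, 40), "quality_control": (16, 24)},
}


def tally(estimates):
    """Return {database: (low_total, high_total)} across all tasks."""
    return {
        name: (
            sum(low for low, _ in tasks.values()),
            sum(high for _, high in tasks.values()),
        )
        for name, tasks in estimates.items()
    }


def grand_total(estimates):
    """Return the (low, high) sum of hours across every database."""
    totals = tally(estimates).values()
    return sum(low for low, _ in totals), sum(high for _, high in totals)
```

On these inputs the older PeopleSoft database accounts for 200 to 320 hours, matching the total stated above, and the CRS quality-control estimate is the largest single fixed component.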
There are two potential strategies that could reduce the burden that Goldman Sachs complains of. First, a sample of the requested information might be obtained from each database and then utilized to draw inferences about the population as a whole. Neither party has endorsed this approach. The plaintiffs seem to fear that the use of sampling might result in an analysis that will not be accepted by the Court as accurately reflecting the extent of any discriminatory impact from Goldman Sachs' policies. They argue that “[t]he data is relevant in the aggregate to perform the applicable analyses to show patterns of statistically significant shortfalls or effects of challenged policies.” (Pl. Memo. at 7). In support of this contention, they cite two cases containing language that could be taken to discourage sampling. In Gutierrez v. Johnson & Johnson, Inc., No. 01 Civ. 5302, 2002 WL 34717245, at *5 (D.N.J. Aug. 13, 2002), the court rejected the defendant's effort to “artificially limit the data available to plaintiffs” because broader discovery would “present a more complete and reliable picture of the effects of [the defendant's] practices.” Similarly, in Smith v. Xerox Corp., 196 F.3d 358, 368–69 (2d Cir.1999), overruled on other grounds by Meacham v. Knolls Atomic Power Laboratory, 461 F.3d 134 (2d Cir.2006), rev'd, 554 U.S. 84, 128 S.Ct. 2395, 171 L.Ed.2d 283 (2008), the Second Circuit expressed concern that “[i]n any large population a subset can be chosen that will make it appear as though the complained of practice produced a disparate impact. Yet when the entire group is analyzed any observed differential may disappear.” For its part, Goldman Sachs contends that sampling requires a homogeneous population in order to form the basis for a valid analysis, and the population here is heterogeneous because it includes employees from different divisions with many different job titles and responsibilities. (Def. Memo. at 19–20).
I would not be so quick to abandon sampling, particularly since neither party has offered any expert evidence to the effect that sampling would invalidate any analysis in this case. The parties' objections seem to assume random sampling across the entire universe rather than use of stratified sampling. The latter technique would take into account the heterogeneity of the population by dividing it into subgroups that are each homogeneous with respect to the relevant variables, after which a random sample would be drawn from each subgroup. See Michigan Department of Education v. United States Department of Education, 875 F.2d 1196, 1205 (6th Cir.1989) (endorsing stratified sampling as reliable); Spears v. First American eAppraiseIT, No. C–08–868, 2012 WL 1438709, at *6 (N.D.Cal. April 25, 2012) (finding that criticism of expert analysis might be cured with further stratification of sample); Feske, 2012 WL 1123587, at *2 (suggesting parties use stratified sampling “to assure a statistically significant representation of the population” of the putative class); Schafer v. State Farm Fire & Casualty Co., Civil Action No. 06–8262, 2009 WL 799978, at *5 (E.D.La. March 25, 2009) (relying on stratified sampling); McReynolds v. Sodexho Marriott Services, Inc., 349 F.Supp.2d 1, 23 (D.D.C.2004) (relying on stratified sampling in Title VII case); Chavez v. IBP, Inc., No. CV–01–5093, 2004 WL 5520002, at *10–11 (E.D.Wash. Dec. 8, 2004) (rejecting expert analysis because of failure to utilize stratified sampling). Nevertheless, I lack the expertise to impose any particular sampling technique on the parties unilaterally. Moreover, there is some reason to believe that the savings to be realized from sampling here are not as great as they may be in other circumstances. To the extent that the burden on Goldman Sachs is a consequence of the time that would be spent on developing queries and writing programming code, that work would be largely necessary regardless of the size of the population subject to the search.
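The stratified technique described above can be sketched in a few lines of code. This is an illustration under stated assumptions, not a method proposed by either party or adopted here: the function name `stratified_sample`, the choice of division and title as the stratifying variables, and the uniform sampling fraction are all hypothetical, and a qualified statistician might well choose differently.

```python
import math
import random
from collections import defaultdict


def stratified_sample(records, stratum_key, fraction, seed=0):
    """Draw a simple random sample of roughly `fraction` from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced

    # Step 1: divide the population into subgroups (strata).
    strata = defaultdict(list)
    for record in records:
        strata[stratum_key(record)].append(record)

    # Step 2: sample at random within each subgroup, taking at least one
    # record per stratum and never more than the stratum contains.
    sample = []
    for key in sorted(strata):
        group = strata[key]
        k = min(len(group), max(1, math.ceil(len(group) * fraction)))
        sample.extend(rng.sample(group, k))
    return sample
```

A caller would supply the stratifying rule, for example `stratified_sample(employees, lambda r: (r["division"], r["title"]), 0.5)`, where the `division` and `title` fields are assumed names. Because every subgroup is sampled separately, no division or job title can be omitted from the sample by chance, which is the concern raised by the heterogeneity objection.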
*305 The other alternative—and one that the plaintiffs advocate—would require Goldman Sachs to produce in digital form all of the information contained in each of the databases. Goldman Sachs acknowledges that, at least in the short run, such a “data dump” would impose less of a burden on it than a more targeted production. (Chandramouli Dep. at 48–49, 51–52).
There is no legal impediment to ordering production in that form. See High Point SARL v. Sprint Nextel Corp., No. 09–2269, 2011 WL 4526770, at *12 (D.Kan. Sept. 28, 2011) (requiring production of entire database over objection that irrelevant information would be included); Goshawk Dedicated Ltd. v. American Viatical Services, LLC, No. 1:05–CV–2343, 2007 WL 3492762, at *1 (N.D.Ga. Nov. 5, 2007) (requiring production of database over objections based on relevance and confidentiality); but see Daugherty v. Murphy, No. 1:06–cv–878, 2010 WL 4877720, at *7 n. 5 (S.D.Ind. Nov. 23, 2010) (rejecting production of entire database as infeasible); Nicholas J. Murlas Living Trust v. Mobil Oil Corp., No. 93 C 6956, 1995 WL 124186, at *5 (N.D.Ill. March 20, 1995) (denying production of entire database on grounds of relevance and burden). There is no suggestion here that the databases at issue contain information that is privileged or subject to the work product doctrine or data that might constitute a trade secret. To the extent that they include personal information of employees, any production could be made pursuant to a strict confidentiality order.
Nevertheless, any short term savings from disclosure of the entire databases are likely to be offset by the costs involved in converting the mass of unorganized data provided into useable information. As Goldman Sachs' witness described the problem, “it would be hard for anybody getting it to interpret it because they would need to have the business knowledge to link up all of the information that we provide.” (Chandramouli Dep. at 49). Thus, the time required to draft a report to impart the necessary knowledge to the plaintiffs' counsel and their experts could well approach the time otherwise saved. (Chandramouli Dep. at 49). Furthermore, because the plaintiffs would be receiving something more akin to “raw” data, there would be a greater likelihood of future disputes regarding whether the parties' respective experts were utilizing equivalent, appropriately validated information.
While Goldman Sachs remains free to respond to the plaintiffs' pending discovery demands by producing all of the information from the databases if, on reflection, it determines that this approach is less burdensome, I will not order it to do so. Although a court may certainly consider the existence of a less costly alternative in evaluating a producing party's claim of burden, it has limited authority and expertise to dictate the means by which a party complies with its production obligations.
Ultimately, then, I must determine whether the plaintiffs' discovery demands are appropriate in light of the criteria set forth in Rule 26(b)(2)(C). The plaintiffs have not previously had the opportunity to obtain the information contained in the databases, and that information is not available from other sources. Thus, subsections (i) and (ii) of Rule 26(b)(2)(C) do not favor limiting discovery. Rather, it must be ascertained whether the burden of the discovery outweighs its likely benefit in light of the factors set forth in subsection (iii).
There is little doubt that the needs of this case justify the discovery sought by the plaintiffs. The information in the databases is central to the plaintiffs' claims of gender discrimination in compensation, promotion, and evaluation. The amount in controversy, while not specifically quantified, is surely substantial. The plaintiffs represent that Goldman Sachs employs approximately 32,500 persons worldwide and 18,900 in North and South America. (First Amended Class Action Complaint (“FACAC”), 61). While it is unknown how many are female Associates, Vice Presidents, and Managing Directors in the revenue-producing divisions (FACAC, 61), the class would certainly number at least in the hundreds. Each class member would have a claim for back pay, some for a substantial period of time. Thus, the financial stakes here are high. At the same time, Goldman Sachs has ample resources to respond in discovery. Indeed, at *306 its direction, Aon Hewitt is regularly performing special projects on the PeopleSoft database similar to the search requested by the plaintiffs, some of which require more than 200 hours of employee time. (Chandramouli Dep. at 42). Of course, the importance of this litigation is not measured in dollars alone; the plaintiffs seek to vindicate the civil rights of the class members, and thus further an important public interest. As the Rules Advisory Committee observed, “[Rule 26(b)(2)(C)(iii) ] recognizes that many cases in public policy spheres, such as employment practices, free speech, and other matters, may have importance far beyond the monetary amount involved.” Fed.R.Civ.P. 26 advisory committee's note (1983 Amendment); see also John L. Carroll, “Proportionality in Discovery: A Cautionary Tale,” 32 Campbell L.Rev. 455, 464 (2010) (“A more fundamental problem with proportionality needs to be discussed: the danger that monetary value of a case, alone, will control the proportionality analysis, impeding the discovery efforts of parties with limited resources and failing to acknowledge the non-pecuniary importance of public policy-related suits, such as those involving allegations of discrimination.”).
Against these considerations, I must weigh the burden imposed on Goldman Sachs with respect to each database, taking into account both monetary and non-monetary components.
Costs cannot be calculated solely in terms of the expense of computer technicians to retrieve the data but must factor in other litigation costs, including the interruption and disruption of routine business practices and the costs of reviewing the information. Moreover, burdens on information technology personnel and the resources required to review documents for relevance, privilege, confidentiality, and privacy should be considered in any calculus of whether to allow discovery, and, if so, under what terms. In addition, the nonmonetary costs (such as the invasion of privacy rights, risks to business and legal confidences, and risks to privileges) should be considered.
Sedona Principles at 38.
Goldman Sachs exaggerates the burden associated with extracting data from the current PeopleSoft database, and the restrictions in scope that it proposes are unjustified. First, Goldman Sachs based its estimate of the time necessary to respond to the plaintiffs' requests on its use of Query Studio, a tool designed for individualized database searches. (Obradovich Dep. at 61). When Goldman Sachs requires comprehensive searches, as would be the case here, it normally delegates responsibility to Aon Hewitt to develop the appropriate programming code, presumably because this methodology is more efficient. Furthermore, a substantial portion of Goldman Sachs' time estimate—between 40 and 80 hours—is allocated to quality assurance. (Obradovich Aff., 6). This estimate, which is rather conclusory, appears to be based on a goal of providing a pristine set of data. However, the standard for the production of ESI is not perfection. Rather, “[a] responding party must use reasonable measures to validate ESI collected from database systems to ensure completeness and accuracy of the data acquisition.” The Sedona Conference, The Sedona Conference Database Principles: Addressing the Preservation and Production of Databases and Database Information in Civil Litigation, March 2011 Public Comment Version, at 32 (emphasis added). Thus, it would be sufficient, and far less burdensome, to sample the data extracted to determine if there are systematic errors requiring further attention rather than to implement the comprehensive quality review apparently contemplated by Goldman Sachs.
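A sample-based check of extracted data, as opposed to a comprehensive review, might look something like the following. It is a sketch only: the function `spot_check`, the validation rules, and the field names in the usage note are hypothetical and do not describe any actual Goldman Sachs system or any procedure ordered here.

```python
import random


def spot_check(rows, checks, sample_size, seed=0):
    """Validate a random sample of extracted rows.

    `checks` maps a rule name to a function returning True when a row passes.
    Returns {rule name: share of sampled rows that failed the rule}.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(rows, min(sample_size, len(rows)))
    if not sample:
        return {name: 0.0 for name in checks}

    failures = {name: 0 for name in checks}
    for row in sample:
        for name, check in checks.items():
            if not check(row):
                failures[name] += 1
    return {name: count / len(sample) for name, count in failures.items()}
```

For example, a reviewer might pass `{"has_id": lambda r: bool(r.get("employee_id"))}` as `checks`, with `employee_id` being an assumed field name. A failure rate near zero for a rule suggests isolated errors at most, while a high rate for a single rule points to a systematic problem in the extraction code that would warrant further attention.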
At the same time, the limitations suggested by Goldman Sachs are artificial. It argues for restrictions by date and employment unit based on the employment history of each individual plaintiff. (Def. Memo. at 10). However, the plaintiffs are litigating more *307 than their individual claims; they seek to represent a class that covers all divisions in which the challenged evaluation policies are utilized and to assert claims for the entire class period. Moreover, Goldman Sachs has not shown that the limitations it proposes would reduce its burden. The plaintiffs are therefore entitled to the full range of information they seek from the current PeopleSoft database dating back to September 2004. Any disputes concerning which specific data fields are responsive to the plaintiffs' requests will be addressed as they arise.
Searching the older PeopleSoft database, however, would likely entail a significant incremental burden. It could not be searched with Query Studio, and any computer code written to search the current database would not be directly transferable, since the two databases do not have identical fields of information. (Obradovich Aff., 8). And, while information for the period prior to September 2004, which is available only on the older database, may ultimately be necessary for evaluating the merits and determining appropriate relief, it is not apparent why it would be critical to issues of class certification. In all likelihood, if the plaintiffs can meet the requirement for commonality for the period after September 2004, it can be inferred that they could meet it with respect to the earlier period as well. Thus, the burden of extracting the requested information from the older PeopleSoft database at this time outweighs the benefit. Nevertheless, the plaintiffs fear that Goldman Sachs will dispute the representativeness of data obtained for less than the full class period. (Transcript of Proceedings dated Aug. 8, 2012 (“Tr.”) at 22–23). This objection is readily resolved: Goldman Sachs shall be relieved at this juncture of the obligation to produce information from the older PeopleSoft database only if it stipulates that it will not challenge the representativeness of data obtained from the current database for purposes of class certification.
As noted above, the CRS database contains information about compensation recommendations. Again, Goldman Sachs' estimate of the time required to respond to the plaintiffs' requests is overblown, and its proposed alternative is inadequate. Although it projects that it would need only 40 hours to extract the requested information, Goldman Sachs complains that it would take six times that long for quality assurance. (Obradovich Aff., 17–18). Yet it has provided no detail to support this number, nor has it done sampling or conducted a pilot search. Its offer to provide TCR reports for the full period during which they have been generated is insufficient. These documents show only the ultimate compensation determinations. (Obradovich Dep. at 116–17). In order to attempt to ascertain whether any gender disparities in compensation are the result of Goldman Sachs' policies on the one hand or individual supervisory decisions on the other, it is critical to identify not only the final pay or bonus determination, but also the intermediate recommendations reflecting those factors (and others). (Tr. at 60–61). Providing only the year-end information in the TCR reports would make it more difficult to distinguish among such causal variables, and Goldman Sachs shall therefore provide the CRS database information requested from January 1, 2002. (Obradovich Aff., 17).
Goldman Sachs has not suggested that it would be burdensome to extract the requested information from the FRS database. Rather, it seeks to limit its production to the Securities and Investment Management Divisions. (Def. Memo. at 11). As discussed above, any such limitation is unwarranted, since it would impair the plaintiffs' ability to establish the prerequisites for certification of the entire class that they seek to represent. The data shall therefore be produced for employees in Investment Banking and Merchant Banking Divisions as well. The information shall be provided for the period after January 1, 2003, when the database was created. *308 (Pathak Aff., 2).
Similarly, Goldman Sachs has agreed to produce the requested information from the MD Selection Database “consistent with its other data proposals.” (Def. Memo. at 10). Again, in order to give the plaintiffs a fair opportunity to demonstrate their entitlement to certification of a class, Goldman Sachs shall extract the requested information for all of the revenue-generating divisions from January 1, 2002 forward.
There is no doubt that the plaintiffs are entitled to documents, whether in electronic or hard copy form, concerning Goldman Sachs' compensation, promotion, and evaluation policies going back to 2000, two years prior to the beginning of the class period. The only dispute is the degree of effort that Goldman Sachs must expend in locating such documents, a question that cannot be answered in the abstract. Goldman Sachs argues that because of changes in personnel and organizational structure over time, it will be more burdensome to locate documents created or maintained during the earlier portion of the period at issue. (Tr. at 51–52). That is no doubt true. But Goldman Sachs' obligation is to make reasonable efforts to locate and produce information responsive to the plaintiffs' legitimate discovery demands. Furthermore, how reasonable those efforts are will depend in part on the importance of the documents; a cursory search may be all that is required with respect to marginally relevant documents, while a far more diligent search may be necessary where core documents are at stake. Until Goldman Sachs has conducted its search and the parties have identified with some specificity the alleged shortcomings of that effort and the types of documents at issue, there is no basis for a ruling.
For the reasons set forth above, the plaintiffs' motion to compel (Docket no. 145) is granted to the extent that Goldman Sachs shall provide the requested information for employees in the revenue-generating divisions from the current PeopleSoft database, the CRS database, the FRS database, and the MD Selection database, for the time periods identified above. Provided that Goldman Sachs agrees not to argue for purposes of class certification that information in the older PeopleSoft database is materially different from that in the current database, it need not now produce information from the earlier database. With respect to the non-database documents, the plaintiffs' motion is denied without prejudice to renewal when it can be demonstrated that Goldman Sachs has failed to produce specific types of relevant documents.
The parties also refer to this unit as the Investment Management Division. (Defendants' Opposition to Plaintiffs' Motion to Compel Discovery of Data and Documents (“Def. Memo.”) at 4).
This is not to say that discovery related to class certification should never be phased. Where it is likely that targeted discovery on a particular issue may be dispositive of class certification, it is entirely proper to take that discovery first. For example, if there is doubt about numerosity, discovery limited to that issue could result in an early determination of the viability of class claims. Phased discovery, however, is less likely to be efficient with respect to the issue of commonality, especially after Dukes.
At the risk of imprecision, I will use the term “inaccessible” interchangeably with “not reasonably accessible.”
A few cases appear to suggest otherwise. In Thermal Design, Inc. v. Guardian Building Products, Inc., No. 08–C–828, 2011 WL 1527025, at *1 (E.D.Wis. April 20, 2011), for example, the court found archived e-mail and shared network drives not to be reasonably accessible based exclusively on the time and expense of searching and indexing them. To the extent that Thermal Design and similar cases do not consider some systemic barrier necessary to a finding of inaccessibility, I respectfully disagree with them.
Even if the information on the databases were not reasonably accessible, I would find that the plaintiffs have nevertheless generally demonstrated good cause to obtain it to the extent that the requested discovery is not disproportionate, which is the issue discussed in the following section. Courts that have analyzed good cause under Rule 26(b)(2)(B) have generally considered the same types of factors relevant to a proportionality determination under Rule 26(b)(2)(C). See, e.g., Tucker v. American International Group, Inc., 281 F.R.D. 85, 99 (D.Conn.2012); Brocade Communications Systems, Inc. v. A10 Networks, Inc., No. 10–CV–3428, 2012 WL 70428, at *1–3 (N.D.Cal. Jan. 9, 2012) (finding good cause for forensic imaging of otherwise inaccessible data where it was relevant to misappropriation claim and efforts to obtain it elsewhere had failed); General Steel Domestic Sales, 2011 WL 2415715, at *3 (finding no good cause where importance of evidence did not justify extreme burden and where information likely available elsewhere); Helmert, 2010 WL 2179180, at *8 (finding no good cause where likelihood of obtaining relevant information insubstantial); Johnson, 2010 WL 4065368, at *1–2; Major Tours, Inc. v. Colorel, Civ. No. 05–3091, 2009 WL 3446761, at *2–4 (D.N.J. Oct. 20, 2009).
The Aon Hewitt representative states that a project valued at $5,000 represents approximately 75 to 100 hours of work, “depend[ing] on the complexity and the people involved.” (Chandramouli Dep. at 38–39). Even assuming that the discovery projects here are complex enough to warrant an hourly rate two or three times higher, the total cost, in terms of dollars, would be modest relative to the scale of the litigation.
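The arithmetic behind this observation can be sketched as follows. The $5,000 project value and the 75 to 100 hour range come from the testimony cited above; the function names are hypothetical, and applying the resulting rates to any particular hour estimate is an illustrative assumption rather than a finding.

```python
def implied_hourly_rate(project_value, hours_low, hours_high):
    """Return the (lowest, highest) dollars per hour implied by a project value."""
    # More hours for the same price means a lower rate, and vice versa.
    return project_value / hours_high, project_value / hours_low


def cost_range(hours_low, hours_high, rate_low, rate_high,
               multiplier_low=2, multiplier_high=3):
    """Return (low, high) dollar cost, scaling the rate by a complexity multiplier."""
    low = hours_low * rate_low * multiplier_low
    high = hours_high * rate_high * multiplier_high
    return low, high


# Testimony: a $5,000 project represents roughly 75 to 100 hours of work.
RATE_LOW, RATE_HIGH = implied_hourly_rate(5000, 75, 100)
```

The implied base rate is $50 to about $67 per hour. Taking, purely for illustration, Goldman Sachs' own 130 to 230 hour estimate for the current PeopleSoft database (extraction plus quality control), `cost_range(130, 230, RATE_LOW, RATE_HIGH)` yields roughly $13,000 to $46,000 at two to three times the base rate.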
Goldman Sachs argues that it is not possible to determine what factors influence the recommendations at different stages. (Tr. at 65–66). What inferences can be drawn from the data is yet to be determined, but the plaintiffs cannot be precluded altogether from obtaining the relevant evidence.
The information maintained by TAG for the year 2002 need not be produced because it is incomplete and potentially unreliable. (Pathak Aff., 3).