Tremblay v. OpenAI, Inc.

2025 WL 635335 (N.D. Cal. 2025)

February 27, 2025

Illman, Robert M., United States Magistrate Judge

Summary

The court denied the plaintiffs' request to have input in determining search terms for ESI, citing concerns about delays and disputes over methodology. The court also ordered the parties to meet and confer regarding OpenAI's request for discovery from the plaintiffs' agents and for the plaintiffs to either produce the requested materials or certify that they have searched for them and found none. Additionally, the court ordered OpenAI to seek relevant documents from the plaintiffs' publishers through third-party discovery practice but denied plaintiffs' request to order the parties to include file names on their privilege logs.

PAUL TREMBLAY, et al.,
Plaintiffs,
v.
OPENAI, INC., et al.,
Defendants

Case No. 23-cv-03223-AMO (RMI)

United States District Court, N.D. California

Filed February 27, 2025

Illman, Robert M., United States Magistrate Judge

ORDER RE: DISCOVERY DISPUTE LETTER BRIEFS Re: Dkt. Nos. 298, 337, 340, 352

Now pending before the court are a number of discovery dispute letter briefs (dkts. 298, 337, 340, 352). Pursuant to Federal Rule of Civil Procedure 78(b) and Civil Local Rule 7-1(b), the court finds that these matters are suitable for disposition without oral argument. As to the disputes presented, the court rules as follows.

Plaintiffs’ Proposed Additional Search Terms (dkt. 298):

The court previously denied Plaintiffs’ suggestion that the requesting Party have input in determining the search terms “because it raised the specter of too many future delays and disputes over methodology and search term formulation.” See Order of July 31, 2024 (dkt. 166) at 3. Several months later, while conceding that OpenAI had not excluded documents containing the term “torrent” (and that OpenAI had indeed produced some documents responsive to an RFP pertaining to the use of torrents), Plaintiffs nevertheless alluded to gaps in the production and once again sought to inject themselves into the ESI search-term formulation process. See Ltr. Br. (dkt. 242) at 3. Finding that Plaintiffs had failed to show why the production was insufficient, the court denied Plaintiffs’ suggestion that they should have input into the formulation of search terms. See Order of January 13, 2025 (dkt. 247) at 2-3.

It appears that Plaintiffs now seek to litigate that issue for a third time, and by doing so, it appears that the objective of the court’s initial ruling (i.e., to prevent delays and the stalling of the discovery process stemming from endless discovery disputes over methodology and search-term formulation) has been nevertheless frustrated. While OpenAI submits that Plaintiffs have identified no deficiencies in its productions (see Ltr. Br. (dkt. 298) at 5) – Plaintiffs essentially respond to the effect that they “by definition know little about documents OpenAI hasn’t produced.” Id. at 2. Plaintiffs note that their review of “OpenAI’s productions and search terms, identified areas where the terms and connectors appeared too narrow or didn’t capture lingo used by OpenAI’s own employees, and proposed additional search terms.” Id. Plaintiffs add that their proposed eight search “strings” are not burdensome because they would yield only 345,000 documents, which “[w]ith technology assistance, e-discovery vendor DISCO estimates a 345,000-document review takes 2-3 weeks, [from] upload through production.” Id. at 4.

OpenAI responds by noting that Plaintiffs’ request now not only seeks the addition of “numerous torrent-related search terms, but also [] hundreds of more terms packed into eight search strings, including nonsensically overbroad terms like (ChatGPT AND “be doing”), (memori* AND data), and (seed*) . . . [which] [b]ased on review time to date, Plaintiffs’ terms would add over 9,000 hours of attorney review time with no showing that OpenAI’s productions are deficient.” Id. at 4. As to proportionality, OpenAI submits that “Plaintiffs purport to propose only 8 search ‘strings,’ but those search strings consist of 362 search terms, hitting on an additional 345,000 documents beyond the over 640,000 documents that OpenAI has already agreed to review.” Id. at 6. OpenAI submits that requiring it to introduce hundreds of thousands of non-responsive documents into its review queue will only prolong fact discovery while failing to add to the substance of Plaintiffs’ case. Id. at 6. The court agrees with OpenAI’s assessment here. Without repeating the details of the Parties’ disputes as to the numerous search terms included in Plaintiffs’ proposed search strings – of which, string numbers 1 and 4 were set forth by OpenAI as exemplars (see id. at 5-6) – the court cannot avoid the conclusion that Plaintiffs are trying to relitigate an issue they have already lost (i.e., that they should be involved in the search term formulation process without making a clear showing of prejudice stemming from gaps or deficiencies in the search terms disclosed by the producing Party). As a result, Plaintiffs’ request for an order directing OpenAI to deploy its proposed search strings is DENIED.

OpenAI’s Request for Discovery from Plaintiffs’ Agents, Assistants, etc. (dkt. 337):

OpenAI submits that the twelve Plaintiffs in this action have not adequately searched for or produced “documents from the files of those who work under their direction and on their behalf— their literary agents, ghostwriters, loan-out companies, assistants, and production companies (hereafter, ‘Agents’).” See Ltr. Br. (dkt. 337) at 2. As OpenAI sees it, two disputes exist in this regard: “First, although Plaintiffs do not dispute that the Agents’ files are within their ‘control,’ the parties disagree as to whether Plaintiffs have conducted a reasonable search of those files. Second, the parties disagree as to whether Plaintiffs’ publishers’ files are within Plaintiffs’ ‘control.’” Id.

OpenAI seeks certain information from Plaintiffs’ literary and other agents that negotiated publishing and licensing agreements for their works – OpenAI submits that “[d]ocuments relating to such negotiations are relevant to (1) ownership of the works, (2) the purpose of the works, and (3) the market for the works and any alleged harm thereto. Yet, for example, Snyder’s and Lippman’s productions do not contain any responsive documents regarding the negotiations for their domestic publishing agreements. And Coates has not produced any documents from his literary agent, his film and TV agent, or his manager . . .” Id. OpenAI also seeks more discovery from the loan-out companies that are the alleged owners of some of the asserted works, from a number of Plaintiffs’ assistants, and from production companies that Plaintiffs control. Id. at 2-3. OpenAI then concludes by stating that, “[g]iven the April 28 fact discovery cut-off, the Court should compel all Plaintiffs to conduct a reasonable inquiry for responsive documents from the Agents’ files and to produce them by February 28.” Plaintiffs respond to the effect that they “have already conducted diligent searches for potentially responsive documents from a variety of data sources and agents, as required by the Federal Rules and the Parties’ ESI Protocol [and] [w]hat OpenAI seeks now are documents from individuals who do not possess responsive documents, [or] from individuals from which Plaintiffs have no legal right to obtain documents, or both.” Id. at 4.

The court finds that this dispute is representative of the Parties talking past one another – that it is the manifestation of poor meet-and-confer efforts – and, that the “dispute” evades judicial resolution as it has been presented. Thus, the Parties are ORDERED to meet-and-confer forthwith regarding OpenAI’s assertions regarding the facet of their dispute regarding documents from all of Plaintiffs’ Agents (not including the publishers, which will be discussed below). Following which, seeing as no genuine dispute about relevance or proportionality has been presented (see id. at 3-6), Plaintiffs are ORDERED to either produce the requested materials, or to certify to Defendants that they have searched for the documents or information in question and that no such documents or information were found that have not already been (or will be) produced. Thereafter, unless OpenAI can articulate a reason why the court should take further action in this regard (i.e., by making a showing that Plaintiffs’ certifications are demonstrably untrue), the matter will be concluded and OpenAI will have to accept the productions and certifications it has received.

As to documents and information that OpenAI seeks from Plaintiffs’ publishers, OpenAI submits that “Plaintiffs do not dispute the existence of relevant discovery in their publishers’ possession. Their only excuse for not producing it is that they do not control their publishers. But each Plaintiff’s publishing agreement contains provisions entitling them to obtain from the publisher highly relevant and responsive information.” Id. at 4. Unfortunately, it once again appears that the Parties are practically speaking different languages and that they are arguing past one another because Plaintiffs respond to the effect that they “do not have a right to receive the documents requested by OpenAI from their publishers [and that] OpenAI misconstrues and oversimplifies the extent of Plaintiffs’ rights under the publishing agreements.” Id. at 6. Plaintiffs add that they “have already produced the documents they are entitled to under their agreements— they have produced sales and royalty statements that Plaintiffs have received from their publishers. But OpenAI demands more, for example, market analyses of the works, but those documents are simply not within the scope of documents that Plaintiffs have a right to obtain, as Plaintiffs have explained to OpenAI. OpenAI has the same rights to the other documents it seeks from Plaintiffs’ publishers as Plaintiffs do through Rule 45 subpoenas.” Id.

This dispute too seems to be the product of the Parties seemingly taking turns to simply conjure up something about which to fight. The court expects counsel to resolve this sort of dispute without the need for court intervention. Thus, rather than laying such a dispute at the court’s doorstep – if OpenAI believes that Plaintiffs’ publishers are in possession of relevant information which Plaintiffs assert is beyond the scope of their possessory rights under the relevant agreement(s), then OpenAI should – as Plaintiffs have suggested – seek to obtain those documents via third-party discovery practice through a Rule 45 subpoena. Accordingly, as to this facet of the Parties’ dispute, OpenAI’s request to compel relevant material from Plaintiffs’ publishers that goes beyond the scope of what each of those agreements allows Plaintiffs to obtain is DENIED. As to any relevant material that falls within the scope of what Plaintiffs may obtain from their publishers under the relevant agreements – to the extent not already produced, OpenAI’s request to compel the production of that material is GRANTED. If any dispute shall arise as to relevance or proportionality, the Parties are ORDERED to meet and confer such as to resolve or substantially narrow that dispute so that it can be articulated in a clear and cohesive manner that would lend itself to a ruling which would not require the expenditure of substantial effort in attempting to square the Parties’ arguments with one another.

Plaintiffs’ Request to Identify File Names of Withheld Documents in Privilege Logs (dkt. 340):

Plaintiffs submit that file names should be included on the Parties’ privilege logs because file names are the type of information that will enable Plaintiffs to assess a claim of privilege without revealing the privileged information itself. See Ltr. Br. (dkt. 340) at 2. OpenAI claims that including the file names, at this juncture (while narrating Plaintiffs’ reported changes of position as to this issue) is unduly burdensome because it “will result in another round of privilege review and redactions.” Id. at 4-5. OpenAI adds that “[n]ow, for the first time, following months of negotiations over the contents of the parties’ logs, Plaintiffs contend that OpenAI’s logged subject matter descriptions are somehow ‘uninformative’ . . . [a]s such, requiring OpenAI to undertake an additional burdensome privilege review and redaction process for several hundred or more file names is entirely needless. And contrary to Plaintiffs’ assertions, file names may be the subject of privilege review and redactions, as they will undoubtedly be here.” Id. at 5. Plaintiffs counter to the effect that they take issue with OpenAI’s claim that Plaintiffs have wasted the parties’ time by constantly changing positions, because that “is a misrepresentation of the email correspondence the parties exchanged and met and conferred on during the past month which was done in good faith to come to an agreement on the scope and cut-off date for logging privileged documents [and] Defendants fail to point out [that] they changed their position several times as well which is ordinary in the course of negotiations.” Id. at 4.

The court finds this sort of back-and-forth to constitute yet another manifestation of the Parties’ drumming up a discovery fight about a subject that should have been worked out by the Parties themselves without the need for court intervention. The dispute itself is insubstantial – the essence of which boils down to little more than petty finger-pointing about whose time was wasted by whom. Given Plaintiffs’ statement to the effect that “[t]he need for including filenames is especially necessary in light of OpenAI’s inadequate privilege logging practices to date,” it is unclear why Plaintiffs waited until February 21, 2025, to first raise this matter. Nor have Plaintiffs presented a specific and persuasive case that they have been prejudiced by the omission of the file names they now claim should have been part of OpenAI’s privilege logs all along. See id. at 2-4. Instead, even putting aside the sheer volume of seemingly endless discovery disputes in this case, the timing and nature of this dispute (in conjunction with the other disputes that are the subject of this order) gives rise to the impression that the Parties may be using the discovery process, at least in part, to engage one another in a warfare of sorts, that is, by attempting to fashion certain discovery disputes in order to harass one another’s flanks as might be the case during a game of chess or on a medieval battlefield. Regardless of whether or not that impression holds true – at bottom, because the court finds that Plaintiffs could have, and indeed should have, presented any issue they may have had with the form of OpenAI’s privilege log at a much earlier juncture during the discovery process, Plaintiffs’ request for the court to intervene at this late stage in the discovery process regarding so ancillary a matter as a minute detail as to the form of OpenAI’s privilege log is DENIED.

OpenAI’s Requests for Discovery from the Kadrey v. Meta Platforms Case (dkt. 352):

OpenAI relates that it seeks discovery from Plaintiffs in a separate, but reportedly similar, action that Plaintiffs have filed against Meta Platforms, Inc., in Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417 (N.D. Cal.) – through which OpenAI submits that Plaintiffs have asserted copyright infringement claims that are materially identical to the claims in this case. See Ltr. Br. (dkt. 352) at 2. According to OpenAI, in both cases, Plaintiffs have alleged Meta and OpenAI trained their large language models using Plaintiffs’ books which deprived Plaintiffs of payments for the use of their asserted works as training data. Id. Given what OpenAI considers to be a substantial overlap between the legal and factual issues in the two cases, it wants discovery from Plaintiffs’ involvement in the Kadrey case along the following lines: “(a) Plaintiffs’ responses to interrogatories and requests for admission (“RFAs”), Ex. A (RFPs 88/89/90); (b) expert witness materials disclosed by or on behalf of Plaintiffs, Ex. B (RFPs 86/87/88); and (c) deposition transcripts and exhibits for Plaintiffs’ expert witnesses, Ex. C (RFPs 87/88/89).” Id. OpenAI then states that while “Plaintiffs concede that cross-production is appropriate ‘to the extent that the facts overlap,’ [h]ere the overlap is extensive and includes: (1) whether Plaintiffs own the copyrighted works they assert in both cases (relevant to Plaintiffs’ infringement claim); (2) whether a relevant market for Plaintiffs’ works has been harmed by the conduct at issue (relevant to the defendants’ primary defense in both cases – fair use []); and (3) what caused the harm Plaintiffs claim to have suffered, which, for example, Plaintiff Hwang admitted is the ‘same harm’ in both cases.” Id. at 4. OpenAI repeatedly characterizes its requests as “specific” and “targeted.” See id. at 2-4.

Right off the bat, the undersigned struggles to comprehend the usefulness of OpenAI’s interest in seeking information from the Kadrey case as to whether or not Plaintiffs actually own the copyrighted works they have asserted in both cases. Unless there is a specific issue as to ownership (and none has been articulated in this letter brief by OpenAI) the undersigned sees no material advancement of OpenAI’s defense in this case by putting Plaintiffs to the burden of producing material from the Kadrey case that simply confirms what OpenAI already knows (or will know) to be true in this case – that is, that Plaintiffs are the owners of the works at issue in this case. The court would view the issue differently if, on the other hand, OpenAI had articulated a basis to conclude that ownership of the copyrighted works at the heart of this case was a contested issue here, and that discovery about that topic from Kadrey might be logically related and useful to the resolution of that issue or to OpenAI’s defense in general. However, no such showing having been made, the court sees no reason to put Plaintiffs to the burden of producing discovery from Kadrey related to their ownership of the works at issue in this case. If OpenAI can better articulate a basis for seeking that information, the Parties should meet and confer in an effort to resolve the matter informally, but if such efforts fail, OpenAI can re-present its request for a court order on the basis of a better-articulated foundation. In the meantime, as to OpenAI’s request to compel Plaintiffs to produce information from Kadrey related to their ownership of the works at issue in this case, that request is DENIED.

As to the other two broad categories of information that OpenAI seeks from the discovery process in Kadrey (whether a relevant market for Plaintiffs’ works has been harmed by Meta’s conduct, and what caused the harm that Plaintiffs claim to have suffered as a result of Meta’s conduct) – having reviewed the Parties’ statements and their arguments, the court is unable to see the usefulness of this discovery in the context of OpenAI’s defense in this case. For example, as to the interrogatory and request for admission responses OpenAI seeks from Kadrey, it asserts that the “discovery sought, inter alia, details on Plaintiffs’ claims of ownership of their asserted works, Plaintiffs’ positions on the markets for those works, and admissions regarding whether Plaintiffs have licensed those works to train LLMs . . .[and that] Plaintiffs’ positions on those issues – where the same works are asserted, the same alleged market and market harm is claimed, and the same damages are asserted – are directly relevant to Plaintiffs’ claims of copyright infringement and OpenAI’s fair use defense in this case.” Id. What the court struggles to understand is why OpenAI needs that information from Kadrey, why it would not be duplicative of the information it has solicited or can solicit from Plaintiffs in this case, and why OpenAI thinks putting Plaintiffs to the burden of producing this broad swath of information from Kadrey would be proportional to OpenAI’s needs in this case. In other words, simple relevance (if it even is at all relevant, which may very well not be the case) is not the only sine qua non of discovery, and the court finds that – without the showing of a more “specific” and “targeted” need that actually relates to this case – OpenAI’s request for the production of Plaintiffs’ responses to interrogatories and requests for admission in Kadrey must be DENIED as not proportional to the needs of this case.

OpenAI also seeks expert witness documents from Kadrey on a similar basis – that is, by claiming that they are “also directly relevant to critical aspects of Plaintiffs’ theories in this case [because] [f]or example, one key issue that will undoubtedly require expert testimony in both cases is whether Plaintiffs are able to distinguish the harm or apportion[ment] [of] damages allegedly caused by Meta versus OpenAI versus other LLM developers, as is evident from Plaintiffs’ Kadrey deposition transcripts already produced in this action.” Id. at 4. OpenAI adds that “Plaintiffs’ experts in both cases will almost certainly also opine on core issues common to both cases, such as (a) the traditional or potential markets for Plaintiffs’ works and any alleged harm to such markets; (b) whether a market to license books to train LLMs exists; and (c) Plaintiffs’ alleged damages, including an assessment of royalties generated by the asserted works common to both cases.” Id. If that is the case, then it appears that the compelled production of expert witness documents from Kadrey would either simply be duplicative of the expert witness documents to be generated in this case, or, that they would merely offer OpenAI differing sets of expert opinions by the experts in this case versus those in Kadrey – and that is assuming arguendo that OpenAI is correct about the supposedly overwhelming factual and legal overlap between the two cases, something which OpenAI has merely stated, in somewhat conclusory fashion, in the letter brief at bar. In either scenario, however, the usefulness of the Kadrey expert witness documents to OpenAI’s defense of this case appears to be minimal at best, and therefore the request for their production must similarly be DENIED as not proportional to the needs of this case.

As Plaintiffs have pointed out, the court has already once denied Plaintiffs’ request for cloned discovery from a parallel case involving OpenAI in the Southern District of New York, and OpenAI’s request for discovery from the Kadrey case – based on the articulated justifications presented here by OpenAI – must be similarly denied as overly broad, marginally relevant (if at all), and not proportionate to the needs of the case.

IT IS SO ORDERED.