A Blog About Intellectual Property Litigation and the District of Delaware


Proof I have other hobbies
Andrew E. Russell, displayed with permission

I don't talk about it much on the blog, but my other hobby (beyond writing about litigation and the District of Delaware for funsies, photography, and having an absurd-by-today's-standards number of children) is writing and speaking about AI and the law. I've been speaking about AI issues on panels at conferences since 2018. Most recently, I moderated a Sedona Conference panel about Copyright and AI.

In the context of copyright and AI, the question of whether training an AI model on copyrighted content is fair use is basically life-or-death for a lot of current AI models. Big generative models like ChatGPT are (typically) trained on giant masses of data collected from books, the internet, computer code, and other media. They can do all kinds of crazy things and feel like magic.

But if training an AI model on copyrighted material is not fair use, the current state of the art fails. Many of these AI models (and the companies and researchers who created them) are going to have maybe-impossible-to-solve copyright infringement issues. And AI model development may migrate to other places in the world where developers can train their models without fear of copyright infringement.

A Significant AI Decision from the District of Delaware

That's why the 2023 Thomson Reuters decision by Judge Bibas of the Third Circuit, sitting by designation here in the District of Delaware, tends to come up in these discussions. That opinion involves a copyright infringement claim against a company that trained an AI model using Westlaw headnotes. The company's goal was to make a search engine that finds cases in response to user questions (this is traditional machine learning, not the new-fangled generative AI).

In it, Judge Bibas denied summary judgment, holding that using the headnotes to train an AI model could have been fair use, and that the issue had to go to the jury. That was significant. Beyond that, some of his reasoning looked pretty solid for AI developers.

He suggested use of data to train a model could be transformative "intermediate copying," rather than a market substitute for the original material:

If Ross’s characterization of its activities is accurate, it translated human language into something understandable by a computer as a step in the process of trying to develop a “wholly new,” albeit competing, product—a search tool that would produce highly relevant quotations from judicial opinions in response to natural language questions. This also means that Ross’s final product would not contain or output infringing material. Under Sega and Sony, this is transformative intermediate copying.
. . .
Ross’s use might be transformative, creating a brand-new research platform that serves a different purpose than Westlaw. If so, it is not a market substitute.

Thomson Reuters Enter. Ctr. GmbH v. Ross Intel. Inc., 694 F. Supp. 3d 467, 483, 486 (D. Del. 2023).

The Court suggested that the AI tool may have functionally copied only the link to the underlying opinion, rather than the expression in the headnotes, and that the link was not protected by copyright:

But the heart of each headnote is its original expression, not its link to the part of the opinion it summarizes. So if Ross’s AI works the way that it says, it is likely fair use because it produces only the opinion, not the original expression.

Id. at 485. He gave some real weight to the benefits to the public of AI:

Deciding whether the public’s interest is better served by protecting a creator or a copier is perilous, and an uncomfortable position for a court. Copyright tries to encourage creative expression by protecting both. Here, we run into a hotly debated question: Is it in the public benefit to allow AI to be trained with copyrighted material?
The value of any given AI is likely to be reflected in the traditional factors: How transformative is it? Can the public use it for free? Does it discourage other creators by swallowing up their markets? So an independent evaluation of the benefits of AI is unlikely to be useful yet, even though both the potential benefits and risks are huge. Suffice it to say, each side presents a plausible and powerful account of the public benefit that would result from ruling for it. So a jury must decide the fourth factor—and the ultimate conclusion on fair use.

Id. at 486-487. All told, this opinion looked pretty helpful for the idea that training AI could be fair use of the underlying data.

A Reversal

But, as the case approached trial, Judge Bibas seemed to have second thoughts, and ordered additional summary judgment briefing.

Yesterday, he issued a new opinion that changed his prior ruling. This time, he granted summary judgment that training an AI model on Westlaw headnotes to create a better legal search engine was not fair use.

As to factor one, the transformative nature of AI, he decided that the fact that the AI developer used the headnotes to develop a competing tool meant that it was not transformative, because "intermediate copying" only applies to copying code when necessary, not copying written words like Westlaw headnotes:

My prior opinion wrongly concluded that I had to send this factor to a jury. . . . I based that conclusion on Sony and Sega. Since then, I have realized that the intermediate-copying cases (1) are computer-programming copying cases; and (2) depend in part on the need to copy to reach the underlying ideas. Neither is true here. Because of that, this case fits more neatly into the newer framework advanced by Warhol. I thus look to the broad purpose and character of Ross’s use. Ross took the headnotes to make it easier to develop a competing legal research tool. So Ross’s use is not transformative.

Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc., C.A. No. 20-613, D.I. 770 at 19 (D. Del. Feb. 11, 2025).

The Court still found that factors two and three—the nature of the work and how much of the work was used—went to the AI company.

But as to factor four, the effect on the market for the original, the Court reversed itself, finding that the potential market for the headnotes as training data was enough:

I worried whether there was a relevant, genuine issue of material fact about whether Thomson Reuters would use its data to train AI tools or sell its headnotes as training data. . . . And I thought a jury ought to sort out “whether the public’s interest is better served by protecting a creator or a copier.” Id.
In hindsight, those concerns are unpersuasive. Even taking all facts in favor of Ross, it meant to compete with Westlaw by developing a market substitute. . . . And it does not matter whether Thomson Reuters has used the data to train its own legal search tools; the effect on a potential market for AI training data is enough. Ross bears the burden of proof. It has not put forward enough facts to show that these markets do not exist and would not be affected.

The Court also found that the public benefit of the AI research tool could not outweigh the copying:

Nor does a possible benefit to the public save Ross. Yes, there is a public interest in accessing the law. But legal opinions are freely available, and “the public’s interest in the subject matter” alone is not enough. . . . The public has no right to Thomson Reuters’s parsing of the law. Copyrights encourage people to develop things that help society, like good legal-research tools. Their builders earn the right to be paid accordingly. . . . There is nothing that Thomson Reuters created that Ross could not have created for itself or hired [a third party] to create for it without infringing Thomson Reuters’s copyrights.

Ouch.

Some Unsolicited Thoughts

It's a shame that the fair use factors don't fit all that well into some of what copyright is used for today, and that it's tough to apply them. But I would argue that the Court had it right the first time.

On factor one, the use is absolutely transformative. The model is using the headnotes to derive the fact of what Westlaw found significant about the original opinions. Those facts—not their expression—are unprotected. The model is transformative because it is assimilating those facts and using them to create a tool that helps people find cases.

Some level of intermediate copying is necessary to ingest those facts into the model. It's the facts, not the expression of them, that the model is using to help direct users to the cases they are searching for.

On factor four, the only other factor that the Court found weighed against fair use, one of the Court's key points is that "[t]he public has no right to Thomson Reuters’s parsing of the law." But it is not copyright infringement to state the fact that "the Westlaw editors found opinion x significant for y reason," or that the Westlaw editors found a specific portion of a particular opinion to be relevant to an issue. That's a fact, not the expression of a fact.

And the tool the AI developer was creating does not even go so far as to spit out those facts about Westlaw. It appears that those facts cannot be retrieved at all by a person using the research tool. It's not copying Westlaw's collection of facts; it's using some of them to guide users to the opinions they seek.

Finally, relying on the market for training data is a catch-22. If training a model on copyrighted material is fair use, then there is no market for training data, because it's unprotected. If it's not fair use, then there is a market.

All of that said, the Court has certainly given this more thought than I have, and I don't know whether the parties presented these arguments. It may also just be that the fair use factors as they are written now don't support a finding of fair use when training an AI model, at least under existing precedent.

I expect that generative AI will have an even tougher road when it comes to fair use, given that it can potentially spit out the source material. We'll have to see how the other pending cases (of which there are many—including a few here in Delaware) pan out.
