A US judge has ruled that a tech company’s use of books to train its artificial intelligence system – without permission of the authors – did not breach copyright law.
A federal judge in San Francisco said Anthropic made “fair use” of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model (LLM).
Judge William Alsup compared the Anthropic model’s use of books to a “reader aspiring to be a writer” who uses works “not to race ahead and replicate or supplant them” but to “turn a hard corner and create something different”.
Alsup added, however, that Anthropic’s copying and storage of more than 7m pirated books in a central library infringed the authors’ copyrights and was not fair use – although the company later bought “millions” of print books as well. The judge has ordered a trial in December to determine how much Anthropic owes for the infringement.
“That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages,” Alsup wrote.
US copyright law says that wilful copyright infringement can result in damages of up to $150,000 (£110,000) per work.
The copyright issue has pitted AI firms against publishers and the creative industries because generative AI models – the technology that underpins powerful tools such as the ChatGPT chatbot – have to be trained on a vast amount of publicly available data in order to generate their responses. Much of that data has included copyright-protected works.
An Anthropic spokesperson said the company was pleased the court recognised its AI training was transformative and “consistent with copyright’s purpose in enabling creativity and fostering scientific progress”.
John Strand, a copyright lawyer at the US law firm Wolf Greenfield, said the decision from a “well-respected” judge was “very significant”.
He added: “There are dozens of other cases involving similar questions of copyright infringement and fair use pending throughout the US, and Judge Alsup’s decision here will be something those other courts must consider in their own case.”
Due to the number of other AI copyright cases working their way through the legal system, Strand said: “The expectation is that at some point the primary question of whether training LLMs on copyrighted materials is fair use likely will be addressed by the US supreme court.”
The writers filed the proposed class action against Anthropic last year, arguing the company, which is backed by Amazon and Alphabet, used pirated versions of their books without permission or compensation to teach Claude to respond to human prompts.
The proposed class action is one of several lawsuits brought by authors, news outlets and other copyright owners against companies including OpenAI, Microsoft and Meta Platforms over their AI training.
The doctrine of fair use allows the use of copyrighted works without the copyright owner’s permission in some circumstances. Fair use is a key legal defence for the tech companies, and Alsup’s decision is the first to address it in the context of generative AI.
AI companies argue their systems make fair use of copyrighted material to create new, transformative content, and that being forced to pay copyright holders for their work could hamstring the nascent industry. Anthropic told the court that it made fair use of the books and that US copyright law “not only allows, but encourages” its AI training because it promotes human creativity.
The company said its system copied the books to “study plaintiffs’ writing, extract uncopyrightable information from it, and use what it learned to create revolutionary technology”.
Giles Parsons, a partner at UK law firm Browne Jacobson, said the ruling would have no impact in the UK, where the fair use argument holds less sway. Under current UK copyright law, which the government is seeking to change, copyright-protected work can be used without permission for scientific or academic research.
He said: “The UK has a much narrower fair use defence which is very unlikely to apply in these circumstances.”
Copyright owners in the US and UK say AI companies are unlawfully copying their work to generate competing content that threatens their livelihoods. A UK government proposal to change copyright law by allowing use of copyright-protected work without permission – unless the work’s owner signals they want to opt out – has been met with vociferous opposition from the creative industries.
Alsup said Anthropic violated the authors’ rights by saving pirated copies of their books as part of a “central library of all the books in the world” that would not necessarily be used for AI training. Anthropic and other prominent AI companies including OpenAI and Facebook owner Meta have been accused of downloading pirated digital copies of millions of books to train their systems.