Three authors have filed a lawsuit against AI startup Anthropic, claiming it trained its language model, Claude, on their copyrighted works without permission.

The complaint, filed in a California court by Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, alleges that Anthropic "pirated" their written material for its AI systems. According to the authors, Anthropic downloaded pirated copies of their books from illegal websites to train those systems.

The lawsuit accuses Anthropic of building "a multibillion-dollar business on the backs of hundreds of thousands of copyrighted books." The company ignored copyright protections and engaged in "large scale theft of copyrighted works" to train its Claude models, according to the filing.
Anthropic has said little about the allegations beyond confirming that it is "aware" of the legal action. The case joins similar lawsuits against other AI companies, including Microsoft and OpenAI, over the use of copyrighted material to develop large language models, and it underscores growing tensions between content creators and AI firms over intellectual property rights.

The complaint claims that Anthropic trained Claude on a dataset called 'The Pile', which included a collection of pirated e-books known as 'Books3' amounting to nearly 200,000 books downloaded without authorization.
The authors argue that Anthropic knew it was training on copyrighted works without consent. According to them, the company "made a deliberate decision to cut corners and rely on stolen materials to train their models" instead of obtaining proper licenses.

The lawsuit claims the authors have been harmed by the loss of both book sales and licensing revenue. Anthropic's AI models, they argue, now compete with the writers who created the human-written content they were trained on, threatening their livelihoods.

For context, Anthropic's Claude models compete with OpenAI's ChatGPT and other AI chatbots. The company has raised billions in funding and is valued at more than $18 billion.

Some critics say AI companies must pay licensing fees to the authors and publishers whose works they use to train their models. A few companies, including Google, have signed licensing agreements with news organizations and other content providers.
AI developers, however, maintain that training models on copyrighted material falls under the 'fair use' provisions of copyright law. They further argue that the models do not reproduce exact copies of their training texts.

The dispute raises complicated legal and ethical questions about how copyright applies to AI development. Courts may need to decide whether AI training constitutes copyright infringement or transformative fair use.

For authors, the lawsuit represents an attempt to regain control over how their works are used to train artificial intelligence. They argue that businesses profiting from AI should pay the creators whose material made that technology possible.

If the case is ruled on, it could set an important precedent, forcing firms to obtain licenses for all copyrighted material used in training and adding cost and complexity to AI development.
The complaint seeks statutory damages for alleged wilful copyright infringement and an injunction to prevent Anthropic from further using the authors' works without permission.

As AI grows more capable, intellectual property debates are likely to intensify. Creators believe their works should be properly protected and compensated, while AI companies push for access to the broad datasets that improve their models.

If cases such as this one against Anthropic shape the legal and regulatory landscape for AI development, they may determine how firms collect training data and whether far-reaching licensing becomes the norm.

For now, the lawsuit adds to a growing pile of legal actions against major AI companies over their use of copyrighted material. The courts' decisions as they work through these issues could have sweeping implications for the future of AI and for content creators.