Here comes another major lawsuit over generative AI and copyright, with OpenAI as the target. This class action comes courtesy of the Authors Guild, with a starry list of names among the plaintiffs: George R.R. Martin, Jodi Picoult, Jonathan Franzen, John Grisham…these are the big guns.
As with previous suits in this vein, the core issue is Big AI allegedly training its systems on pirated, copyrighted material. The filing accuses OpenAI of evading “the Copyright Act altogether to publish their lucrative commercial endeavor, taking whatever datasets of relatively recent books they could get their hands on without authorization.”
I particularly enjoyed this bit, which riffs off OpenAI CEO Sam Altman’s recent testimonies on Capitol Hill: “Altman has told Congress that he shares Plaintiffs’ concerns. According to Altman, ‘ensuring that the creator economy continues to be vibrant is an important priority for OpenAI…OpenAI does not want to replace creators. We want our systems to be used to empower creativity, and to support and augment the essential humanity of artists and creators.’ Altman testified that OpenAI ‘think[s] that creators deserve control over how their creations are used’ and that ‘content creators, content owners, need to benefit from this technology.’”
There’s control, and there’s control. For the authors’ version, here’s Franzen on their demands: “Authors should have the right to decide when their works are used to ‘train’ AI. If they choose to opt in, they should be appropriately compensated.”
As I wrote last month regarding Google’s submission to an Australian consultation on AI regulation, the tech firms don’t want opt-in systems—they want creators to have to opt out of having their works used as large language model (LLM) training fodder.
This is problematic, and not just because it puts the onus on the creator to protect their works from unwanted exploitation. For an example of why, let’s have a look at the “artist and creative content owner opt out” form that OpenAI just published in tandem with the release of its DALL-E 3 image generator. (DALL-E 3 features higher-quality images, more plausible hands, and ChatGPT-aided prompting—sorry, “prompt engineers”—and will be integrated into Microsoft’s Bing Chat.)
Apart from telling artists they can tweak their websites to disallow OpenAI’s GPTBot web crawler, the form promises that those sending it in will have their images removed “from future training datasets.” As for the models that are already out there, that ship has sailed—you can’t make an LLM unlearn something, even if you insist, as OpenAI does in its form, that the models “no longer have access to the data” after training.
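For readers wondering what that website tweak actually looks like: OpenAI documents a crawler user-agent token, “GPTBot,” that site owners can block via the standard Robots Exclusion Protocol. A minimal robots.txt entry, placed at a site’s root, would read:

```
# Tell OpenAI's GPTBot crawler not to fetch any pages on this site
User-agent: GPTBot
Disallow: /

# All other crawlers remain unrestricted
User-agent: *
Disallow:
```

Note that this only stops future crawling by well-behaved bots that honor robots.txt; it does nothing about content already scraped and baked into existing models.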
The Authors Guild notes that OpenAI’s GPT is “already being used to generate books that mimic human authors’ work, such as the recent attempt to generate volumes 6 and 7 of plaintiff George R.R. Martin’s Game of Thrones series A Song of Ice and Fire, as well as the numerous AI-generated books that have been posted on Amazon that attempt to pass themselves off as human-generated and seek to profit off a human author’s hard-earned reputation.”
The ability to tell OpenAI to stay away in the future won’t change that situation. And incidentally, the impossibility of retrospectively altering an existing LLM’s past training dataset could become a big problem for AI companies in areas other than copyright.
TechCrunch reports that the Polish privacy authority has decided to launch an investigation into OpenAI, following a General Data Protection Regulation complaint from privacy and security researcher Lukasz Olejnik. The EU law gives people the right to demand that companies fix incorrect personal information about them. Olejnik found inaccuracies in a ChatGPT-generated biography of himself, which may be par for the course when it comes to genAI, but in the EU Olejnik has the right to ask for rectification—and when he did so, OpenAI told him this was impossible.
It’s all about control. And if OpenAI can’t let people exercise the control afforded to them by the law, it’s in trouble. More news below.
Want to send thoughts or suggestions to Data Sheet? Drop a line here.