OpenAI is firing back at The New York Times after the company was sued for copyright infringement over the use of the publisher’s articles to train its artificial intelligence chatbot.
In a blog post, the Sam Altman-led firm said that the Times is “not telling the full story” and claimed the publisher “intentionally manipulated” prompts to make it appear as if ChatGPT generates near word-for-word excerpts of articles.
“Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts,” the post states.
OpenAI maintained that such verbatim regurgitation is a “rare bug.” There are guardrails in place to limit “inadvertent memorization,” added the company, which stressed that users are barred under its terms of use from prompting models to produce answers that may violate intellectual property rights.
The blog post was issued in response to the Times suing last month over novel copyright issues raised by generative AI, in a suit that could have far-reaching implications for the news publishing industry. The publisher presented extensive evidence of products from OpenAI and Microsoft displaying near word-for-word excerpts of articles when prompted, which allegedly go far beyond the snippets of text typically shown with ordinary search results. One example: Bing Chat copied all but two of the first 396 words of the Times’ 2023 article “The Secrets Hamas knew about Israel’s Military.” An exhibit shows 100 other situations in which OpenAI’s GPT was trained on and memorized articles from The Times.
In the post, OpenAI argued that training AI models using the publisher’s articles and other “publicly available internet materials” is fair use, which allows for the use of copyrighted works to make a secondary creation as long as it’s transformative.
“That being said, legal right is less important to us than being good citizens,” the company adds. “We have led the AI industry in providing a simple opt-out process for publishers (which The New York Times adopted in August 2023) to prevent our tools from accessing their sites.”
The Times reached out to OpenAI in April to explore a deal that would resolve concerns around the use of its articles as training material, according to the complaint. The media organization, after the highly publicized releases of ChatGPT and Bing Chat, put the company and Microsoft on notice that their tech infringed on copyrighted works. The proposed terms of a resolution involved a licensing agreement and the institution of guardrails around generative artificial intelligence tools.
OpenAI said that negotiations focused on a partnership around “real-time display with attribution in ChatGPT.” The talks faltered, however, as the company maintained that the publisher’s content “didn’t meaningfully contribute to the training of our existing models and also wouldn’t be sufficiently impactful for future training.”
In response to allegations that ChatGPT generates near-verbatim excerpts of entire articles, OpenAI countered that the answers the Times induced “appear to be from years-old articles that have proliferated on multiple third-party websites.” It explained, “It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate.”
As content from major newspapers and magazine companies is being ingested by AI companies, publishers are increasingly facing a choice of whether to accept licensing deals and fuel potential competitors that may possibly replace them, or to fight with a lawsuit. Axel Springer, the owner of Politico, Business Insider and German newspaper Bild, took the money, while the Times became the first major media company to sue.
A finding of infringement could result in massive damages since the statutory maximum for each willful violation runs $150,000. It could also result in a court order requiring OpenAI to terminate its AI model if it was trained on copyrighted material.
Ian Crosby, a lawyer for the Times, said in a statement that the “blog concedes that OpenAI used The Times’s work, along with the work of many others, to build ChatGPT.” He added that the company seeks “to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment. That’s not fair use by any measure.”