Many generative AI vendors argue that fair use allows them to train AI models on copyrighted material from the internet, even without permission from the rightsholders. However, some vendors, like OpenAI, are being more cautious, possibly because of pending lawsuits over the practice.
OpenAI announced today that it has reached an agreement with Axel Springer, the Berlin-based owner of publications such as Business Insider and Politico, to train its generative AI models on the publisher’s content and to incorporate recent Axel Springer articles into ChatGPT, OpenAI’s viral AI-powered chatbot.
This marks OpenAI’s second collaboration with a news organization after the company stated that it would license some of The Associated Press’ archives for model training.
In the future, ChatGPT users will receive summaries of selected articles from Axel Springer’s publications, including stories that are normally behind a paywall. These snippets will include attribution and links to the full articles.
As part of the deal, Axel Springer will receive payments from OpenAI; neither the amount nor the frequency has been disclosed. The agreement runs for several years, and although it does not bind either party to exclusivity, Axel Springer says the deal will support its existing AI-driven ventures that are based on OpenAI’s technology.
Axel Springer CEO Mathias Döpfner expressed excitement about shaping the global partnership with OpenAI, calling it the first of its kind. He said the companies aim to explore the opportunities of AI-empowered journalism and to bring the quality, societal relevance, and business model of journalism to the next level.
Setting aside publishers’ own use of generative AI for questionable content strategies, publishers and generative AI vendors have a strained relationship: publishers allege copyright infringement and are increasingly concerned that generative models will cut into their website traffic. For example, Google’s new generative AI-powered search experience, called SGE, has pushed traditional search result links further down search pages, potentially reducing traffic to those links by as much as 40%.
Publishers also oppose vendors training their models on content without compensation agreements in place, especially given reports that tech giants like Google are experimenting with AI tools to summarize news. According to a recent survey, hundreds of news organizations are now using code to prevent OpenAI, Google, and others from scanning their websites for training data.
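The survey is not quoted here on which mechanism those sites use, but a common one is a robots.txt file that disallows specific crawler user agents. The sketch below, in Python, checks a hypothetical robots.txt against a few crawlers. GPTBot and Google-Extended are the tokens OpenAI and Google document for AI-training opt-outs; the file contents, the example.com URL, and the helper name `crawl_permissions` are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks two AI-training crawlers
# while leaving the site open to every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def crawl_permissions(robots_txt, url, agents):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


permissions = crawl_permissions(
    ROBOTS_TXT,
    "https://example.com/news/story.html",  # placeholder URL
    ["GPTBot", "Google-Extended", "Googlebot"],
)
for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Note that robots.txt is advisory: it only stops crawlers that choose to honor it, which is one reason publishers also push for licensing agreements and regulation.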
In August, several media organizations including Getty Images, The Associated Press, the National Press Photographers Association, and The Authors Guild published an open letter calling for more transparency and copyright protection in AI. In the letter, the signatories urged policymakers to consider regulations that require transparency into training datasets and allow media companies to negotiate with AI model operators, among other suggestions.
The letter highlighted practices that it said undermine the media industry’s core business models, which are based on readership, licensing, and advertising. It stated that these practices not only violate copyright law but also significantly reduce media diversity and undermine companies’ financial ability to invest in media coverage, further impeding the public’s access to high-quality and trustworthy information.