Class Action Claims Microsoft Copilot Reproduced Copyrighted Journalism From Major Publishers

Nine regional news publishers have filed a class action lawsuit against Microsoft and OpenAI, alleging that Microsoft Copilot reproduced hundreds of thousands of copyrighted articles without permission or compensation. The Virginian-Pilot, Los Angeles Daily News, Boston Herald, and six other regional publishers claim the companies engaged in “wholesale intellectual property theft on a scale never before seen in American publishing.” They are seeking more than $10 billion in damages, arguing that training Copilot on their content and then using that content to generate responses, without proper attribution or payment, constitutes willful copyright infringement that has harmed their advertising revenue and subscription models.

What Are the Publishers Claiming Copilot Did to Their Copyrighted Content?
How Did the Copyright Infringement Occur—And Why Is It Different From Traditional News Aggregation?
Which Publishers Are Named in the Lawsuit and Why Are They Suing?
What Are They Seeking—And How Much Could This Cost Microsoft and OpenAI?
What Has Microsoft Done in Response—And Is the Publisher Content Marketplace Enough?
What Does This Case Mean for Other Publishers, Content Creators, and the AI Industry?
What’s the Current Status—And What Happens Next in the Litigation?

What Are the Publishers Claiming Copilot Did to Their Copyrighted Content?

The complaint alleges that OpenAI and Microsoft engaged in systematic copyright infringement by “copying, scraping, and ingesting” hundreds of thousands of copyrighted news articles without permission to train ChatGPT and Microsoft Copilot. According to the lawsuit, the companies did not license these articles, seek permission from the publishers, or provide compensation—despite the fact that these articles represent years of professional journalism, fact-checking, and reporting resources. Beyond just ingesting the content, the publishers allege that when Copilot generates responses containing information from their journalism, the system removes critical attribution information such as journalists’ names and publication titles.

This stripping of copyright management information means readers of Copilot outputs cannot trace the content back to the original reporting—and crucially, users have no reason to visit the publishers’ websites. For regional news organizations that depend on direct website traffic for advertising revenue and subscription income, this deprivation of traffic represents direct financial harm. For example, if a user asks Copilot a question about a local government policy that The Virginian-Pilot originally reported, Copilot may provide that information without any indication it came from The Virginian-Pilot, redirecting the traffic and engagement the publisher would have received.

How Did the Copyright Infringement Occur—And Why Is It Different From Traditional News Aggregation?

The core allegation is that millions of copyrighted articles were scraped and used to train large language models without any licensing framework or consent from the rights holders. Unlike traditional news aggregation (which typically links back to original sources and drives traffic), Copilot ingests the content directly into its training data, incorporates the information into its model weights, and then generates responses that compete directly with the original published articles. The publishers argue this is different from search engine indexing or aggregation because Copilot is not merely indexing or summarizing—it is reproducing information derived from copyrighted work as original output in a competing product.

When you ask Google News about a topic, the system links you to multiple publisher sites. When you ask Copilot, the system synthesizes and presents information without necessarily directing you to any source. However, if the court limits liability to willful infringement, publishers that deliberately allowed search crawlers on their sites may face arguments that they implicitly consented to automated access. This legal distinction may affect which publishers qualify for damages in the class action.

Which Publishers Are Named in the Lawsuit and Why Are They Suing?

Nine regional publishers are named as plaintiffs in the consolidated case: The Virginian-Pilot, Los Angeles Daily News, Boston Herald, and six others. These are not tiny local outlets but established regional news organizations with significant reporting resources and readerships. In addition to the consolidated case, US News & World Report separately sued OpenAI, alleging “OpenAI’s hijacking of content” from its websites.

The publishers’ motivation is straightforward: they invest substantial resources in reporting, fact-checking, and editorial oversight to produce original journalism. When AI companies train on that work without permission or payment, the publishers lose the competitive advantage of their exclusive reporting and face reduced traffic from users who get answers from Copilot instead of visiting their websites. For regional publishers especially—which often operate on tighter margins than national outlets—the loss of ad impressions and subscription conversions from deflected traffic is material. The lawsuit represents an attempt to recover damages and establish a legal precedent that AI companies cannot simply take published work without licensing or compensation.

What Are They Seeking—And How Much Could This Cost Microsoft and OpenAI?

The plaintiffs are seeking more than $10 billion in damages, claiming the scale of infringement and the harm to their business models justifies significant financial recovery. This figure reflects not just the copyright infringement itself but the cumulative effect of lost traffic, reduced advertising revenue, and diminished subscription revenue resulting from Copilot providing answers directly rather than directing users to the publishers’ websites. The legal arguments center on copyright law’s provisions for infringement damages, including potential claims for willful infringement (which can result in enhanced damages) and unjust enrichment (the idea that Microsoft and OpenAI profited from the use of the publishers’ content without paying for it).

If the case succeeds, it could establish that large-scale AI training on copyrighted journalism without a licensing framework constitutes infringement and that companies must either license content or face significant liability. However, the defendants will likely argue fair use principles allow training data use, and they may point to the transformative nature of AI models. The outcome of this consolidated case could reshape how AI companies approach content licensing and training data acquisition.

What Has Microsoft Done in Response—And Is the Publisher Content Marketplace Enough?

Recognizing the copyright concerns, Microsoft announced a Publisher Content Marketplace designed to compensate publishers when their content is used by AI products, with Copilot as the first AI buyer. The marketplace launched with partnership agreements with major publishers including the Associated Press, USA Today, and People Inc. Under this model, publishers can opt into licensing agreements, and Copilot compensates them when their content is used. This response is a significant shift in Microsoft’s approach, but it does not necessarily resolve the core lawsuit.

First, the marketplace is opt-in, meaning publishers have to proactively license their content through the new system; it does not retroactively address the alleged unauthorized ingestion of millions of historical articles already used in training. Second, only some major publishers have signed on; many regional publishers are not yet part of the marketplace, leaving their content in potential limbo. Third, the lawsuit still alleges that Microsoft and OpenAI engaged in willful infringement before the marketplace existed, and the existence of a marketplace going forward does not eliminate liability for past conduct. Publishers and legal experts are skeptical that the marketplace alone will satisfy the concerns raised in the lawsuit or eliminate Microsoft’s potential liability for historical copyright claims.

What Does This Case Mean for Other Publishers, Content Creators, and the AI Industry?

If the consolidated case succeeds, it will send a powerful signal that major AI companies cannot unilaterally decide to ingest copyrighted content without licensing. For publishers and content creators broadly, a win in this litigation would establish legal precedent that training data use is not a free-for-all and that copyright holders have enforceable rights to control how their work is used by AI systems. For the AI industry, the case represents a critical inflection point.

Companies like OpenAI and Microsoft have built their products on the assumption that training on publicly available internet content falls within fair use or is otherwise legally permissible. A judgment against them could require a wholesale rethinking of AI training practices, potentially requiring companies to license content upfront or create opt-out systems rather than opt-in marketplaces. Other publishers and creators not yet parties to the lawsuit are watching closely; many are considering their own legal actions or seeking better licensing agreements before their content is used in future AI models. This case is likely to influence how the entire tech industry approaches content licensing and intellectual property going forward.

What’s the Current Status—And What Happens Next in the Litigation?

In April 2025, the U.S. Judicial Panel on Multidistrict Litigation ordered consolidation of multiple copyright infringement cases against Microsoft and OpenAI in Manhattan federal court. This consolidation means that the regional publishers’ case, the US News & World Report case, and potentially other similar claims will be handled together, allowing the court to address core questions about whether large-scale data ingestion for AI training constitutes copyright infringement under U.S. law.

The consolidated case is now in the federal court system, where procedural steps will likely include motions to dismiss (where defendants will argue legal immunity or fair use), discovery (where plaintiffs will seek evidence of the companies’ training practices), and potentially summary judgment motions before any trial. Given the complexity of AI law and the precedent-setting nature of the case, it could take years to resolve. However, the consolidation in Manhattan federal court signals that judges and the legal system are taking these claims seriously, and the outcome could reshape copyright law’s application to AI.