CNN Sues Perplexity: The End of the ‘Facts Aren’t Copyrightable’ Era?

CNN has filed a federal copyright and trademark lawsuit against Perplexity AI, alleging the systematic theft of over 17,000 news stories. This case, filed in the U.S. District Court for the Southern District of New York, represents a critical inflection point for AI search engines that rely on real-time web scraping to provide value to users.

The Core Allegations

According to the lawsuit, Perplexity’s web crawlers did more than index information; they reproduced copyrighted stories, photos, and videos without authorization (Variety). CNN specifically points to Perplexity’s “Comet Plus” premium bundle, alleging that the AI company falsely packaged CNN reporting as part of its own paid service, implying a business partnership that does not exist.

Perhaps most damaging for Perplexity is the allegation that its systems actively bypassed paywalls to access and summarize restricted content. If proven, this shifts the dispute away from “fair use” indexing and toward what CNN calls “large-scale free riding” that starves original reporting of direct traffic and subscription revenue (NPR).

The Defense: “You Can’t Copyright Facts”

Perplexity’s Chief Communications Officer, Jesse Dwyer, has leaned into a classic tech defense: facts are in the public domain. However, legal experts such as Prof. Michael Goodyear argue that while facts are free, the expression and structure of journalism are protected (CNET). The lawsuit claims Perplexity is copying the costly expressive work of human reporters rather than merely extracting data points.

Competitive Landscape and Production Trade-offs

For developers and product builders, this lawsuit highlights the growing risk of building on top of “unlicensed” search APIs. Perplexity’s Sonar models are cost-effective at approximately $2 per 1M input tokens (Perplexity Docs), but they now carry significant legal and reputational baggage.

In contrast, competitors are taking different paths:

  • OpenAI: Moving toward formal licensing deals and aligning with the EU AI Act for long-term stability, though at a higher cost of $2.50 per 1M input tokens for GPT-4o (Tech Times).
  • Anthropic: Claude 3.5 Sonnet remains a leader in specialized coding and reasoning but faces its own hurdles with EU market access (Anthropic).
  • Google: Leveraging its existing massive licensing ecosystem with Gemini to provide high-volume efficiency with 1M+ token context windows.
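To put the price gap in perspective, here is a minimal sketch of the arithmetic, using only the two per-token prices quoted above. The provider keys, function names, and the 500M-token monthly volume are illustrative assumptions, not figures from any vendor.

```python
# Input-token prices in USD per 1M tokens, as quoted in this article.
# They are not re-verified here and may change; treat them as assumptions.
PRICE_PER_MILLION_INPUT = {
    "perplexity-sonar": 2.00,
    "gpt-4o": 2.50,
}


def input_cost_usd(provider: str, input_tokens: int) -> float:
    """Return the input-token cost in USD for a given token volume."""
    return PRICE_PER_MILLION_INPUT[provider] * input_tokens / 1_000_000


def savings_usd(cheaper: str, pricier: str, input_tokens: int) -> float:
    """Return how much less the cheaper provider costs at this volume."""
    return input_cost_usd(pricier, input_tokens) - input_cost_usd(cheaper, input_tokens)


if __name__ == "__main__":
    # Hypothetical workload: 500M input tokens per month.
    monthly_tokens = 500_000_000
    for name in PRICE_PER_MILLION_INPUT:
        print(f"{name}: ${input_cost_usd(name, monthly_tokens):,.2f} per month")
    gap = savings_usd("perplexity-sonar", "gpt-4o", monthly_tokens)
    print(f"difference: ${gap:,.2f} per month")
```

The calculation covers input tokens only; output-token pricing, search fees, and any licensing costs are outside what this article quotes and are left out.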

Community Sentiment

The reaction among practitioners is a mix of irony and pragmatism. On Reddit, many have pointed out the poetic justice of news organizations, which have historically aggregated content from smaller bloggers, now being the ones crying foul (Reddit). However, the consensus among operators is that the “wild west” of scraping is ending. The industry is shifting toward a “pay-to-play” model in which data licensing is a prerequisite for enterprise-grade AI (Law360).

Takeaways for Builders

  • Audit your data sources: If your RAG pipeline or agent relies on scraping paywalled content, you may be exposed to “secondary infringement” liability.
  • Licensing is the new moat: Established players like OpenAI and Google are using their legal budgets and licensing deals as a barrier to entry for smaller startups.
  • Cost vs. Compliance: The roughly $0.50 per 1M input tokens saved by choosing the cheaper provider may not be worth the risk of a service disruption if its scrapers are legally blocked.
  • Watch the ‘Comet Plus’ fallout: If Perplexity is forced to unbundle premium news, the value proposition of AI search engines for general consumers could shift overnight.
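As a starting point for the source audit suggested above, here is a rough sketch that flags risky entries in a RAG source list. It assumes you already track, per source, whether the content is paywalled and whether you hold a license. The `Source` record, the `audit_sources` function, the `example-bot` user agent, and the example URLs are all hypothetical. A robots.txt check is a courtesy signal only, not a legal determination.

```python
from dataclasses import dataclass
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


@dataclass
class Source:
    """Hypothetical record describing one document source in a pipeline."""
    url: str
    paywalled: bool
    licensed: bool


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt text that was fetched separately."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def audit_sources(sources, robots_by_host, user_agent):
    """Return (url, reasons) pairs for every source that needs review."""
    findings = []
    for source in sources:
        reasons = []
        if source.paywalled and not source.licensed:
            reasons.append("paywalled without a license")
        host = urlparse(source.url).netloc
        robots_txt = robots_by_host.get(host)
        if robots_txt is None:
            reasons.append("robots.txt not reviewed")
        elif not robots_allows(robots_txt, user_agent, source.url):
            reasons.append("disallowed by robots.txt")
        if reasons:
            findings.append((source.url, reasons))
    return findings


if __name__ == "__main__":
    # Illustrative data only; these hosts and rules are made up.
    robots = {"news.example.com": "User-agent: *\nDisallow: /premium/"}
    sources = [
        Source("https://news.example.com/free/story", False, False),
        Source("https://news.example.com/premium/story", True, False),
        Source("https://other.example.org/a", False, False),
    ]
    for url, reasons in audit_sources(sources, robots, "example-bot"):
        print(url, "->", "; ".join(reasons))
```

The robots.txt text is passed in rather than fetched so the check can run offline against a reviewed snapshot. Anything it flags still needs a human (and likely legal) review before the source stays in the pipeline.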
