Reddit vs. Anthropic: Lawsuit over AI Training Data Scraping of User Comments for Claude Chatbot

Jun 09, 2025

Reddit has filed a lawsuit against Anthropic, accusing the AI company of illegally scraping user comments to train its Claude chatbot. The lawsuit, filed in California Superior Court in San Francisco on June 4, 2025, alleges that Anthropic violated Reddit's terms of service by using automated bots to access and extract massive volumes of user-generated content without permission or a licensing agreement.

Reddit claims that Anthropic has been quietly harvesting posts and conversations for years, and that it continued to do so even after publicly stating in July 2024 that it had stopped crawling the site. According to Reddit, logs show Anthropic's bots accessed the site over 100,000 times in the months that followed. Reddit argues this scraping constitutes a breach of contract, as the platform's user agreement explicitly prohibits commercial use of content without a licensing deal.

The lawsuit highlights a 2021 research paper co-authored by Anthropic CEO Dario Amodei, which identified Reddit as a rich source of training data for language models. Specific subreddits, including those focused on gardening, history, relationship advice, and even shower thoughts, were cited as containing high-quality AI training data.

Google and OpenAI have licensing agreements with Reddit that include provisions for deleting content at a user's request; Reddit alleges that Anthropic has no such arrangement. This raises privacy concerns, as deleted Reddit posts might still exist within Claude's training data. Reddit also accuses Anthropic of violating user privacy by collecting and using personal posts, including deleted content, for commercial purposes, and of bypassing protections such as the site's robots.txt file, which is intended to prevent automated scraping.

Reddit is seeking financial damages and a court order to prevent Anthropic from using Reddit content in future models. It also wants to block Anthropic from selling or licensing anything built with the scraped data, which could potentially require Claude to be taken off the market entirely. Reddit's chief legal officer, Ben Lee, stated that "AI companies should not be allowed to scrape information and content from people without clear limitations on how they can use that data."

Anthropic has responded that it disagrees with Reddit's claims and will defend itself vigorously. In a 2023 letter to the U.S. Copyright Office, Anthropic argued that its training methods qualify as a "quintessentially lawful use of materials" because they involve copying information to perform a statistical analysis of a large body of data.

This isn't the first legal challenge Anthropic has faced regarding its training data. The company is currently battling a lawsuit from major music publishers alleging that Claude regurgitates copyrighted song lyrics. The Reddit lawsuit differs, however, because it doesn't allege copyright infringement. Instead, it focuses on breach of terms of use and unfair competition.

The lawsuit raises fundamental questions about the use of publicly available online content for AI training and could set a major precedent for data rights. Reddit argues that "publicly available" doesn't mean "free to scrape and profit from." The case exemplifies a growing legal strategy among platform operators seeking to control their data assets through contractual and tort-based approaches, not just intellectual property rights.

The outcome of this case could significantly impact the AI industry, potentially requiring AI developers to secure explicit licensing agreements for training data, even if the data is publicly accessible. It also highlights the importance of user privacy and the need for AI companies to respect user preferences regarding content deletion.

Written By

Rajeev Iyer

Rajeev Iyer is a seasoned tech news writer with a passion for exploring the intersection of technology and society. He's highly respected in tech journalism for his unique ability to analyze complex issues with remarkable nuance and clarity. Rajeev consistently provides readers with deep, insightful perspectives, making intricate topics understandable and highlighting their broader societal implications.
