OpenAI is currently challenging a data preservation order issued in connection with the copyright dispute initiated by The New York Times (NYT), raising significant concerns about user privacy and the future of AI research. The NYT filed a lawsuit against OpenAI and its partner Microsoft in 2023, alleging that the AI models, including ChatGPT, were trained using millions of the newspaper's articles without permission, constituting copyright infringement. The NYT argues that OpenAI has unfairly profited from its intellectual property by building "substitutive products" that undermine the newspaper's business model.
A central point of contention is the court's order that OpenAI must preserve all ChatGPT output data. The NYT argues that this data is crucial to demonstrating how OpenAI's models copy, misattribute, paraphrase, or directly quote its work. The newspaper contends that access to specific ChatGPT outputs is necessary to prove its copyright infringement claims, and that OpenAI has a duty to preserve relevant data, citing OpenAI's own privacy policy, which mentions data retention for legal obligations. The news outlets pursuing the lawsuit claim that OpenAI has been destroying output logs.
OpenAI, however, is resisting the order, asserting that it represents an overreach and infringes on the privacy of its users. According to OpenAI's COO, Brad Lightcap, the request is "sweeping and unnecessary" and fundamentally conflicts with the company's commitment to user privacy. OpenAI gives users tools to control their data, including opt-out options and the permanent removal of deleted ChatGPT chats and API content from its systems within 30 days. The company argues that indefinite data retention abandons long-standing privacy norms and weakens privacy protections. OpenAI CEO Sam Altman stated that the company believes the request sets a bad precedent and that it will fight any demand that compromises user privacy.
The implications of this legal battle extend beyond the immediate parties. The case is being closely watched because it could set precedents for how AI companies handle user data in future legal proceedings. OpenAI argues that complying with the preservation order would impose a significant technical and logistical burden, requiring the company to store and manage massive datasets, and could compromise its ability to comply with GDPR. It also states that retaining chats that may contain evidence of directly copied NYT articles risks exposing numerous customer conversations that are irrelevant to the copyright case. OpenAI has assured customers that the preserved data is accessed only by a small, audited legal and security team within a secure system.
Conversely, the NYT emphasizes the need for the data to support its claims and to protect its intellectual property rights. It maintains that access to specific ChatGPT outputs is essential to proving how its articles were appropriated and repurposed by the AI, thereby solidifying its case against OpenAI and Microsoft. The newspaper contends that OpenAI's unauthorized use of its content threatens the economic sustainability of journalistic enterprises.
The magistrate judge at the US District Court for the Southern District of New York initially sided with The Times, holding that OpenAI must preserve and segregate all output log data that would otherwise be deleted, including data from the free, Plus, Pro, and Team versions of ChatGPT, as well as its API. This decision led OpenAI to appeal, arguing that the "preserve everything" request raises issues around user preferences, privacy laws, and regulations across different regions of the world.
The outcome of this legal showdown could have far-reaching consequences for the AI industry and copyright law. It highlights the delicate balance AI companies must navigate between innovation, legal compliance, and ethical considerations surrounding user data privacy.