OpenAI strikes a deal with Reddit to train its AI on real-time user posts: Is user privacy at risk?

Less than three months after Reddit struck a $60 million AI content licensing deal with Google, the world’s largest social aggregation site has inked a second partnership, this time with fellow AI giant OpenAI. The financial terms of the deal were not disclosed.

In an announcement yesterday, OpenAI said it has signed a deal with Reddit to gain access to real-time content from Reddit’s data API. With this integration, OpenAI will be able to surface discussions from the Reddit website within ChatGPT and other new products. This partnership will grant OpenAI access to Reddit’s extensive collection of posts, comments, and discussions, creating a rich resource for training its AI models.

“We’re partnering with Reddit to bring its content to ChatGPT and new products,” OpenAI said in a post on X.


As part of the partnership agreement, OpenAI will gain a unique and diverse dataset, characterized by an array of writing styles and communication patterns, from lighthearted banter to in-depth technical debates. This variety is crucial for developing AI models capable of engaging users in a more natural and nuanced manner.

For Reddit, this partnership could lead to the introduction of new AI-powered features aimed at enhancing user experience. Moreover, the data licensing agreement with OpenAI represents a fresh revenue stream for the platform.

“Reddit has become one of the internet’s largest open archives of authentic, relevant, and always up-to-date human conversations about anything and everything. Including it in ChatGPT upholds our belief in a connected internet, helps people find more of what they’re looking for, and helps new audiences find community on Reddit,” Reddit CEO Steve Huffman said in a statement shared on OpenAI’s blog.

However, the deal is not without controversy. The collaboration raises concerns about user privacy in particular: many Reddit users may be uneasy about their posts being used to train AI models, especially if doing so risks exposing personal information.

Another critical issue is the potential for bias. Reddit’s content mirrors the opinions and biases of its user community. If these biases are not meticulously addressed, they could be embedded within the AI models, resulting in discriminatory or skewed outputs.

Additionally, Redditors have been outspoken about the platform’s management decisions, and it remains to be seen how they will respond to this latest announcement. In June 2023, over 7,000 subreddits went dark as users protested against changes to Reddit’s API pricing. Recently, following the announcement of a partnership between OpenAI and the programming forum Stack Overflow, users faced suspensions for attempting to delete their posts.

The partnership between OpenAI and Reddit marks a pivotal moment in AI development. Observers will be keen to see how this deal influences the capabilities of AI models and the evolution of online communities. Both companies must navigate these privacy concerns carefully and ensure responsible data management to uphold user trust and maximize the positive outcomes of this collaboration.

The news of Reddit’s partnerships with OpenAI and Google comes less than a year after the social aggregation site laid off 90 employees, or about 5% of its workforce. Reddit CEO Steve Huffman also stated at the time that the company would scale down its hiring plans to about 100 individuals, down from the initial target of 300.

Reddit has grown rapidly since its founding in 2005. Today, the site draws almost 2 billion visits a month. The social news aggregation site has grown to become one of the 25 most-visited websites in the world and the 7th most-visited website in the U.S., according to Alexa Internet data as of February 2021.

