OpenAI And MidJourney In Talks To Purchase WordPress And Tumblr Data

Updated: Friday, March 1, 2024, 08:58 [GST]

Automattic, the parent company of WordPress.com and Tumblr, is reportedly in talks with AI companies OpenAI and MidJourney over a potential data and content deal. The discussions were first reported by 404 Media, citing an undisclosed source at Automattic, and suggest that an agreement could be on the horizon.

The talks follow growing speculation on Tumblr about a new revenue opportunity for the platform through a deal with MidJourney. The process has not been without problems, however. A partly unsuccessful data transfer to OpenAI and MidJourney was reported, and according to a Tumblr product manager it included sensitive and private user content. The implications of this mishap remain unclear, and further details are awaited.

The scramble for AI training data is intensifying. Generative AI companies, which previously relied on freely scraping data, are now paying for it. A recent report said Reddit was in talks to license its extensive user-generated content to an undisclosed AI company, in a deal potentially worth $60 million annually. The news comes as Reddit aims for a public offering in March at a valuation near $5 billion.

Such licensing agreements are gaining traction as tech companies try to navigate the legal uncertainty surrounding copyright and data use. Ongoing legal battles, including the New York Times' copyright lawsuit against OpenAI and Microsoft, have added urgency to securing legitimate content deals.

Automattic's Stance on Data Use and Privacy

Automattic's talks with AI firms raise questions about the use of user-generated content for AI training. The company reportedly plans a feature that lets users opt out of sharing their data with third parties, including AI companies. Following the 404 Media report, Automattic clarified its position, stating that it blocks major AI platform crawlers by default and shares only public content from WordPress.com and Tumblr users who have not opted out.

Automattic's forthcoming approach aims to respect community values, including attribution, opt-outs, and control. Yet, opting out might have repercussions, as outlined in an upcoming FAQ. Users who choose to opt out will have their sites blocked from crawlers, with Automattic committing to inform partners about any new opt-outs to ensure the removal of content from past and future AI training data.

The unfolding scenario underscores a broader issue facing the digital world: the use of online content for AI training. As AI technology advances, the discussion around data privacy and user rights is set to deepen. While companies with vast data repositories may benefit significantly, the implications for average internet users remain a concern.

As the landscape evolves, the balance between leveraging AI for innovation and safeguarding individual privacy continues to be a critical conversation in the tech community and beyond.
