Google rolls out tool for publishers to opt out of AI data training, but not search

Google has unveiled a new control, Google-Extended, that lets website publishers exclude their content from the training of Google’s AI models. Sites that opt out remain accessible through Google Search, so the tool gives publishers greater control over how their content is used for AI training without affecting their search visibility. In effect, Google will stop using the data of those publishers who opt out.

Managing AI Contribution

This move by Google addresses concerns among web publishers who wish to protect their data from being utilised in AI model training. Google-Extended lets publishers manage whether their websites help improve Bard and the Vertex AI generative APIs. Publishers can now exercise more precise control over how content on their sites is accessed, preserving their data privacy rights, The Verge reported.

Balancing Visibility and Data Protection

Earlier this year, Google confirmed that it was training its AI chatbot, Bard, using publicly available data scraped from the web. This announcement sparked concerns and prompted publishers to seek ways to shield their content from being used for AI training, much as the New York Times, CNN, Reuters, and Medium have done by blocking OpenAI's web crawler.

Unlike other web crawlers, Google’s crawler is integral to a website’s discoverability in search results, so blocking it completely could hurt a website’s online presence. To address this challenge, some publishers have resorted to legal measures, such as updating their terms of service to prohibit companies from using their content for AI training.

Google-Extended is available through robots.txt, a file that tells web crawlers which parts of a site they can access. As AI applications continue to expand, Google says it is committed to exploring additional machine-readable options that offer more choice and control to web publishers, and it expects to share further developments in the near future.
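
As a rough illustration of how such an opt-out might look, the sketch below pairs a sample robots.txt with a check using Python's standard-library parser. The rules, the example.com URL and the `can_fetch` helper are assumptions made for illustration, not taken from Google's documentation, and the standard-library parser only approximates how Google itself reads these rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt the whole site out of Google-Extended
# while leaving every other user agent (including search crawlers) unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, can_fetch(agent, page))
```

The check only reports what the rules say. It does not confirm how any crawler actually behaves, so publishers should consult Google's own crawler documentation before relying on a particular rule set.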

In short, Google’s introduction of Google-Extended provides publishers with a valuable tool to safeguard their data from contributing to AI model training while still benefiting from Google Search’s indexing capabilities. This development marks a significant step toward addressing concerns regarding the use of web content for AI training and ensuring greater transparency and control for publishers.
