Anti-Scraping Feature

Cloudflare Aims to Stop AI Data Collectors

A conflict is brewing in the field of artificial intelligence (AI), pitting website operators against AI companies. At its center is the question of who controls online content. The network services provider Cloudflare is now intervening with a new anti-scraping feature.

In a blog post, the US company introduced a new function that allows website operators to defend themselves against unwanted data collection by AI companies. This innovation is part of Cloudflare’s Content Delivery Network (CDN) and is available to both free and paying users.

The background to this development is the common practice among AI companies of using publicly accessible web content to train their language models. While some industry giants such as OpenAI and Google let website operators opt out of this “scraping,” not all AI developers do.
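To illustrate what such an opt-out typically looks like, here is a minimal sketch in Python. It assumes the opt-out is expressed through robots.txt rules for the crawler tokens GPTBot (OpenAI) and Google-Extended (Google); the sample rules, the example.com URL, and the helper is_allowed are illustrative and not taken from Cloudflare's announcement. The standard library's urllib.robotparser is used to check how a rule-abiding crawler would interpret the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: opts out of two AI crawler tokens, allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str = "https://example.com/article") -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL.

    Hypothetical helper for illustration; example.com is a placeholder.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Google-Extended", "SomeOtherBot"):
        print(agent, "allowed" if is_allowed(agent) else "blocked")
```

Such rules only work when the crawler chooses to respect them, which is why a network-level block is of interest to operators dealing with bots that do not.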

Cloudflare vs. Perplexity AI

Cloudflare’s new feature itself uses artificial intelligence to detect automated attempts at data extraction. According to the company, the software can even identify bots trying to conceal their true identity. “We have observed bot operators attempting to appear as a real browser by using a fake user agent,” Cloudflare engineers explain. “Our global machine learning model has consistently identified this activity as a bot.”

A notable example of the system’s capabilities is the detection of a bot collecting data for the AI startup Perplexity AI. This bot was previously difficult to block as it disguised itself as normal user traffic.

Cloudflare rates each website visit with a score from 1 to 99, with low values indicating bot activity. Requests from the Perplexity AI bot consistently receive values below 30.
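The scoring described above can be pictured as a simple threshold rule. The following Python sketch is only an illustration under stated assumptions: the Request class, the decide function, and the choice of 30 as a cut-off are hypothetical and do not represent Cloudflare's API or its actual blocking logic. The only values taken from the article are the 1 to 99 score range and the observation that the Perplexity AI bot scores below 30.

```python
from dataclasses import dataclass

# Assumed cut-off, chosen because the article reports scores below 30
# for the Perplexity AI bot. An operator could pick a different value.
BOT_THRESHOLD = 30


@dataclass
class Request:
    """Hypothetical request record carrying a bot score from 1 to 99."""
    path: str
    user_agent: str
    bot_score: int


def decide(request: Request, threshold: int = BOT_THRESHOLD) -> str:
    """Return "block" for likely bots (low score), otherwise "allow"."""
    if not 1 <= request.bot_score <= 99:
        raise ValueError("bot_score must be between 1 and 99")
    return "block" if request.bot_score < threshold else "allow"


if __name__ == "__main__":
    # A browser-like user agent does not help if the score is low.
    disguised = Request("/article", "Mozilla/5.0", bot_score=12)
    human = Request("/article", "Mozilla/5.0", bot_score=85)
    print(decide(disguised), decide(human))
```

Note that the decision depends on the score alone, not on the user agent string, which mirrors the article's point that a fake user agent does not prevent detection.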

To keep pace with constantly evolving bot technology, Cloudflare plans continuous updates to its protection feature. Additionally, a tool is being introduced that allows website operators to report new bots.

Lars Becker

Editor

IT Verlag GmbH
