Cloudflare is on the offensive against AI bots

Matthew Prince announcing a major new effort at Cloudflare to block AI crawlers:

“Cloudflare, along with a majority of the world’s leading publishers and AI companies, is changing the default to block AI crawlers unless they pay creators for their content.”

I’m concerned that this default goes too far. Cloudflare has enormous power to intercept web traffic, because they’ve effectively re-centralized DNS for so many websites. While Matthew’s reasons for doing this are good, it should still be an opt-in feature. The open web should by default be open.

When you think about where Cloudflare’s business originally came from, protecting websites that were under a denial-of-service attack, it’s understandable that Cloudflare would see any non-human request as fitting in the same category. Bots from hackers are bad, stealing hosting and computing resources. Bots from AI companies are also bad, stealing content.

That’s an oversimplification, though. Some bots from AI companies are training new models. Some bots are acting on the user’s behalf, a little more like a web browser, such as reasoning models that make new requests to the web to answer questions and cite their sources.

Cloudflare has a series of blog posts today with more details. In one post, they outline how AI crawlers can use HTTP Signatures (similar to what ActivityPub uses) to identify themselves if they have a relationship with Cloudflare for making payments to web publishers. When pay per crawl is enabled for a site, Cloudflare will return an HTTP 402 “Payment Required” response to crawlers that haven’t agreed to pay. There’s a mechanism for crawlers to say how much they will pay or to accept the listed price.
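
To make that flow a little more concrete, here is a minimal Python sketch of how a crawler might react to a 402 response. The header names (`crawler-price`, `crawler-exact-price`) and the "USD 0.01" value format are my assumptions for illustration and should be checked against Cloudflare’s documentation. The HTTP Signatures step is left out entirely, and `next_request_headers` is a hypothetical helper, not part of any library.

```python
from decimal import Decimal

# Assumed header names and value format, for illustration only.
PRICE_HEADER = "crawler-price"              # sent with the 402 response
EXACT_PRICE_HEADER = "crawler-exact-price"  # sent by the crawler on retry


def parse_price(value):
    """Parse a value like 'USD 0.01' into (currency, Decimal amount)."""
    currency, amount = value.strip().split()
    return currency, Decimal(amount)


def next_request_headers(status, headers, budget):
    """Return extra headers for a retry, or None if the crawler should stop.

    status  -- HTTP status code of the first response
    headers -- dict of response headers
    budget  -- the most this crawler will pay per request, in USD
    """
    if status != 402:
        return None  # nothing to negotiate

    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    listed = lowered.get(PRICE_HEADER)
    if listed is None:
        return None  # a 402 without a listed price: give up

    currency, amount = parse_price(listed)
    if currency != "USD" or amount > budget:
        return None  # too expensive, or a currency we don't handle

    # Accept the listed price by echoing it back on the retry.
    return {EXACT_PRICE_HEADER: f"{currency} {amount}"}
```

A crawler with a budget of five cents would retry a page listed at one cent and walk away from one listed at ten cents. Every crawler, even a well-behaved one, needs logic like this before it can read a page, which is part of the complexity I worry about.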

Cloudflare continues:

“At its core, pay per crawl begins a technical shift in how content is controlled online. By providing creators with a robust, programmatic mechanism for valuing and controlling their digital assets, we empower them to continue creating the rich, diverse content that makes the Internet invaluable.”

This sounds noble. However, this is a potential new source of revenue for Cloudflare, because they handle the payments from AI companies, and so they could choose to shave off a percentage for themselves. I’ve found no documentation yet for what this business arrangement might look like. I’m not suggesting that Cloudflare is doing this only for profit, but their business model could shift a little. They could be incentivized to block more requests, in the same way that Meta is incentivized to show more ads.

I can also imagine a harmless bot accidentally getting mislabelled as an AI crawler. Cloudflare has significant control even though they aren’t the ones hosting your website. According to a companion press release today, Cloudflare proxies traffic for 20% of the web.

In running Micro.blog, we sometimes see problems like this already. Micro.blog is always polling RSS feeds in the background, so that you can host your website on WordPress (or anywhere) and those posts will show up in your Micro.blog account. There is nothing nefarious about this. It’s how the open web and RSS feeds are supposed to work.

There have been a lot of good discussions lately — including in another one of Cloudflare’s blog posts today — about how the shift from Google to AI chatbots has affected web publishers:

“Content publishers welcomed crawlers and bots from search engines because they helped drive traffic to their sites. The crawlers would see what was published on the site and surface that material to users searching for it. Site owners could monetize their material because those users still needed to click through to the page to access anything beyond a short title.”

This is a narrow view of the web, though. What about all the blogs that don’t need to be monetized at all? We all publish to the web for a variety of reasons: to share what we’ve learned; to be part of a community; to have a place online for our photos; to help us think through a topic while writing a blog post like the one you’re reading; and just because it’s fun to add a little something to the larger web, building on human writing and culture. Not everything needs to be a financial transaction.

Cloudflare’s move today is bold. It is architected heavily around the needs of ad-based web publishers, but there will likely be costs in complexity for everyone else. For those who distrust AI companies, it will be worth it. I don’t know yet whether it’s actually a good thing for the whole web.

At Cloudflare’s scale, defaults matter. Such a big change should be opt-in until we know more about how it will affect the web.

Manton Reece @manton