Defining consent for AI

As AI is used in more tools, I’m thinking about consent from bloggers who don’t want AI to process anything in their writing. I couldn’t find a convention for this outside of model training. Once text is out on the web, visitors are going to use web browsers and other tools to act on that text — summarizing, translating, annotating.

That’s a good thing. Imagine if blogs were so locked down that you couldn’t copy a passage to quote in your own post. The open web is at its best when we can share other writing we’ve discovered, or build tools to help people manage blog subscriptions, bookmarks, and notes.

Those personal uses of AI tools have a very different scope than large-scale training and data collection. As a small example, Cory Doctorow blogged recently about using lightweight models to check his writing, even though he has concerns about the AI industry. I think that’s a reasonable balance that avoids the extremes.

So setting aside the overly broad “don’t let AI touch anything” stance, we’re left with a few practical capabilities that bloggers should have:

  • Blocking specific crawlers. The robots.txt file is still good for this. Micro.blog checks robots.txt when it’s running tasks that are crawl-like, such as archiving web pages.
  • Declaring what content can be used for AI training. There are proposals to solve this. Micro.blog never trains on user data, so this is mostly not an issue for us.
  • Turning off AI in applications. Micro.blog has a global checkbox to disable any feature that uses an LLM.
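For the first item, a minimal robots.txt sketch might look like this. GPTBot (OpenAI) and Google-Extended (Google’s AI training control) are real crawler tokens, shown here only as illustrative examples of blocking AI crawlers while leaving the rest of the open web alone:

```
# Opt out of specific AI crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Everyone else is still welcome
User-agent: *
Allow: /
```

Well-behaved crawlers check this file before fetching pages, which is why it remains a reasonable consent mechanism for crawl-like tasks.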

Taken together, this feels like a comprehensive checklist for consent. I blogged last month about our strategy for AI in Micro.blog. This is the complement to that post, making sure that we have a defined list to compare new features against.

Manton Reece @manton