Creative Commons has proposed a new set of declarations called CC signals to help AI crawlers understand how a creator wants their content to be used in AI training. It has taken me a little while to wrap my head around this, in part because there is a lot of writing to introduce the idea: multiple web pages and a 34-page PDF.
It’s best to skip right to the technical overview on GitHub. There are currently four building blocks, or “elements”, which together form an addition to the usual Creative Commons licenses.
At the risk of over-simplifying it, here is my summary of these new elements:
- credit: You want credit back when your content is cited by AI.
- direct contribution: You want some kind of payment when your content is used.
- ecosystem: You want the broader ecosystem around your content to be supported by the AI company.
- open: You want your content to be used only for open-weight AI models.
These are layered on as exceptions to a separate proposal at the IETF for completely opting out of certain kinds of AI training. For example, we could declare that we don’t want generative AI training except if “direct contribution” is made back to us.
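To make the layering concrete, here is a minimal Python sketch of the logic: a base opt-out for one category of AI use, plus CC signals that lift it. This is not the real syntax from either proposal. The `Declaration` class, the `may_use` function, and the plain-English labels are hypothetical, and the sketch assumes a developer must satisfy every signal listed in a declaration before the opt-out is lifted.

```python
from dataclasses import dataclass, field

# Hypothetical labels for illustration only; the actual CC signals and IETF
# drafts define their own identifiers and serialization.
CC_SIGNALS = {"credit", "direct contribution", "ecosystem", "open"}


@dataclass
class Declaration:
    """A creator's stance on one category of AI use (hypothetical model)."""
    category: str                                       # e.g. "generative AI training"
    allowed: bool                                       # base preference from the opt-out layer
    exceptions: set[str] = field(default_factory=set)   # CC signals that lift an opt-out

    def __post_init__(self) -> None:
        unknown = self.exceptions - CC_SIGNALS
        if unknown:
            raise ValueError(f"unknown CC signal(s): {sorted(unknown)}")


def may_use(decl: Declaration, reciprocity: set[str]) -> bool:
    """Return True if a developer offering `reciprocity` may use the content."""
    if decl.allowed:
        return True
    # Opted out: use is permitted only if exceptions were declared and the
    # developer meets all of them (an assumption about how signals combine).
    return bool(decl.exceptions) and decl.exceptions <= reciprocity


# No generative AI training, except with direct contribution back to the creator.
opt_out = Declaration("generative AI training", allowed=False,
                      exceptions={"direct contribution"})
print(may_use(opt_out, set()))                    # False
print(may_use(opt_out, {"direct contribution"}))  # True
```

The subset check is a design guess: if the proposal lets any single listed signal suffice, `decl.exceptions & reciprocity` would be the right test instead.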
If we don’t care how our content is used, presumably we simply do not declare any CC signals. My blog is licensed as CC BY, which means I only want attribution when my content is republished in another form. So, especially in light of this week’s ruling in the Anthropic case, my blog posts can be used in AI training.
I’ll be watching how all of this shakes out. In Micro.blog, we have a couple of plug-ins to block AI crawlers. I also created a plug-in for declaring a Creative Commons license for a blog. Once CC signals appear stable, I’ll add support for them to the Creative Commons plug-in, for any bloggers who want to be more explicit about how their posts are used for AI training.