Manton Reece - How does Micro.blog even work?

I’ve been upgrading servers and improving performance in Micro.blog lately, a theme which will continue throughout the year to make everything as stable as possible. Sometimes this introduces new bugs or weird behavior that makes people scratch their heads. What is Micro.blog even doing? So let’s look a little at the architecture.

When you write a new post, Micro.blog saves it into a MySQL database. We currently run 2 database servers so that we can spread some queries between them and make backups easier. But unlike many web apps, we do not serve blogs from this database. Blogs are published to a separate server as static HTML and images, served directly by Nginx with few or no dependencies on the rest of Micro.blog. This makes your blog very fast, and means that major parts of Micro.blog can go down without affecting your blog.

This has been a key design goal from the very beginning of Micro.blog. We host a blog for you, but it can have its own domain name and is only loosely tied to the rest of Micro.blog. This goal meant discarding some common architectures such as dynamically generating the blog when pages are requested.

Micro.blog is really 3 separate systems combined into a single platform: a blog admin interface, a blog hosting service, and a Twitter-like timeline.

To achieve this, Micro.blog has to translate the blog posts from Markdown to HTML. It runs all the text through Hugo. It also has to put photos and podcasts in the right place. When you upload a file, Micro.blog copies it to an object storage server at the same time that it syncs the file to your blog.

Timeline web requests and background tasks are run across a few servers, so that we can balance load and deal with outages. While Hugo wants all your Markdown and photos in a specific structure in the file system, Micro.blog maintains content in separate databases and then writes it out to the format that Hugo wants for processing.
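To make the "write it out to the format that Hugo wants" step concrete, here is a minimal Python sketch. It is not Micro.blog's actual code. The post fields, the date-based folder layout, and the function name `write_hugo_content` are all assumptions for illustration. The only parts taken from Hugo itself are the `content/` folder and the front matter block at the top of each Markdown file.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical rows standing in for posts loaded from the database.
POSTS = [
    {"id": 1, "title": "", "text": "A short post with no title.",
     "date": datetime(2021, 4, 10, 9, 30, 0, tzinfo=timezone.utc)},
    {"id": 2, "title": "How it works", "text": "A longer post with a title.",
     "date": datetime(2021, 4, 11, 16, 54, 25, tzinfo=timezone.utc)},
]


def write_hugo_content(posts, site_root):
    """Write each post as a Markdown file with front matter under content/."""
    written = []
    for post in posts:
        day = post["date"]
        # Assumed layout: content/YYYY/MM/DD/<id>.md
        folder = Path(site_root) / "content" / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"
        folder.mkdir(parents=True, exist_ok=True)
        front_matter = "\n".join([
            "---",
            # json.dumps gives a double-quoted string, which is also valid YAML.
            f"title: {json.dumps(post['title'])}",
            f"date: {day.isoformat()}",
            "---",
        ])
        path = folder / f"{post['id']}.md"
        path.write_text(front_matter + "\n\n" + post["text"] + "\n", encoding="utf-8")
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as root:
    for path in write_hugo_content(POSTS, root):
        print(path.relative_to(root))
```

Once the files are on disk in this shape, Hugo can be run against the folder to produce the static HTML.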

Any given server could have part of your content or none of it yet, so Micro.blog will have to sync everything up. It does this in multiple phases to make publishing as fast as possible, and this is the area that I’ve been spending a lot of time tweaking.

First, Micro.blog attempts to quickly publish your latest post, so that it’s available at the permalink URL and included in the blog feed. If you have thousands of posts, it ignores most of them during this phase. It just wants to get your post up on the web as quickly as possible and added to the Micro.blog timeline.

Whenever Micro.blog is processing posts, it also applies any custom themes for your blog. It never skips the Hugo step, even if your blog post content is so simple that it could be previewed with a separate Markdown filter. Every post is run through Hugo, added to the RSS or JSON Feed, and only then processed into the Micro.blog timeline.

This round-trip journey your content takes is an important part of how Micro.blog works with external blog feeds like WordPress. We aren’t interested in building a proprietary social network that is not rooted in blogs. The timeline works with blogs no matter where they are hosted.
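The round trip can be sketched in a few lines of Python. This is only an illustration under assumptions: `render_html` is a placeholder for the Hugo step (it does not parse Markdown), the example.com URLs are made up, and the function names are hypothetical. The feed shape follows JSON Feed version 1.1. The point is that the timeline side reads only the feed, so it would work the same for a feed from any other blog host.

```python
import json
from html import escape


def render_html(markdown_text):
    # Placeholder for the Hugo step: wraps the text in a paragraph.
    return f"<p>{escape(markdown_text)}</p>"


def build_json_feed(blog_title, posts):
    """Publish side: turn rendered posts into a JSON Feed document."""
    items = [
        {
            "id": post["url"],
            "url": post["url"],
            "content_html": render_html(post["text"]),
            "date_published": post["date"],
        }
        for post in posts
    ]
    feed = {
        "version": "https://jsonfeed.org/version/1.1",
        "title": blog_title,
        "items": items,
    }
    return json.dumps(feed)


def timeline_entries_from_feed(feed_text):
    """Timeline side: knows nothing about the database, only the feed."""
    feed = json.loads(feed_text)
    return [
        (item["date_published"], item["url"], item["content_html"])
        for item in feed["items"]
    ]


posts = [{"url": "https://example.com/2021/04/11/hello.html",
          "text": "Hello & welcome",
          "date": "2021-04-11T16:54:25+00:00"}]
feed_text = build_json_feed("Example blog", posts)
print(timeline_entries_from_feed(feed_text))
```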

Next, Micro.blog will do a full publish of your blog, with the entire site, categories, photo pages, and archive. In some cases, it will combine the Markdown with any uploaded photos before processing them, but usually the uploads are already on your blog. It also keeps a copy of all the Markdown files, independent of any of the web servers, so that if possible it can update those versions without writing out potentially thousands of posts to the file system.

This phase of publishing is the longest, although it’s faster now than it has ever been. During this phase, your latest post should already be live and the timeline updated, so it’s not as annoying to wait around for the archive or category pages to update.
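The two phases can be pictured with a small Python sketch. The names `newest_posts` and `publish`, the `render` callback, and the choice of exactly one post in the quick phase are assumptions for illustration. The real system may include a few recent posts so the feed is complete.

```python
def newest_posts(posts, count=1):
    """Pick only the most recent posts for the quick first phase."""
    return sorted(posts, key=lambda post: post["date"], reverse=True)[:count]


def publish(posts, render):
    """Run both phases in order; render is any callable(posts, phase)."""
    # Phase 1: get the latest post live, ignoring the rest of the archive.
    render(newest_posts(posts), "initial")
    # Phase 2: rebuild the entire site, including archive and categories.
    render(posts, "full")


# Ten hypothetical posts, one per day.
posts = [{"id": n, "date": f"2021-04-{n:02d}"} for n in range(1, 11)]
calls = []
publish(posts, lambda batch, phase: calls.append((phase, len(batch))))
print(calls)  # [('initial', 1), ('full', 10)]
```

The cost of the first phase stays roughly constant as the archive grows, which is why the latest post can appear quickly even on a blog with thousands of posts.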

I’m exposing more of what Micro.blog is doing behind the scenes in the logs for your account. Here’s a snippet from my log recently, although I’ve flipped it so that it reads in chronological order instead of newest at top:

2021-04-11 16:54:25: Publish: Not queued, publishing manton
2021-04-11 16:54:25: Publish: Initial prepare for manton
2021-04-11 16:54:25: Publish: Preparing pages for manton
2021-04-11 16:54:25: Publish: Persistent folder exists, updating for manton
2021-04-11 16:54:25: Publish: Initial posts for manton
2021-04-11 16:54:26: Publish: Linking shared content files for manton
2021-04-11 16:54:26: Publish: Running Hugo for manton
2021-04-11 16:54:26: Publish: Initial Hugo run for manton
2021-04-11 16:54:26: Publish: Initial sync for manton
2021-04-11 16:54:26: Publish: Pinging manton, progress: 0.866 seconds

Here, there are actually 2 overlapping background tasks. The lines with “Initial” are part of this first phase of quickly publishing your post. In this case, the round trip of saving the content, publishing the feed, and then updating the timeline took about 1 second. Under a few seconds is kind of the gold standard we’re aiming for.
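If you want to pull the two tasks apart yourself, a short Python script can do it. This is a sketch that assumes every line follows the "timestamp: Publish: message" shape seen in the snippet above. The format is inferred from these 10 lines, not from any documented log specification.

```python
import re
from datetime import datetime

LOG = """\
2021-04-11 16:54:25: Publish: Not queued, publishing manton
2021-04-11 16:54:25: Publish: Initial prepare for manton
2021-04-11 16:54:25: Publish: Preparing pages for manton
2021-04-11 16:54:25: Publish: Persistent folder exists, updating for manton
2021-04-11 16:54:25: Publish: Initial posts for manton
2021-04-11 16:54:26: Publish: Linking shared content files for manton
2021-04-11 16:54:26: Publish: Running Hugo for manton
2021-04-11 16:54:26: Publish: Initial Hugo run for manton
2021-04-11 16:54:26: Publish: Initial sync for manton
2021-04-11 16:54:26: Publish: Pinging manton, progress: 0.866 seconds
"""

LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): Publish: (.*)$")


def parse(log_text):
    """Return (timestamp, message) pairs for every line that matches."""
    entries = []
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append((stamp, match.group(2)))
    return entries


entries = parse(LOG)
# Lines from the quick first phase start with "Initial".
initial = [message for _, message in entries if message.startswith("Initial")]
other = [message for _, message in entries if not message.startswith("Initial")]

print(len(initial), "of", len(entries))  # 4 of 10
for message in initial:
    print("quick phase:", message)
```

The timestamps only have one-second resolution, so the 0.866 seconds reported in the last line is a better measure of elapsed time than subtracting the first timestamp from the last.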

Finally, Micro.blog assembles the timeline so that it can be served quickly no matter how many people you are following. We have a Redis server that keeps the timeline for each user in a sorted set, and use that from the Micro.blog API to page between posts. Micro.blog also processes posts for @-mentions, sending Webmentions, auto-linking URLs, and other details that are beyond what I wanted to write about here.
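To show why a sorted set is a good fit for paging, here is a small in-memory stand-in written in Python. It is not a Redis client and not Micro.blog's code. The class, the `post:` member names, and the use of Unix timestamps as scores are assumptions. The two methods loosely mirror the Redis commands ZADD and ZREVRANGE, and unlike Redis they only support non-negative indexes.

```python
import bisect


class SortedSet:
    """Tiny in-memory stand-in for a Redis sorted set."""

    def __init__(self):
        self._scores = {}  # member -> score
        self._items = []   # (score, member) pairs, kept sorted

    def add(self, member, score):
        # Like ZADD: adding an existing member again updates its score.
        if member in self._scores:
            self._items.remove((self._scores[member], member))
        self._scores[member] = score
        bisect.insort(self._items, (score, member))

    def rev_range(self, start, stop):
        # Like ZREVRANGE: inclusive indexes, highest score first.
        newest_first = self._items[::-1]
        return [member for _, member in newest_first[start:stop + 1]]


def timeline_page(timeline, page, per_page=3):
    """Return one page of post IDs, newest first."""
    start = page * per_page
    return timeline.rev_range(start, start + per_page - 1)


timeline = SortedSet()
for n in range(1, 8):
    # Score is a made-up Unix timestamp; newer posts get higher scores.
    timeline.add(f"post:{100 + n}", 1618160000 + n)

print(timeline_page(timeline, 0))  # ['post:107', 'post:106', 'post:105']
print(timeline_page(timeline, 2))  # ['post:101']
```

Because the set stays ordered by score as posts are added, fetching any page is a range lookup rather than a sort over everything a user follows.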

Could this be even better? Yes. But while I’m sometimes tempted to change the architecture to something closer to WordPress’s model, I know there’s always more performance we can squeeze out of our current setup.