Blog archive format

As I’ve been improving the import and export functionality in Micro.blog, I’ve done a lot of work with WordPress’s WXR format, which is based on RSS. While there’s nothing particularly wrong with WXR, it’s more complicated than it needs to be for non-WordPress sites, especially when you start to tackle image uploads that exist outside of the post text.

Micro.blog can also push an entire site’s Markdown, HTML, and images to GitHub, which is the most complete mirror and perfect for migrating to another Jekyll server. It introduces so many extra files, though, that it’s not reasonable to expect other blog platforms to support the same level of detail.

I’d be happy to ignore the WordPress-centric nature of WXR and use it as a common blog archive format if WXR provided a mechanism to store image uploads. Helping people migrate from WordPress to Micro.blog-hosted blogs has only emphasized to me that a better format is needed.

In chatting with the IndieWeb community, the idea was proposed that an HTML file using h-feed would provide portability and also an added bonus: it could be opened in any web browser to view your archived site. Images could be stored as files with relative references in the HTML file. (I’d throw in a JSON Feed file, too, so that importers could choose between using a Microformats parser or JSON parser.)

The files would look something like this:

  • index.html
  • feed.json
  • uploads
    • 2017
      • test.jpg
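
To make the feed.json part of that listing concrete, here is a minimal sketch in Python of how an exporter might write it. It assumes JSON Feed version 1 and a hypothetical list of post dictionaries (the keys "url", "title", "content_html", and "published" are my own naming for illustration, not part of any proposal):

```python
import json

def build_json_feed(title, posts):
    """Return a JSON Feed (version 1) document as a string."""
    feed = {
        "version": "https://jsonfeed.org/version/1",
        "title": title,
        "items": [
            {
                # Using the permalink as the item id is an assumption of this sketch.
                "id": post["url"],
                "url": post["url"],
                "title": post["title"],
                "content_html": post["content_html"],
                "date_published": post["published"],
            }
            for post in posts
        ],
    }
    return json.dumps(feed, indent=2, ensure_ascii=False)

# Hypothetical sample post; the image path is relative to the archive root.
posts = [{
    "url": "https://example.com/2017/test.html",
    "title": "Test post",
    "content_html": '<p>Hello</p><img src="uploads/2017/test.jpg">',
    "published": "2017-11-20T10:00:00-06:00",
}]
feed_json = build_json_feed("Example blog", posts)
```

The image reference inside content_html stays relative, so it points at the same uploads folder that the HTML file uses.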

The basics from h-feed would follow this structure:

  • h-feed
    • h-entry
      • p-name
      • e-content
      • dt-published
      • u-url
    • h-entry
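
As a rough illustration of that structure, an exporter could render index.html along these lines. This is only a sketch: the post dictionary keys and the choice of article, h2, and time elements are assumptions on my part, and only the class names come from the outline above.

```python
from html import escape

def render_entry(post):
    """Render one post dict as an h-entry."""
    return (
        '<article class="h-entry">\n'
        f'  <h2 class="p-name">{escape(post["title"])}</h2>\n'
        f'  <a class="u-url" href="{escape(post["url"])}">\n'
        f'    <time class="dt-published" datetime="{escape(post["published"])}">'
        f'{escape(post["published"])}</time>\n'
        '  </a>\n'
        # e-content carries HTML, so the post body is embedded unescaped.
        f'  <div class="e-content">{post["content_html"]}</div>\n'
        '</article>'
    )

def render_feed(posts):
    """Wrap all entries in a single h-feed container inside a bare HTML page."""
    entries = "\n".join(render_entry(p) for p in posts)
    return (
        '<!DOCTYPE html>\n<html>\n'
        '<head><meta charset="utf-8"><title>Blog archive</title></head>\n'
        f'<body>\n<div class="h-feed">\n{entries}\n</div>\n</body>\n</html>\n'
    )
```

Because the output is ordinary HTML, the same file serves both a Microformats parser and someone opening the archive in a browser.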

Only index.html and feed.json would be required. Any other paths in the archive would be determined by the contents of the HTML. (I’m using “uploads” in this example, but it could just as easily be “archive”, “audio”, or any other set of folders.)
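
Since the other paths are determined by the HTML, an importer could discover them by scanning index.html for relative references. Here is a small sketch using only Python’s standard library; the class and function names are hypothetical, and treating every relative src or href as an archive file is an assumption of this example:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LocalPathCollector(HTMLParser):
    """Collect relative src/href values, i.e. files expected inside the archive."""

    def __init__(self):
        super().__init__()
        self.paths = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                parsed = urlparse(value)
                # No scheme, no host, and no leading slash: relative to index.html.
                is_relative = (
                    not parsed.scheme
                    and not parsed.netloc
                    and parsed.path
                    and not parsed.path.startswith("/")
                )
                if is_relative:
                    self.paths.append(parsed.path)

def referenced_files(html_text):
    """Return the relative paths referenced by the given HTML, in document order."""
    collector = LocalPathCollector()
    collector.feed(html_text)
    return collector.paths
```

An importer could then check that each returned path exists in the unzipped archive before uploading it.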

For large sites, the HTML could be split into multiple files with appropriate <link> tags in the header to page through the additional files. While it could contain CSS and your full blog’s design, I’m imagining that the HTML would be extremely lightweight: just enough to capture the posts, not a way to transfer templates and themes between blogs.
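
One way the paging might work, sketched in Python: each page carries link tags that point at its neighbors. The rel="prev" and rel="next" values and the file names below are assumptions for illustration; nothing above settles on specific values.

```python
def paging_links(page_names, index):
    """Return <link> tags pointing at the previous and next archive pages."""
    links = []
    if index > 0:
        links.append(f'<link rel="prev" href="{page_names[index - 1]}">')
    if index < len(page_names) - 1:
        links.append(f'<link rel="next" href="{page_names[index + 1]}">')
    return links

# Hypothetical file names for a three-page archive.
pages = ["index.html", "index2.html", "index3.html"]
first_page_links = paging_links(pages, 0)
middle_page_links = paging_links(pages, 1)
```

An importer would start at index.html and keep following the "next" link until a page has none.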

The whole folder is zipped and renamed with a .bar extension, which makes it easy to move around and upload all at once. I’ve created an example file here (rename it to .zip to open it).
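
The packaging step can be sketched with Python’s zipfile module. The function name is hypothetical, and storing paths relative to the folder (so index.html sits at the top of the archive) is my assumption rather than something settled here:

```python
import zipfile
from pathlib import Path

def write_bar(folder, bar_path):
    """Zip every file under `folder` into `bar_path`, keeping relative paths."""
    folder = Path(folder)
    with zipfile.ZipFile(bar_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Paths are stored relative to the folder, so index.html
                # ends up at the archive root.
                archive.write(path, path.relative_to(folder).as_posix())
    return bar_path
```

The output path should live outside the folder being archived, otherwise the partially written .bar file would be picked up as well.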

I’d love to hear what you think. I talked about this on a recent episode of Timetable as well. Might be a nice topic to follow up on at IndieWebCamp Austin in 2 weeks.

Tim Berners-Lee’s Solid

I’ve written about IPFS before, but Solid (from Tim Berners-Lee himself, among other MIT folks) is another new proposal for a more distributed web. I wasn’t familiar with it until reading this article at Digital Trends, which first makes the case for independent content vs. the big centralized platforms:

Now a handful of companies own vast swaths of web activity – Facebook for social networking, Google for searching, eBay for auctions – and quite literally own the data their users have provided and generated. This gives these companies unprecedented power over us, and gives them such a competitive advantage that it’s pretty silly to think you’re going to start up a business that’s going to beat them at their own game.

The article continues with the types of data you might share in a Solid application:

For example, you might keep your personal information in one or several pods: the sort of data about yourself that you put into your Facebook profile; a list of your friends, family, and colleagues; your banking information; maps of where you’ve traveled; some health information. That way if someone built a new social networking application—perhaps to compete head-on with Facebook, or, more likely, to offer specialized services to people with shared interests—you could join by giving it permission to access the appropriate information in your pod.

One of the showcase applications is called Client-Integrated Micro-Blogging Architecture, surely named mostly for its pronounceable acronym. From the CIMBA project site:

CIMBA is a privacy-friendly, decentralized microblogging application that runs in your browser. It is built using the latest HTML5 technologies and Web standards. With CIMBA, people get a microblogging app that behaves like Twitter, built entirely out of parts they can control.

Solid and CIMBA are built on the Linked Data Platform, which in turn is based on RDF. I’m admittedly biased against RDF, because it often brings with it an immediate sense of over-engineering: too abstracted, solving too many problems at once. I’m glad to see this activity around a distributed web, and I’ll be following Solid, but I also continue to believe that the simple microformats and APIs from IndieWebCamp are the best place to start.