Tag Archives: archiving

15 years of blogging

Fifteen years ago today I started this blog during SXSW. Although I didn’t think of it that way at the time (Twitter hadn’t been invented yet), my first post was essentially a microblog post: 145 characters and no title. (Titles on the old posts were added later, during the migration to Movable Type.)

I’ve written about 1100 posts since then, and another 600 microblog posts. Some of my favorites last year:

And the year before:

And earlier:

Whether you started visiting this blog years ago or just today, thanks for reading. I hope to still be writing in another 15 years. (I’ll be 56 years old. My kids will be grown up. Nearly everything will be different.)

Stretching time out has a way of highlighting what matters. And if it matters, it’s worth writing down. I hope you’ll join me for the next chapter as I try to move indie microblogging forward with Micro.blog.

Talkshow.im archives

Shutting down a web site correctly isn’t easy. When Talkshow announced they were closing, I was surprised. Six months is not much time to launch, get traction, and then wind down. But I was glad to see that they let any show be exported as an archive.

The archives aren’t available for very long. If you hosted a show on Talkshow, you have until December 1st to download it.

I downloaded a couple to see how Talkshow handled it. Just in case no one else grabs them, I’m copying them here: Pop Life episode 5 with Anil Dash and guest John Gruber, and the Six Colors live coverage for Apple’s September 7th event. I had Instapaper-ed both of these to read later anyway.

The archive itself is a simple .zip file with HTML, CSS, and user profile images. In the Finder it looks like this:

Talkshow.im Finder screenshot

This self-contained structure makes it very easy to re-share somewhere else. Credit to Talkshow for keeping this simple. But these are also so easy to keep hosting as static files that I wonder why Talkshow doesn’t keep the archives available indefinitely, which would preserve any existing links to these shows from the web.

App Store cleanup

I’m in favor of Apple’s upcoming app store cleanup, as long as they err on the side of keeping an app in the store if it isn’t clearly broken or abandoned. They should start slow with the obvious cases: crashing on launch, not updated for retina or even 4-inch screens. There’s a lot of low-hanging fruit that could be swept up programmatically.

David Smith wrote about this kind of App Store cleanup over 3 years ago, arguing that Apple could do a lot without getting into the subjective quality of an app:

Instead, I think Apple would be well served to adopt objective measures for quality or at least freshness to improve the overall quality of the Store. Adopting such a policy wouldn’t fundamentally change the situation for developers; every app they submit already has to be approved. All that this would do is apply some of those same required criteria to the app on an ongoing basis.

John Voorhees picked up on the urgency of Apple’s new policy for an article at MacStories:

We are well past the time when the number of apps served as meaningful bragging rights for Apple keynotes. The directness in tone and relatively short time frame given to developers to make changes to apps sends a clear message – Apple is serious about cleaning up the App Store.

It remains a challenge to preserve the part of our culture that is captured in old apps. I wish Apple could aggressively curate the App Store while still allowing old apps to be archived and available. But that’s far from a priority for Apple. For now, it’s right to focus on the best possible experience for App Store customers.

Email archiving with Evernote

For a long time, I’ve struggled to keep important email archived in one place. I’ve switched between several clients over the years, from Eudora, Mailsmith, and even Cyberdog in the early Mac days to, more recently, the fairly reliable Apple Mail. Yet I still occasionally lose old email when I switch machines and don’t handle the migration properly.

Last year I set out to fix this. While I didn’t do an exhaustive search of archiving options, the main solutions I considered were:

  • Switch to Gmail. There are plenty of native clients for Gmail, but I fundamentally don’t like the idea of an ad-supported email service. I’m very happy with Fastmail and want to continue using it.
  • Local archiving with EagleFiler. This gets the email archived in a central place outside whatever mail client I’m using, which is great. However, I’d like something that is focused on cloud search first.
  • Save to files on Dropbox. All of my notes are stored on Dropbox, so why not put an email archive there too? But Dropbox isn’t well-suited to easily accessing and searching individual messages.
  • Save to Evernote. I’ve never actively used Evernote for notes. Using Evernote for email would keep the email separate from normal notes on Dropbox, and Evernote already has excellent support for forwarding email into their system. I’d be able to search the archive from my Mac, iPhone, or the web.

I’ve settled into a pretty basic workflow of using Evernote to save any email that looks moderately valuable. This is usually a handful of messages each day, not every email I receive or send. By picking and choosing what gets archived, I can ignore everything else, letting it sit in Mail’s archive indefinitely or deleting it.

Here’s an AppleScript I currently trigger in Mail for any selected message I want to archive. It’s set to command-shift-S via FastScripts. If I’m away from my Mac, or I want to preserve HTML and inline attachments, I can save an email by forwarding it to a special Evernote email address. (I also pay for Evernote Premium.)

Now that I’m about a year and thousands of archived messages into this setup, I’m declaring it a success. I plan to continue using Evernote in this way for years to come. Let’s just hope they’re on the right track with their own business.

Web history and IPFS

Dave Winer on the continued disappearance of old web sites:

“I’ve tried to sound the alarms. Every day we lose more of the history of the web. Every day is an opportunity to act to make sure we don’t lose more of it. And we should be putting systems into place to be more sure we don’t lose future history.”

Earlier this week, Steven Frank pointed to a new format and protocol called IPFS, which Neocities is embracing. Copies of your content would live in multiple nodes across the web instead of in a single, centralized location. From their blog post:

“Distributing the web would make it less malleable by a small handful of powerful organizations, and that improves both our freedom and our independence. It also reduces the risk of the ‘one giant shutdown’ that takes a massive amount of data with it.”

I took some time to read through what it can do, and I’d like to support it in the publishing platform that’s part of my new microblogging project. I don’t know if it’s technically feasible yet, but I love that someone is trying to solve this. We just have to start somewhere.

Write locally, mirror globally

The Atlantic has an interesting essay on whether Twitter is on a slow decline, less useful and meaningful than it once was:

“Twitter is the platform that led us into the mobile Internet age. It broke our habit of visiting individual news homepages first thing in the morning, and established behaviors built around real-time news consumption and production. It normalized mobile publishing power. It changed our expectations about how we congregate around shared events. Twitter has done for social publishing what AOL did for email. But nobody has AOL accounts anymore.”

It reminds me of something I brought up on Core Intuition a few months back, wondering if Twitter is a core part of the web, something that would be with us forever, or if it is “just another web site”. When we get into the groove of using a new service for a few years, it’s easy to forget that web sites don’t have a very good track record. Giant sites like Facebook and Tumblr seem to have been with us forever, but my personal blog is older than both.

Think about this: if it’s even possible for Twitter to fail — not likely, just possible — then why are we putting so much of our content there first, where there are rules for how tweet text can be used? Storage for all tweets is so massive that there’s no guarantee that other companies will be able to take over the archive if the service has to fold. It’s why I built Tweet Library and Watermark to archive and publish tweets.

Decentralization is the internet’s greatest strength and weakness. There shouldn’t be one service to hold all of blogging; each writer should have his or her own domain and web site. But web sites also die all the time from neglect. We need centralized services to index and syndicate content so that it’s preserved and accessible to more people.

Longevity is the next great challenge for the web. All of my work on Riverfold apps is leading this way, from archiving tweets, to curating and publishing your best photos, to indexing a copy of the text and HTML from your blog. But I’m just one guy with a limited server budget.

It’s time for a new web standard — a metadata format and API that describes how to mirror published content. Maybe it’s part of IndieWebCamp? When I write on my blog, I want the content to flow to GitHub Pages, to the Internet Archive, to Medium. When I post photos, I want the content to flow to Dropbox, to S3, to Flickr. It’s not enough to back up or copy data blindly; the source must point to each mirror, and each mirror service must understand who the creator is and how to find the original data if it still exists.

Unlike a distributed platform that works at the level of raw data, like BitTorrent, this new system should work natively with well-understood common files: text, photos, video, and the glue (usually HTML, Markdown, or JSON) that makes a collection meaningful. Instead of yet another generic sync system, it’s a platform that understands publishing, with adapters to flow content into each mirror’s native storage.
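To make the idea concrete, here’s one shape such a per-post “mirror manifest” might take. This is purely a hypothetical sketch: no such standard exists, and every field name, service name, and URL below is invented for illustration.

```python
# Hypothetical sketch of a per-post mirror manifest. All field names,
# services, and URLs are invented for illustration; this is not an
# existing standard.
manifest = {
    # The canonical copy enumerates its mirrors...
    "original": "https://example.com/2016/01/write-locally.html",
    "author": {"name": "Example Author", "url": "https://example.com/"},
    "content": {
        "type": "text/html",
        "attachments": ["photo-1.jpg"],
    },
    # ...and each mirror points back at the original, so either side
    # can be rediscovered if the other goes away.
    "mirrors": [
        {"service": "github-pages",
         "url": "https://example.github.io/2016/01/write-locally.html"},
        {"service": "internet-archive",
         "url": "https://web.archive.org/web/2016/https://example.com/"},
    ],
}
```

The key design point is the two-way link: a mirror without a pointer back to the source is just a blind copy, which is exactly what this proposal argues is not enough.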

If you accept that this is something worth doing, then every place we put our content must be classified as either an original source or a mirror. And this brings us back to Twitter. Because while I think the next 5 years for Twitter will be strong, I’m not convinced that it will last 50 years. Therefore, Twitter cannot be an original source of data; it must be just one of several mirrors for micro-blogging.

Rambling about Twitter archives

As 2012 was winding down, I was fascinated with LongPosts.com (built on top of App.net) and so used it to post some thoughts about Twitter archives. The site is gone now, so I’ve moved the text back to my weblog here, where it belongs anyway. The link to the ADN discussion is here. — future me, January 2016

One of the main goals of my web app Watermark is to archive and search tweets and ADN posts, so it was natural for me to implement support for Twitter’s new archive export format. I finished it last night and linked it from the Watermark account page this evening for all customers.

I had heard that Twitter’s export included a CSV version before I saw the actual files, so I started coding an importer based on that, assuming I could tweak it later. Once I saw a real tweets.zip, I had to throw out most of my initial work. The CSV files have two problems:

  • They don’t properly escape values using quotes, so a comma inside a tweet makes the files more difficult to parse.
  • They don’t include some essential Twitter metadata, like the reply-to ID.

I switched to using the JSON files and it’s working well. They’re JavaScript but not strictly JSON, so you just skip the first line.
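Based on that description, loading one of these per-month files might look like the sketch below. Splitting on the first `=` (rather than literally dropping line one) is a slightly more defensive assumption, since the exact layout of the assignment header may vary; the `Grailbird.data...` variable name in the comment is also an assumption from exports of that era.

```python
import json

def load_tweet_month(path):
    """Load one month of tweets from a Twitter archive export.

    The per-month files are JavaScript, not strict JSON: they begin
    with a variable assignment (something like
    `Grailbird.data.tweets_2012_12 = [...]`). Everything after the
    first `=` should parse as a plain JSON array.
    """
    with open(path, encoding="utf-8") as f:
        raw = f.read()
    _header, payload = raw.split("=", 1)
    return json.loads(payload)
```

From there, each entry is a dictionary of tweet fields ready to merge into a search index.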

Since the ZIP archive can be fairly big, instead of uploading in a web browser I let the user choose the file via Dropbox. This was a nice opportunity to try out the Dropbox Chooser. Then on the server I extract the files and load the data.

Dave Winer is doing something interesting with archives too. He’s started linking up other people’s archives on S3 — both the HTML view and the .zip file. I have a test Watermark account that I’ve loaded one of these into. It’s interesting to import multiple archives and have them all merged together and searchable.

For so long we’ve waited for access to our old tweets. In the meantime I’ve shipped two products around fixing this limitation, so it’s especially funny that Twitter finally rolls out archives after I’ve stopped posting there. (And of course I love that ADN has allowed access to your full post history from the very beginning.) Not entirely sure where all this is going to lead, but I agree with Dave Winer that new apps should be possible now.