Rambling about Twitter archives

As 2012 was winding down, I was fascinated with LongPosts.com (built on top of App.net) and so used it to post some thoughts about Twitter archives. The site is gone now, so I’ve moved the text back to my weblog here, where it belongs anyway. The link to the ADN discussion is here. — future me, January 2016

One of the main goals of my web app Watermark is to archive and search tweets and ADN posts, so it was natural for me to add support for Twitter’s new archive export format. I finished it last night, and this evening I made it available to all customers from the Watermark account page.

I had heard that Twitter’s export included a CSV version before I saw the actual files, so I started coding an importer based on that, assuming I could tweak it later. Once I saw a real tweets.zip, I had to throw out most of that initial work. The CSV files have two problems:

  • They don't properly quote values, so a comma inside a tweet makes the files harder to parse.
  • They leave out some essential Twitter metadata, like the reply-to ID.

I switched to the JSON files instead, and they're working well. Strictly speaking they're JavaScript rather than JSON, but you can just skip the first line.
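Here's a minimal Python sketch of the skip-the-first-line approach. It assumes each monthly file begins with a single line of JavaScript assignment (something like `Grailbird.data.tweets_2012_12 =`) followed by a JSON array of tweet objects. The field name `in_reply_to_status_id_str` and the helper names are assumptions for illustration, not a documented spec of the export.

```python
import json

def parse_tweets_js(text):
    """Parse one monthly tweets file from a Twitter archive export.

    Assumes the first line is a JavaScript assignment such as
    "Grailbird.data.tweets_2012_12 =" and everything after it is a JSON array.
    """
    # Drop the first line (the JavaScript assignment) and keep the JSON array.
    _, _, body = text.partition("\n")
    return json.loads(body)

def reply_to_id(tweet):
    # Assumed field name; returns None when the tweet isn't a reply.
    return tweet.get("in_reply_to_status_id_str")
```

Because the payload goes through a real JSON parser, commas inside tweet text are no longer a problem the way they were with the CSV files.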

Since the ZIP archive can be fairly big, rather than having users upload it through the web browser, I let them choose the file via Dropbox. This was a nice opportunity to try out the Dropbox Chooser. Then on the server I extract the files and load the data.
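The server-side step might look roughly like this Python sketch. It assumes the server has already fetched the ZIP's bytes from Dropbox (the download itself isn't shown) and that the monthly tweet files live under `data/js/tweets/` inside the archive; that path, `TWEETS_DIR`, and `load_archive` are assumptions for illustration.

```python
import io
import json
import zipfile

# Assumption: the monthly tweet files sit under data/js/tweets/ in tweets.zip.
TWEETS_DIR = "data/js/tweets/"

def load_archive(zip_bytes):
    """Extract every tweet from a Twitter archive ZIP held in memory."""
    tweets = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Sorting names puts monthly files in date order, assuming YYYY_MM names.
        for name in sorted(archive.namelist()):
            if not (name.startswith(TWEETS_DIR) and name.endswith(".js")):
                continue  # skip the HTML viewer, CSV files, and so on
            text = archive.read(name).decode("utf-8")
            # Skip the one-line JavaScript assignment; the rest is a JSON array.
            _, _, body = text.partition("\n")
            tweets.extend(json.loads(body))
    return tweets
```

Working from bytes in memory avoids writing the extracted files to disk; for very large archives, streaming to a temporary file would be the safer choice.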

Dave Winer is doing something interesting with archives too. He’s started linking up other people’s archives on S3, both the HTML view and the .zip file. I’ve loaded one of these into a test Watermark account. It’s interesting to import multiple archives and have them all merged together and searchable.
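One simple way to merge several imported archives, sketched in Python under the same assumptions about tweet fields: key tweets by `id_str` so overlapping archives don't produce duplicates, then run a naive case-insensitive substring search. `merge_archives` and `search` are hypothetical helpers, not how Watermark actually does it; a real app would use a proper full-text index.

```python
def merge_archives(*archives):
    """Merge tweet lists from several archives, dropping duplicates by ID."""
    merged = {}
    for tweets in archives:
        for tweet in tweets:
            # Keep the first copy seen of each tweet ID.
            merged.setdefault(tweet["id_str"], tweet)
    # Twitter status IDs increase over time, so sorting by integer ID
    # approximates newest-first order.
    return sorted(merged.values(), key=lambda t: int(t["id_str"]), reverse=True)

def search(tweets, query):
    """Return tweets whose text contains the query, ignoring case."""
    q = query.lower()
    return [t for t in tweets if q in t["text"].lower()]
```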

For so long we’ve waited for access to our old tweets. In the meantime I’ve shipped two products that work around this limitation, so it’s especially funny that Twitter finally rolled out archives after I’d stopped posting there. (And of course I love that ADN has allowed access to your full post history from the very beginning.) I’m not entirely sure where all this is going to lead, but I agree with Dave Winer that new apps should be possible now.

Manton Reece @manton