The web is incomplete

When we use Google every day and mostly work with technology and related topics that are well indexed, it’s easy to forget the truth: the web is horribly incomplete. I’ve been doing some research for an upcoming podcast, and it’s very frustrating to encounter huge, gaping voids in the internet where history, audio recordings, and photographs should be. Somewhere out there is an audio cassette recording that I’d like to hear, but it will probably gather dust in an attic for the next decade instead. It needs to be much easier for anyone to put everything they have online so that it can be preserved and shared. I already suspect that the current generation, raised on instant messaging and the web, may not realize that there’s a whole world out there beyond the reach of our keyboards. I know I sometimes forget.

The other part of the problem is linkrot: not just 404s, but old links to obsolete file formats that can no longer be accessed. I can’t even count how many links to .ram files I’ve clicked that result in an error. When your content requires a special server (RealAudio streaming server software, in this case), it’s only a matter of time before the content itself dies.
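As a rough illustration of how you might spot this second kind of linkrot in your own archives, here is a minimal Python sketch that flags links pointing at formats likely to need special server or player software. The extension list and the function names are my own assumptions for the example, not an authoritative registry of dead formats, and the sketch only looks at the URL itself; it does not check whether the link still resolves.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: a hand-picked set of RealAudio/RealMedia extensions.
# Extend it with whatever formats you consider at risk.
OBSOLETE_EXTENSIONS = {".ram", ".rm", ".ra"}


def link_extension(url: str) -> str:
    """Return the lowercased file extension of the URL's path, or ''."""
    path = urlparse(url).path  # drops the query string and fragment
    return posixpath.splitext(path)[1].lower()


def is_at_risk(url: str) -> bool:
    """True if the link points at a format listed in OBSOLETE_EXTENSIONS."""
    return link_extension(url) in OBSOLETE_EXTENSIONS


def at_risk_links(urls):
    """Filter a list of URLs down to the ones that look at risk."""
    return [url for url in urls if is_at_risk(url)]
```

Running `at_risk_links` over the links in an old post would give you a short list of audio and video worth re-encoding as plain MP3 before the original source disappears.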

Now, the good news is that a simple MP3 file and a static HTML file with JPEG images will be around forever. They require no special server software and no dynamic processing of any kind, and client software is so widespread and open that it’s a guarantee you can access them 10 years later. The only missing piece of the puzzle is reliable non-expiring domain registration and hosting.

The bad news is the rise of centralized web applications and data stores. What happens when YouTube shuts down? Remember, they burn through huge amounts of cash for bandwidth each month and seem to have few options for becoming profitable. I feel better about Flickr, because they get it, but Yahoo! has been known to not treat data longevity seriously.