TL;DR: I am moving as much of my internet curating to pinboard.in as I can, because I like the dude who runs it, and it’s easier to back up. You can find me on Pinboard.
Whenever I move around large chunks of information that I never look at, I ask myself why I am moving around information that I never look at. It can put you in a philosophical mood. When this happens I entertain the idea that data should be temporary, like sand paintings. You put a bunch of effort into an elaborate work of art, and then you wipe it away. A lot of my work could fall into this category, so why not just wipe it away once you are done? You probably learned something. Hopefully you internalized something. Why not delete the work? The last time this mood struck and I started philosophizing, I was surprised to find that some of my cloud services, without bothering to ask, were trying to erase my data for me. Now I am at the point where I need to decide what data means to me. I need to do this before the Internet personal-data-apocalypse happens.
The sand painting route exposes a fundamental difference between spiritual pursuits and intellectual ones. While pursuing intellectual goals, it’s helpful to have a large library on which to build arguments. Standing on the shoulders of giants and all. Spirituality, especially the kind practiced by temporary sand painters, seems to be a pursuit of salvation that comes from within. Salvation here means that real truth is found by turning to your own mind. If salvation or spirituality is all that matters to you, it would make sense that the ritual is what matters, and by not archiving data you are freeing your own mind to do some serious thinking. On the other hand, if you want to make a sound and reasoned argument, or you want to experiment in some academic fashion, it can help to know what others have done, or even what you did years ago.
There is this article I read about preserving one’s data. As far as I can remember, it had salient thoughts about giving up all your data, and how this brings you some kind of powerful calm. Funny thing though, I can’t find it. I forgot to bookmark it with one of my million social bookmarking tools. I can’t point you to it, and I can’t go back to review it to inform my thinking for this article. I’d like to review it now because it probably made some good points. At least I think it did.
My plan to be the master of my data has to start with defining what my data is, and what I want to keep. While I can easily see that storing everything forever is an act of madness, you might not. Here are some of my guidelines for what I want to store, and why.
I don’t need to keep all my MP3s and movies. Wait a few years until the next Milli Vanilli, and you will see why most music has a short cultural lifespan. Besides, streaming services have partially alleviated my need to keep everything. There are still holes in streaming services’ catalogs, but it won’t be that way forever.
After that come my pictures and videos. Right now I rely upon things like YouTube and Flickr to archive these. In reality I shouldn’t, and I know this is a weakness in my archival system. It’s okay for them to be one prong of your system, but they probably shouldn’t be the only prong.
After media and pictures come various bits that are mostly text. I sort text content into two buckets: private and public. Basically it comes down to how secure you want to keep the information. Text like passwords, bank info, and personal ID numbers needs to be stored in a very secure, private way. Things that I find on the internet, or my annotations on those things, could be stored in public for all I care.
It’s these text bits of information that tend to be the most important to me. I try to store secure data in something like 1Password or KeePass. I have one very long password that I have committed to memory, and then I use these programs to generate passwords for all my online services. Everything else that is textual I store in Dropbox. So far I have been able to store all my code, and everything I have ever written for my blog, in a free 2 GB Dropbox account. Someday I am sure this will grow. I also probably need to create some snapshot of my Dropbox folder in case of some cataclysmic Dropbox failure where they send the delete signal to all of my nodes, and they lose their backups.
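The Dropbox snapshot idea could look something like the sketch below. It is only an illustration, not a tested backup tool: the function name `snapshot_folder` and the paths in the usage note are made up for the example, and it assumes the snapshot directory lives outside the synced folder (ideally on a different disk), so a sync-side delete cannot reach it.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def snapshot_folder(source: Path, dest_dir: Path) -> Path:
    """Zip the contents of `source` into `dest_dir` under a timestamped name."""
    # dest_dir should not live inside source, or the archive would try to
    # swallow itself on the next run.
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    base = dest_dir / f"{source.name}-{stamp}"
    # make_archive appends ".zip" to the base name and returns the full path.
    archive = shutil.make_archive(str(base), "zip", root_dir=str(source))
    return Path(archive)
```

Calling something like `snapshot_folder(Path.home() / "Dropbox", Path("/mnt/backup/snapshots"))` from a weekly scheduled job would leave a pile of dated zip files to fall back on. Both paths are placeholders.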
After I built a mental model of what I wanted to store, and where I wanted to store it, I had to take a look at my toolset. I need to build a plan to archive my flow. After years of using a menagerie of tools, for various reasons, that keep my stuff spread out everywhere, I am trying to rein in my tool set, and hopefully use fewer than a handful to consume, process, and publish items that interest me. What I don’t have under control at the moment is my curation process. Keeping track of all the things that I collect from the internet is the reason I was prompted to have this philosophical debate with myself. It started when, in short order, I found a couple of my favorite internet services screwing up my data.
My beloved Google Reader changed abruptly this year, leaving its users understandably wary of the company’s less than stellar commitment to anything that doesn’t make money, or isn’t Google Plus. The hard way is the wrong way to learn that sometimes your hot little tools don’t do the right thing.
It’s wrong of course to complain at all about Google’s actions. They were offering a free service, and were under no agreement to carry on doing so. What’s worse is that I totally understand why they did it. They are ruthlessly aligning all products around their Facebook killer. A Facebook killer needs people, and lots of them. One tool they have at their disposal to bootstrap people is to change their other products, like Google Reader, to feed users into it. The changes to Google Reader in some crazy way make sense, but they still boned me. So I started a survey of my data and tools.
The details of the Reader change are what make it pertinent to the topic of preserving data. Many of us knowingly use free tools like Reader. We know that they often pretend to be fully open, and sell their service as if at any time you can just take all your data and leave. BTW, I have a good straw man going here, but to be clear, when I say pretending I don’t mean willful pretending, just pretending that good intentions are all you need. Alas, they are all lying. I mean, come on, the road to hell is paved with good intentions and all that. I am being dogmatic here, by the way. I am essentially saying that if any user can’t get all their data out, then the service is lying about being open. They are, but like 99% of their users never run into the problems that I run into.
Google Reader, my favorite whipping boy, is one such example. When I went to save my sharing history, everything looked grand. They even had support for multiple export formats. I downloaded my five-year, long, heavy-duty-use history. Great. I can put it all in one place now. Pinboard allows you to import Reader’s particular format. This was going to be so easy. I could finish it before lunch. After importing, Pinboard reported that I only had 1500 links. LOLWUT. That number was so small, and so perfect, I knew someone wasn’t telling the truth.
It turns out Reader was only exporting up to a certain number of items, and then silently finishing. If I hadn’t taken the time to check, and had waited years until I decided to do something with this file, I would have been pissed. Really though, at no one but myself.
I cobbled together a script that got all my data, but it wasn’t easy, and what I did was out of reach for a normal consumer.
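This is not that script. It is only a rough sketch of the general shape such a workaround tends to take: keep asking for the next page until the service says nothing is left, and distrust suspiciously round totals. Everything here is hypothetical, including the `fetch_page` callable, which stands in for whatever request a given service needs, and the names `fetch_everything` and `looks_truncated`.

```python
from typing import Callable, Iterator, Optional, Tuple

# A page fetcher takes a continuation token (None for the first page) and
# returns (items, next_token). next_token is None when nothing is left.
PageFetcher = Callable[[Optional[str]], Tuple[list, Optional[str]]]


def fetch_everything(fetch_page: PageFetcher, max_pages: int = 10_000) -> Iterator[dict]:
    """Yield every item by following continuation tokens to the end."""
    token = None
    for _ in range(max_pages):
        items, token = fetch_page(token)
        yield from items
        if token is None:
            return
    # Fail loudly instead of finishing silently with a partial export.
    raise RuntimeError("gave up after max_pages; the export may be incomplete")


def looks_truncated(count: int, round_to: int = 100) -> bool:
    """Flag totals that land exactly on a round number, like 1500."""
    return count > 0 and count % round_to == 0
```

The round-number check is a crude heuristic and will sometimes cry wolf, but a false alarm costs a minute of checking while a silent cap costs years of history.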
I suppose even this in isolation wouldn’t really be a cause for alarm, but it’s not the first time. Instapaper has the same problem. When you try to export your entire history, and it’s large, you can’t.
When you put these two things together you start to find a larger trend. Both services, Instapaper and Google Reader, are used by people who care about data and data preservation. Yet they are both a little rough around the edges. They both penalize highly active users. It alarmed me, and it should warn us all. So, it’s time to fight back against the bell curve. I am going all in on pinboard.in.
The old codger that runs Pinboard (I don’t know if he’s old, or a codger, I just have this image of him) has put out all the right signals about how he can keep the business sustainable. He is signaling that he is going to run the site in a manner that makes it cost effective over the long run. How long, I’m not sure, but my guess is that it’s at least 5 years, and up to 20 years.
By coalescing around one service I now have a known quantity. I can make sure that the tools I need to back up my data are simple, which is what I am now doing. I am in the process of building a local/cloud hybrid, because I think that storing interesting articles from the internet before they get removed from the internet, for any number of reasons, is the best way to build up a library on which to build intellectual arguments. And I just like the idea of preserving knowledge.
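The local half of that hybrid can be quite small. The sketch below assumes Pinboard’s v1 API as publicly documented (the `posts/all` endpoint with an `auth_token` of the form `username:TOKEN` and `format=json`). The function names and the backup directory are invented for illustration, and the real setup may look nothing like this.

```python
import json
import urllib.parse
import urllib.request
from datetime import date
from pathlib import Path

# Assumed endpoint, per Pinboard's public v1 API documentation.
API_URL = "https://api.pinboard.in/v1/posts/all"


def fetch_bookmarks(auth_token: str) -> list:
    """Download every bookmark as a list of dicts (one network call)."""
    query = urllib.parse.urlencode({"auth_token": auth_token, "format": "json"})
    with urllib.request.urlopen(f"{API_URL}?{query}", timeout=60) as resp:
        return json.load(resp)


def save_backup(bookmarks: list, backup_dir: Path, day: date) -> Path:
    """Write one dated JSON file per run so older copies are never overwritten."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    path = backup_dir / f"pinboard-{day.isoformat()}.json"
    text = json.dumps(bookmarks, indent=2, ensure_ascii=False)
    path.write_text(text, encoding="utf-8")
    return path
```

A daily job could call `save_backup(fetch_bookmarks("username:TOKEN"), Path("backups"), date.today())`, where the token and directory are placeholders. Dropping the `backups` folder somewhere that is itself synced would cover the cloud half.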