On a recent cross-country flight, I read a number of papers about MyLifeBits, Lifestreams and Haystack – three projects informed by Vannevar Bush’s Memex. I also read Bush’s article “As We May Think”, in which the concept of the Memex is presented. Put simply, a Memex is a digital library of our life – a digitization of the documents we encounter, a recording of the conversations we have, a record of all the things we do. Because the Memex is searchable, it becomes an invaluable digital extension of our lives.
Bush’s article is a fun read, full of postwar optimism and a sense that anything can be accomplished. As it turns out, Bush was right – the Memex could be built. Microsoft’s MyLifeBits (see the 2006 CACM writeup for the most recent literature) is a Memex of sorts, a relational database of multimedia life. And if form factor weren’t an issue, the Memex would be more than a laboratory reality; for the most part, all facets of Bush’s system exist and work relatively well.
Well, except for one – that’s the notion of “trails”. Introducing trails, Bush describes a highly adaptive system of pathfinders back to stuff you care about. If you’re going to record an entire life, you need very effective ways to re-find your information. This is not a trivial problem, however – text can be searched effectively, but what about sound and video, pictures and places? We’re still figuring that out – and there’s a lot of work to do. What’s more, media alone isn’t good enough; we need context to be able to understand why a kept picture or sound recording is interesting. Microsoft addresses this with textual annotation in the MyLifeBits/Stuff I’ve Seen work.
I think Microsoft is on to something – annotation is important. In the papers, the authors describe systems to collect story annotation about events. While there is a bias against loose hierarchy, I couldn’t help but think how much this approach reminds me of tagging. The hard problems of the Memex are in dealing with material that can’t be “indexed”, so instead we annotate the material with our stories. The only problem is that stories move completely away from any notion of controlled vocabulary; association becomes a free-text search problem instead of a categorization problem. Sure, we’ve got the tools to deal with that, but it just doesn’t feel efficient.
Enter tagging. For the most part I believe tagging systems work best for personal re-finding. At sufficient scale, our collective personal re-finding becomes collective knowledge, but this is an ancillary effect rather than an intended purpose. Tagging systems don’t work because a ton of people use them; they work because tags are valuable to us. In a recent, much-discussed article on social tagging in the enterprise, Raytheon employees described a system where librarians would gate-keep classification tags; that is, people would tag items, and the librarians would allow specific tags to be integrated into the document’s classification if they felt the tag was correct. Stowe Boyd raised the right issue: this is tagging mis-implemented. Tagging is not a top-down expert system; rather, it is a system for personal use.
Let me digress for just a second to provide an example. After putting 600-odd items into del.icio.us, I finally “got” del.icio.us. That is to say, I finally figured out how to use del.icio.us in the way that’s right for me. When I’m planning travel, I’ll tag all of the things related to my travel with the name of the place I’m visiting. That means I’ll tag the conference website, the hotel, the ground transport and places around town I want to check out with the city name. I’ll also tag my airline with the city. As you can see, one of these things is not like the others. Tagging a hotel in Vancouver with “vancouver” makes sense – but tagging Orbitz.com with “vancouver”? Would an expert ever allow this tag? No, they’d tell me to tag it with something more general, like airline or travel.
However, in tagging Orbitz.com with vancouver, I’ve made it personally relevant and re-findable. I can go to del.icio.us/fstutzman/vancouver and find everything I need related to my Vancouver trip. It’s a fantastic re-finding system, and it is the tags that make it possible. But didn’t I just break the system by giving Orbitz a tag of vancouver? No, because tags alone are really only valuable to the tagger. Only when the number of taggers grows to a certain size do you get the notable ancillary effect of collective meaning.
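To make the re-finding idea concrete, here is a minimal sketch of a per-user tag index. It is not how del.icio.us is actually implemented; the class name `PersonalTagIndex`, its methods, and the `.example` URLs are all made up for illustration. The point it shows is that the index never judges whether a tag is “correct”; it only maps my tags back to my items.

```python
from collections import defaultdict


class PersonalTagIndex:
    """Hypothetical per-user bookmark store mapping each tag to a set of URLs."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, url, *tags):
        # Accept any tag at all; there is no gatekeeper and no controlled vocabulary.
        for tag in tags:
            self._by_tag[tag.lower()].add(url)

    def find(self, tag):
        # Return every item the user filed under this tag, in a stable order.
        return sorted(self._by_tag.get(tag.lower(), set()))


# Illustrative data only: the hotel and conference URLs are placeholders.
index = PersonalTagIndex()
index.add("https://www.orbitz.com", "vancouver")
index.add("https://hotel.example", "vancouver", "hotel")
index.add("https://conference.example", "vancouver", "conference")

vancouver_items = index.find("vancouver")
```

Looking up `vancouver` returns all three items, Orbitz included, which is exactly what I want when I’m planning the trip, even though an expert would never file an airfare site under a city name.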
Tags are characterized in systems by a “cloud” – the cloud represents the tags most commonly applied to an item. Consider 100 people tagging Orbitz.com; certainly, some will tag Orbitz like I do – but many will “properly” tag Orbitz with “travel” or “airfare”. Tags like “vancouver” and “ohio” will languish at the obscure end of the tag cloud, whereas the common, general tags, like travel and airfare, will get pushed to the top by the group. It’s a nice side-effect, but it’s important to remember that’s only what it is – a side-effect of the system’s scale. Raytheon breaks the model doubly by 1) implementing tags as a “public” finding system, rather than a personal re-finding system and 2) preventing the tag cloud from naturally working itself out. Imagine if I wanted to tag Orbitz with vancouver, and the system rejected my tag as “out of scope.” This would completely break the system for me. In fact, if I wanted to tag Orbitz with gibberish, as long as that gibberish was meaningful to me, it would be completely fine. The opportunistic tag cloud is only a side-effect of scale, and by concentrating on that, rather than personal re-findability, we break the model.
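The way a cloud works itself out can be sketched in a few lines. This is only an illustration under my own assumptions: the function name `tag_cloud`, the user names other than my own, and the handful of taggings are invented, and a real system would have far more taggers. Each person’s tags are counted once per item, and the cloud is simply the tags ranked by how many people used them.

```python
from collections import Counter


def tag_cloud(taggings):
    """Rank the tags on one item by how many users applied each of them.

    `taggings` maps a user name to the set of tags that user gave the item.
    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter()
    for tags in taggings.values():
        counts.update(set(tags))  # one vote per user per tag
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Invented sample data for Orbitz.com; nobody's tags are rejected.
orbitz_taggings = {
    "fstutzman": {"vancouver"},
    "user_a": {"travel", "airfare"},
    "user_b": {"travel"},
    "user_c": {"travel", "airfare"},
    "user_d": {"ohio", "travel"},
}

cloud = tag_cloud(orbitz_taggings)
```

With this sample, `travel` rises to the top with four votes, `airfare` follows with two, and `ohio` and `vancouver` sit at the tail with one each. Nothing had to be filtered to get there, and my own `vancouver` tag is still intact for my personal re-finding.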
So this gets me back to the Memex. In a tag-aware Memex, I can easily tag things – whether they be PDFs, MP3s or video files – in a way that I’ll be able to recall them later. I can tag a video “2006 birthday” and be reasonably sure that 50 years later I’ll know that video clip was from my 2006 birthday. Here’s the thing, though – I’m not naturally good at tagging. Earlier, I said something to the effect that it took me 600 tagged items to figure out how to use del.icio.us in a way that’s right for me. We can do better. It makes me think there’s a best practice for tagging for personal re-findability. This best practice is obviously a mix of self-awareness (how do I think about things so I can find them at a later date) and good vocabulary/classification skills; put simply, we can find ways to teach tagging so that more users can embrace it. To this end, librarians could relinquish their gatekeeper positions and come to understand and teach this folksonomic best practice; by throwing open the gates and “embracing the messiness”, a populace of taggers could put together fantastic indices.
The most important thing, though, is that we continue to understand that the collective knowledge in these systems is an effect of scale; personal re-findability is what the systems do best. Through better understanding of the practice of tagging, we can teach people to be better taggers (better being a very quantitative measure: a person is more successful at re-finding items using one tagging strategy than another); with better taggers, we can have better collective knowledge. This better collective knowledge won’t come from the gatekeepers – instead it will come from the bottom up – from self-aware, passionate taggers.
Postscript: I believe tags are Bush’s trails. The Memex only makes sense when people can easily and successfully use it, and a system of folksonomy is the best innovation to date in this area. We often approach tags as if they should exist in a world without rules and control; to a certain extent, they should – but just because tags are lawless doesn’t mean we can’t do tags better. We don’t need to impose laws, but we can provide education that helps people tag better. Where are the studies of tagging and re-finding? Where is the tag-oriented update of the classic Barreau and Nardi paper “Finding and Reminding”? There’s a world of opportunity for forward-thinking library and information scientists. Tags clearly work – so let’s help people tag better; it will only benefit us all.