I was pleased to see that my last post on Twitter and the LoC generated excellent discussion both here in the comments and over on Twitter. I’ve seen some great defenses of the deal, but unfortunately I’m not buyin’ quite yet. I thought I’d use this post to quickly raise a few more questions and concerns.
First, a quick review of some of the conversation about the deal. Zimmer is all over it, raising a number of great open questions, and exploring how private tweets just might end up in the LoC’s archive. The Atlantic has rounded up opinions, particularly an interesting conversation going on at The Big Money. Also notable is a BBC interview with Twitter’s general counsel, though it skips over privacy issues. Now that I think of it, skipping over privacy issues might be the theme of this essay.
One of the central problems with this deal is the set of assumptions around public Tweets. Chief among them is the idea that because the Tweets are “already public”, individuals lose all rights to the content. In my last post, I explored some ways in which content shared in public isn’t actually public content: for example, practically obscure public content that is meant for a select audience. In this post, I want to challenge another assumption that people make about public content: that it lives forever.
If there’s one thing that social media has taught us, it is that if you post anything to the web, it stays there forever. Of course, this is empirically false. Companies go out of business, databases corrupt, servers crash, indexes get expunged, identifiers get mixed up, and even with the best intentions and good backups, data are lost. Think about the Google search results for your name. Are they the same as they were 1, 3, or 5 years ago? While it is likely that you could tell me tons about new results that have come online over that time period, could you tell me about the ones that have gone offline?
So let’s just take a second and put the assumption that the internet is a giant cache to bed. The internet is dynamic, fragile, and designed to lose things. The internet has probably forgotten more about you than it remembers. The next question generally brought up is “What about Google!” If you want an answer to that question, send out a Tweet and then delete it. Wait a few days and search for it. The Tweet is gone, because Google isn’t in the business of sending you to 404s. Thank the market for that one. After we knock down the Google straw man, the next assumption generally covers the suspicious “other” person who is stalking you and creating a giant portfolio of everything you do. I hate to pop everyone’s bubble, but unless you’re a really, really significant public figure, this person doesn’t exist for you.
So why is it that we all assume that the content we share publicly will be around forever? I think this is a classic case of selection on the dependent variable. When we Google ourselves, we are confronted with what’s there as opposed to what’s not there. The stuff that goes away gets forgotten, and we concentrate on things that we see or remember (like a persistent page about us that we don’t like). In reality, our online identities decay, decay being a stochastic process. The internet is actually quite bad at remembering.
The Library of Congress, on the other hand, is quite good at remembering. Magnificently good at it, most likely the best in the world. And that is what’s troubling. Up until Twitter sent its archives over to the Library of Congress, Twitter users could realistically expect they could make things go away. They could delete Tweets. They could change their account name. They could remove their account. Without consulting their users, privacy advocates, rights organizations, or any other voices of reason, Twitter has summarily taken these very real privacy remedies away from their users.
This gets me to what is so frustrating about Twitter’s move: a frighteningly cavalier attitude towards shipping around the data of tens of millions of consumers. Twitter has literally passed the personal information of millions of users to a permanent, public archive without so much as pre-notification, consultation, or the opportunity for debate. And even though it appears legal for the LoC to have the data, big questions remain regarding whether Twitter has actually violated its own contract with users. How can I meaningfully own my content after it has been shipped to a government archive?
In all my years of using Twitter, the idea of canceling my account has never even vaguely crossed my mind. Until last Wednesday, that is.
Update: American Prospect has a great interview with Martha Anderson of the Library of Congress. Regarding the deal:
The agreement has been signed, but we still have a lot of technical details to work out — how we’ll technically transfer it, and when.
Regarding opt-out:
You know, I don’t know. I think that’s a question for Twitter. There’s several questions about that which they are still working out. We asked them to deal with the users; the library doesn’t want to mediate that.
Regarding user information:
I think that’s one of the big issues for us to understand in terms of privacy. And there’s a lot of work going on, especially over at [the National Institutes of Health] about how to anonymize data and still make it useful. We’re really big on partnering with people to learn what they’re learning, so I think that’s an area we’ll look into. In serving it, what can we do to make it useful to research but not identify personal information?








I’m not canceling my Twitter account just yet, but I think it’s important we question whether this move is a good thing, so thanks for this post. It reminded me of two things. First, of Merlin Mann’s response upon finding out someone was selling a compilation of people’s tweets, including his, for a profit and without anyone’s permission. Second, your comment “The internet is actually quite bad at remembering” reminded me of the book Delete: The Virtue of Forgetting in the Digital Age, which makes and extends a similar argument. You probably already know about it, but I figured I’d mention it just in case you (or your readers) have not.
First, it seems to me you’re just replacing one straw man with another here: sure, the assumption that “the content we share publicly will be around forever” is false, but some subset of it will undoubtedly persist longer than we want it to. And who wants to trust a stochastic process to remove unsavory content about or by us from a globally-accessible system? Though I agree with your quoted point, I think you’re raising the bar far too high; the fact is we don’t know what will persist and what won’t, and for that reason it is useful (though false) to assume that everything you post online will live forever.
Second, I don’t know why it took this recent LoC business to get people so up in arms about retention of UGC. From Twitter’s TOS:
When you signed up for Twitter, you granted them the right to do what they’ve done with the LoC and a whole lot more. Consider it the price of reaching a global audience. And I think you’re right that the only real solution here is to disengage (as Ellul would no doubt observe were he still with us), but that concern is no more serious now than it was before Wed.—it’s just more salient.
[...] archiving. I argued that by taking user data and putting it into a public archive, Twitter had meaningfully restricted the privacy rights of users. Some of you agreed with my position, many didn’t; but all who [...]
[...] themselves. I came upon them further downstream. In Twitter and the Library of Congress and Is it time to cancel your Twitter account? Stutzman seemed unimpressed with librarians stewardship and foresight. He writes in the first post: [...]
Hypothetical for you, Fred. What happens when the storage shifts from public (e.g., LOC) to private (or at least privately-directed)? So, for example, let’s say I wrote an email client plugin that grabbed and archived all my twitter conversations and DMs (both to me and from me) and integrated those into my searchable email archives, much like gmail currently does with IMs? It could also go further and integrate the ‘about’ information from twitter into my email address book. Do you think either of these cross a line? Or by staying essentially ‘private’ to other people you have a relationship with do they stay OK? Would your sense there change if that information were automatically gathered for everyone you’d ever corresponded with, even if you hadn’t had a DM conversation with them?
I’m asking primarily because it seems like twitter->LOC is an easy-ish case for believers in privacy-as-contextual-integrity, but the cases are only going to get harder from here on out, especially as the indexers of information get better and (perhaps) as some of them move to the edges and away from centralizers like Google/Facebook/LOC/etc.
[...] thus giving its archives to the Library, but this time in the name of respect for privacy (here, there, or there). The Twitter archive is a veritable mosaic of data of a [...]
[...] as Fred Stutzman notes (http://fstutzman.com/2010/04/16/is-it-time-to-cancel-your-twitter-account/), Twitter has literally passed the information of millions of users to a public archive [...]