

19
Apr 10

Privacy in Social Software

Last week, I wrote a number of essays critical of Twitter’s decision to provide a collection of public Tweets to the Library of Congress for permanent archiving.  I argued that by taking user data and putting it into a public archive, Twitter had meaningfully restricted the privacy rights of users.  Some of you agreed with my position, many didn’t; but all who commented or wrote to me helped shape my thinking.  In this post, I want to provide a little more context on the nature of privacy in systems like Twitter.

Last week, I gave a talk on the dynamics of privacy in Facebook.  In the research, we modeled a behavior that is increasingly pervasive in Facebook: having a friends-only profile.  I want to draw attention to one slide from the talk:

In this slide, the two slopes you see are the growth of Facebook, and the proportion of UNC undergraduates with friends-only profiles.  Now, the data are on different axes, and Excel is fitting the lines, but the trend is meaningful.  With growth in the service we see a correlated turn towards privacy.

While the pattern I observe can only be generalized to Facebook at UNC, other researchers have observed similar patterns of privacy behavior in other social software.  For example, as Friendster scaled,

[S]o too did the diversity of the social networks represented. A growing portion of participants found themselves simultaneously negotiating multiple social groups—social and professional circles, side interests, and so on.  (boyd, 2007)

With the increasing complexity of diverse audiences, individuals turned to a range of strategies to manage their privacy: multiple accounts, limiting disclosure, or simply dropping out of the service.  Regarding Myspace, Caverlee and Webb (2008) reported (bold is mine):

Overall, the fraction of private profiles is increasing with time, indicating that new adopters of social networks may be more attuned to the inherent privacy risks of adopting a public Web presence. We find that women favor private profiles 2-to-1 over men, and that (perhaps, counter-intuitively) younger users are more likely to adopt a private profile than older users. We also find that the more connected a user is in the social network, the more likely she is to adopt a private profile.

And now in Facebook, our research finds a similar movement towards privacy as the service grows and networks diversify.  One can only suspect that Facebook’s recent “privacy upgrades” and changes to the terms of service prohibiting privacy of certain information have something to do with this normative shift.

Looking at the data across systems, I’d like to speculate that there’s a general property at work.  In a social software system, as the system grows and diversity of networks increases, so does utilization of privacy.  Here’s a graph I’ve constructed illustrating the trend:

The slope is purposefully convex. In the early stages of adoption, network use is sparse, so individuals are incentivized to lower privacy, to increase the odds of finding others. As time passes and the service grows, individuals form dense, small-world clusters. At this stage, individuals are mainly connected to one another within one context, and there are minimal bridges between contexts. Therefore, individuals can afford to keep privacy low, due to minimal risk of inadvertent sharing across contexts. As the system expands, however, we see a turn back towards privacy as an increasing number of bridges across contexts are created. In this moment of context collapse, individuals erect barriers of privacy to facilitate continued disclosure.  Here’s a closer look at the (simulated) networks:
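The three stages can also be sketched in code. What follows is a toy model under loud assumptions, not the simulation behind the figures: users are assigned to a fixed number of contexts, ties form at random, and the function name, probabilities, and network sizes are all invented for illustration. The quantity to watch is the share of ties that bridge two contexts.

```python
import random
from itertools import combinations


def simulate_stage(n_users, n_contexts, p_within, p_across, seed=0):
    """Build a random toy network and return (total_ties, bridge_share).

    bridge_share is the fraction of ties that connect users from two
    different contexts (for example, school friends and coworkers).
    """
    rng = random.Random(seed)
    # Assumption: each user belongs to exactly one context, assigned round-robin.
    context = [user % n_contexts for user in range(n_users)]
    within = across = 0
    for a, b in combinations(range(n_users), 2):
        same_context = context[a] == context[b]
        p_tie = p_within if same_context else p_across
        if rng.random() < p_tie:
            if same_context:
                within += 1
            else:
                across += 1
    total = within + across
    return total, (across / total if total else 0.0)


# Hypothetical (p_within, p_across) values for the three stages.
STAGES = {
    "early adopter": (0.05, 0.0),
    "small world": (0.6, 0.005),
    "context collapse": (0.6, 0.2),
}

if __name__ == "__main__":
    for name, (p_within, p_across) in STAGES.items():
        ties, share = simulate_stage(60, 3, p_within, p_across)
        print(f"{name:>16}: {ties:4d} ties, {share:.0%} bridge contexts")
```

With these made-up parameters, the early-adopter network should have few ties, the small-world network many ties but almost none across contexts, and the collapsed network a large share of bridging ties. That last condition is the one I expect to push people towards privacy.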

By linking privacy to context collapse, I argue that mobilization towards privacy is largely a function of perceived audiences (and harms).  This distinction is important because it holds privacy attitudes constant.  Research, both mine and by others, has demonstrated that privacy attitudes do not necessarily predict privacy behaviors.  Awareness of privacy-in-context is actually the key variable causing the dynamic shift towards privacy in social software systems.

Let’s return our attention to Twitter.  What does your Twitter network look like?  If you’re an average user, your network probably contains a few offline friends (many, many fewer than Facebook or Myspace) and some celebrities (your definition may vary).  There may also be a few friends you’ve made on Twitter, who you don’t know offline.  Chances are, the average Twitter user’s network looks like the sparse “Early Adopter” or “Small World” network.

We see evidence in cultural practice that users have sparse networks in Twitter.  Going back to my notes on Alice Marwick’s AOIR ’09 talk, the culture of celebrity serves a very functional purpose for Twitterers with sparse networks, who wish to connect out of limiting contexts.  “Talking” to celebrities (and finding others who talk to the celebs you talk to) is a way of escaping one’s sparse world, finding new people to follow in a known context.  Hashtag culture provides further evidence that individuals are trying to talk “across” or “out” of limited contexts.  If your network is sparse, turning to site-level anchors like hashtags and celebs provides a reliable stream of conversation in networks where conversation is lacking due to structural impediments.

I wonder how long these practices will need to continue.  Just the other day, Twitter announced that 100 million people had created accounts.  You can’t turn the news on without hearing about Twitter.  A large group of people, primed on social software by Facebook, are waiting to join Twitter.  And over the next year or two, they will, raising issues of context collapse, and prompting a turn toward increased privacy among early adopters.

My major problem with the Twitter/LoC agreement is that the people who will be confronted with context collapse and a growing need for privacy have lost meaningful recourse.  As I argued in my last post, it becomes impossible to take back what you’ve shared, a real and useful privacy strategy.  You’ll still be able to make your account private, but it seems there’s little you can do about the Tweets you sent that were archived permanently in the Library of Congress.

Why is this bad?  Let’s consider a hypothetical.  In 2007, Myspace had 100 million users.  Myspace was growing fast, with many users signing on for the first time.  Myspace users had two options for privacy: public or friends-only.  And a lot more people had public profiles in 2007 than they do today.  How would we feel, now, if Myspace had given all of its public profiles to the Library of Congress for permanent archive in 2007? I can only guess that a bunch of people who had public profiles in 2007 might feel a little uncomfortable about it (cue the “it’s their own damn fault” chorus).

I guess I should feel relief that if Twitter is going to do this to users, at least they are partnering with the LoC (an admirable entity).  But, in reading what LoC staff is saying about this effort, I’m not comforted.  Of the dataset, LoC Blogger Matt Raymond writes “I’m certain we’ll learn things that none of us now can even possibly conceive.” National Archivist David Ferriero writes “What will historians be able to glean from our tweets?  We can’t be sure, but it will probably be very interesting” (while also stating “Twitter is not for everyone. If you are anything like me, you don’t really care what someone had for breakfast.”)  It strikes me that the Twitter archive is being treated like a novelty, promising to be an amazing treasure trove when new research methods are developed.

Maybe it’s all these years of running t-tests (developed 1908), but I’m skeptical that these Tweets are going to tell us something that we can’t quite imagine.  Robust methods develop slowly, and are validated over time.  We’ll probably still be doing text mining, linguistic and sentiment analysis, and content analysis 50 years from now.  One area that is improving rapidly, however, is the identification of individuals in large data sets.  The Netflix dataset was de-anonymized by Narayanan and Shmatikov.  Acquisti and Gross demonstrated they were able to guess people’s social security numbers from public data.  And old-fashioned detective work by Michael Zimmer identified the T3 Facebook dataset.  Of the future, we know this: It will be easier to connect you to your archived Twitter identity.
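Here is a toy illustration of why linking gets easier as more data accumulates. It is a sketch of generic linkage on shared attributes, not the method used in any of the studies above, and every record, name, and field in it is invented.

```python
def link_records(anonymous_rows, public_rows, keys):
    """Return {anonymous id: name} for rows that match exactly one public record.

    A match means the two records agree on every attribute listed in `keys`.
    Ambiguous matches (two or more candidates) are left unlinked.
    """
    index = {}
    for row in public_rows:
        signature = tuple(row[k] for k in keys)
        index.setdefault(signature, []).append(row["name"])

    linked = {}
    for row in anonymous_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:
            linked[row["id"]] = candidates[0]
    return linked


# Invented "archived" records with the account name stripped out.
ARCHIVE = [
    {"id": "a1", "city": "Chapel Hill", "joined": "2008-03", "topic": "#unc"},
    {"id": "a2", "city": "Durham", "joined": "2009-11", "topic": "#bulls"},
]

# Invented public records that happen to share the same attributes.
PUBLIC = [
    {"name": "Alex Example", "city": "Chapel Hill", "joined": "2008-03", "topic": "#unc"},
    {"name": "Sam Sample", "city": "Chapel Hill", "joined": "2009-01", "topic": "#unc"},
    {"name": "Pat Placeholder", "city": "Durham", "joined": "2009-11", "topic": "#bulls"},
]

if __name__ == "__main__":
    print(link_records(ARCHIVE, PUBLIC, ["city"]))
    print(link_records(ARCHIVE, PUBLIC, ["city", "joined", "topic"]))
```

Matching on city alone leaves the Chapel Hill record ambiguous; adding two more attributes links both records. Each extra attribute that becomes available narrows the candidates, which is why a permanent archive only gets easier to link over time.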

So here’s the thing.  Why won’t Twitter make the archiving a simple, opt-in process?  Or at least allow people to opt out?  Twitter obviously knows that giving user data to a permanent archive is different from sharing an API or allowing a Google spider – they wouldn’t have approached the LoC if this wasn’t the case.  I may be the only voice shouting about this, but this is a big, watershed moment regarding user privacy.  EFF, EPIC, Facebook watchdogs – where are you?  Let’s work with Twitter and make this right.


5
Sep 06

Facebook: A Generation’s Identity Archive

This morning, millions of college students are thinking differently about their online identity. The reason? Facebook, the industry-leading college social networking website, introduced “feeds” last night. Feeds are pretty simple – they’re a running list of what you’ve been doing in the Facebook. For example, if you add a friend, update your relationship status, upload photos – this all gets dumped into a feed, viewable by anyone that can view your account.

The logic that went into such a feature is easy to explicate. When you’ve got 200-400 friends in Facebook, it is impossible to keep track of them all. Remember when we had to keep track of 30 blogs manually? It sucked. And we solved that problem with RSS – let the updates come to us. Facebook has taken this notion and applied it to our lives. Facebook knows that its userbase uses the service to “keep up” with people – continuous social research, if you like – so this addition appeals to very base motives of Facebook users. Clearly, this is an idea that sounded great on paper.

In reality, however, this gets messy. Let’s get some background. First, I’m convinced that many young users of Facebook don’t look at the site as a social networking service per se. This generation has been socialized on Xanga, LJ and forums – they are comfortable and used to the idea of being on a social website. The Facebook simply represents another game-like social website that they are on – nothing more. Second, digital identity, like that presented in the Facebook, thrives because it is temporal. You can change your identity at the drop of a hat – you can become a liberal or conservative at the push of a button, change your interests and hobbies on a whim. The point is, you’re always presenting the identity you want to present – you never have to worry about the identity you used to present.

I believe that identity disclosure is so high in the Facebook for the first reason I cited – students see this as a game, something that is qualitatively less than real. Students disclose lots of real information, but they also disclose lots of false information. The key to winning in the Facebook is maintaining a good mixture of the real and false information. Implicit in this is the reality that you can always change the fake information, when you want – you can rewrite history at any time.

This morning, millions of students were shown that they can’t actually rewrite history. Everything they do, all of the groups they join and interests they state or friends they make – it is all being recorded. Not only is it being recorded, it is being presented as content to other users of the Facebook. The Facebook is no longer just a current method of identity presentation, it is an archive of our digital identity. This is a cold, hard reality for students, and you’re seeing a lot of public venting of discomfort as a result.

So let’s prognosticate a little, and see what might happen to the Facebook, now that the entire userbase is acutely aware of the fact that everything they do is being recorded and shared with the world.

  • First, I believe this move will cause a lot of mental discomfort to students who hadn’t really thought through online identity. They will be presented with all of the changes from their friends and realize that they, too, are having every minute change in their identity fed to hundreds of others.
  • Second, I believe students will be forced to rethink how they socialize in the Facebook. Facebook has reached a critical mass among college-age students, and my research has shown that many students on the Facebook now use the service heavily for out-of-network connections. Their cousins, old friends, brothers and sisters are on the Facebook. Knowing that everything they do will be presented to their entire network will have a chilling effect. Here’s an example: A student posts a change to their profile late at night, as a joke for a friend. That student knows that, most likely, only a few people will see the change, and that they can revert it in the morning. With the new Facebook, that change is now broadcast to the entire network – and it is saved in an identity archive – the feed.
  • Finally, I believe this change will wake students up to the realities of sharing identity information online. Granted, it won’t wake them up much, but it may just convince them that these sites aren’t really games. It may also convince them to think of the future repercussions of sharing information anywhere – not only in the Facebook but in Bebo, Myspace, Hi5, Xuqa and the like.
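The late-night example above can be sketched as a tiny data model. This is an assumption-heavy illustration, not how Facebook implements feeds: the `Profile` class, its `update` method, and the shape of a feed entry are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    owner: str
    fields: dict = field(default_factory=dict)  # current identity, freely editable
    feed: list = field(default_factory=list)    # append-only record of every change

    def update(self, key, value):
        """Change the profile and log the change to the feed."""
        self.fields[key] = value
        # Nothing ever removes entries from the feed, so reverting a
        # profile field adds a second entry instead of erasing the first.
        self.feed.append((self.owner, key, value))


if __name__ == "__main__":
    student = Profile("student")
    student.update("status", "late-night joke")   # posted for one friend
    student.update("status", "back to normal")    # reverted in the morning
    print(student.fields)  # only the current value survives here
    print(student.feed)    # both changes survive here
```

The profile shows only the reverted value, but the feed still holds the joke. Under this sketch, the profile is the identity you present and the feed is the identity archive.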

Personally, I don’t believe this is a horrible move for Facebook. They took a pre-existing model (RSS) and applied it to identity. What they may not have done is thought deeply about how their users approach identity. People love exploring each other, but we don’t want to leave traces behind. We don’t want people to be able to see if we’ve viewed the profiles of others. We don’t want people to know if we decline their friend requests. Social networking systems must enforce basic structural rules for trust to occur; I believe “not leaving traces behind” may turn out to be one of those rules.

Of course, Facebook has stated that feeds are subject to all privacy controls. You can opt out of the system totally, or on a case-by-case basis. However, opting out of sharing in these services, where sharing is incentivized, creates issues of inequality in the system. Students who opt out aren’t playing the game fairly, more or less.

Reaction to the service has been mixed, with Techcrunch’s Arrington giving a neutral review (mostly a recounting of the features). The comment thread was less friendly. Over on the developer forum, a self-selected bunch of power users are engaging in threads with names such as “Why are people allowed to stalk every move I make now” and “Stop it – I almost cancelled my account today!”.

While an interesting move, I do believe that a gradual rollout or more in-depth consideration of users’ privacy concerns would have benefited Facebook. The Facebook seems to be run by a group of extremely determined Facebookers (many were early and full-immersion adopters), so it is possible that groupthink effects have caused the team to lose some focus on the average user.

The takeaway here is that Facebook, like it or not, has brought to light a very real issue in online identity. Everything we do in public or semi-public spheres can be tracked and chronicled. We don’t see our digital footprints as much because systems haven’t cropped up to collect them, but collecting them is trivial. Facebook has simply put one of those systems in front of us – wrapped up nicely as a feature – but it isn’t hard to see the reality. As we grapple with this reality – that our privacy is only a construct of a system, and that our identity can be tracked and chronicled – how will students change their behavior? We’re really only at the tip of this iceberg, but with Facebook’s new features, we’ve accelerated this discussion substantially.

P.S. – I should also note that Facebook now has an official blog, which you may want to check out. Hopefully they’ll add an RSS feed soon.