Posts Tagged: privacy


19
Apr 10

Privacy in Social Software

Last week, I wrote a number of essays critical of Twitter’s decision to provide a collection of public Tweets to the Library of Congress for permanent archiving.  I argued that by taking user data and putting it into a public archive, Twitter had meaningfully restricted the privacy rights of users.  Some of you agreed with my position, many didn’t; but all who commented or wrote to me helped shape my thinking.  In this post, I want to provide a little more context on the nature of privacy in systems like Twitter.

Last week, I gave a talk on the dynamics of privacy in Facebook.  In the research, we modeled a behavior that is increasingly pervasive in Facebook: having a friends-only profile.  I want to draw attention to one slide from the talk:

In this slide, the two slopes you see are the growth of Facebook, and the proportion of UNC undergraduates with friends-only profiles.  Now, the data are on different axes, and Excel is fitting the lines, but the trend is meaningful.  With growth in the service we see a correlated turn towards privacy.

While the pattern I observe is only general to Facebook at UNC, other researchers have observed similar patterns of privacy behavior in other social software.  For example, as Friendster scaled,

[S]o too did the diversity of the social networks represented. A growing portion of participants found themselves simultaneously negotiating multiple social groups—social and professional circles, side interests, and so on.  (boyd, 2007)

With the increasing complexity of diverse audiences, individuals turned to a range of strategies to manage their privacy: multiple accounts, limiting disclosure, or simply dropping out of the service.  Regarding Myspace, Caverlee and Webb (2008) reported (bold is mine):

Overall, the fraction of private profiles is increasing with time, indicating that new adopters of social networks may be more attuned to the inherent privacy risks of adopting a public Web presence. We find that women favor private profiles 2-to-1 over men, and that (perhaps, counter-intuitively) younger users are more likely to adopt a private profile than older users. We also find that the more connected a user is in the social network, the more likely she is to adopt a private profile.

And now in Facebook, our research finds a similar movement towards privacy as the service grows and networks diversify.  One can only suspect that Facebook’s recent “privacy upgrades” and changes to the terms of service prohibiting privacy of certain information have something to do with this normative shift.

Looking at the data across systems, I’d like to speculate that there’s a general property at work.  In a social software system, as the system grows and diversity of networks increases, so does utilization of privacy.  Here’s a graph I’ve constructed illustrating the trend (larger version):

The slope is purposefully convex. In the early stages of adoption, network use is sparse, so individuals are incentivized to lower privacy, to increase the odds of finding others. As time passes and the service grows, individuals form dense, small-world clusters. At this stage, individuals are mainly connected to one another within one context, and there are minimal bridges between contexts. Therefore, individuals can afford to keep privacy low, due to minimal risk of inadvertent sharing across context. As the system expands, however, we see a turn back towards privacy as an increasing number of bridges across context are created. In this moment of context collapse, individuals erect barriers of privacy to facilitate continued disclosure.  Here’s a closer look at the (simulated) networks:

By linking privacy to context collapse, I argue that mobilization towards privacy is largely a function of perceived audiences (and harms).  This distinction is important because it holds privacy attitudes constant.  Research, both mine and by others, has demonstrated that privacy attitudes do not necessarily predict privacy behaviors.  Awareness of privacy-in-context is actually the key variable causing the dynamic shift towards privacy in social software systems.
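The link between context collapse and privacy can be made concrete with a toy calculation. The sketch below is a minimal illustration, not the simulation behind the figures: the people, their contexts, the three example networks, and the function name `cross_context_share` are all hypothetical. It measures the share of ties that bridge two different contexts, which stays at zero in the early-adopter and small-world stages and rises once bridges appear.

```python
def cross_context_share(contexts, ties):
    """Return the fraction of ties that connect people from different contexts."""
    if not ties:
        return 0.0
    bridging = sum(1 for a, b in ties if contexts[a] != contexts[b])
    return bridging / len(ties)


# Hypothetical users, each assigned to a single social context.
contexts = {
    "ann": "school",
    "bob": "school",
    "cat": "work",
    "dan": "work",
    "eve": "family",
}

# Three illustrative stages of the same service (assumed, not measured).
early_adopter = [("ann", "bob")]
small_world = [("ann", "bob"), ("cat", "dan")]
context_collapse = small_world + [("bob", "cat"), ("dan", "eve"), ("ann", "eve")]

for name, ties in [
    ("early adopter", early_adopter),
    ("small world", small_world),
    ("context collapse", context_collapse),
]:
    print(f"{name}: {cross_context_share(contexts, ties):.2f}")
```

In this toy setup, the bridging share is the quantity that would drive the turn towards privacy: while it is zero, a disclosure stays inside one context, and once it grows, each disclosure is more likely to reach an unintended audience.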

Let’s return our attention to Twitter.  What does your Twitter network look like?  If you’re an average user, your network probably contains a few offline friends (many, many fewer than Facebook or Myspace) and some celebrities (your definition may vary).  There may also be a few friends you’ve made on Twitter, who you don’t know offline.  Chances are, the average Twitter user’s network looks like the sparse “Early Adopter” or “Small World” network.

We see evidence in cultural practice that users have sparse networks in Twitter.  Going back to my notes on Alice Marwick’s AOIR ’09 talk, the culture of celebrity serves a very functional purpose for Twitterers with sparse networks, who wish to connect out of  limiting contexts.  “Talking” to celebrities (and finding others who talk to the celebs you talk to) is a way of escaping one’s sparse world, finding new people to follow in a known context.  Hashtag culture provides further evidence that individuals are trying to talk “across” or “out” of limited contexts.  If your network is sparse, turning to site-level anchors like hashtags and celebs provides a reliable stream of conversation in networks where conversation is lacking due to structural impediments.

I wonder how long these practices will need to continue.  Just the other day, Twitter announced that 100 million people had created accounts.  You can’t turn the news on without hearing about Twitter.  A large group of people, primed on social software by Facebook, are waiting to join Twitter.  And over the next year or two, they will, raising issues of context collapse, and prompting a turn toward increased privacy among early adopters.

My major problem with the Twitter/LoC agreement is that the people who will be confronted with context collapse and a growing need for privacy have lost meaningful recourse.  As I argued in my last post, it becomes impossible to take back what you’ve shared, a real and useful privacy strategy.  You’ll still be able to make your account private, but it seems there’s little you can do about the Tweets you sent that were archived permanently in the Library of Congress.

Why is this bad?  Let’s consider a hypothetical.  In 2007, Myspace had 100 million users.  Myspace was growing fast, with many users signing on for the first time.  Myspace users had two options for privacy: public or friends-only.  And a lot more people had public profiles in 2007 than they do today.  How would we feel, now, if Myspace had given all of its public profiles to the Library of Congress for permanent archive in 2007? I can only guess that a bunch of people who had public profiles in 2007 might feel a little uncomfortable about it (cue the “it’s their own damn fault” chorus).

I guess I should feel relief that if Twitter is going to do this to users, at least they are partnering with the LoC (an admirable entity).  But, in reading what LoC staff is saying about this effort, I’m not comforted.  Of the dataset, LoC Blogger Matt Raymond writes “I’m certain we’ll learn things that none of us now can even possibly conceive.” National Archivist David Ferriero writes “What will historians be able to glean from our tweets?  We can’t be sure, but it will probably be very interesting” (while also stating “Twitter is not for everyone. If you are anything like me, you don’t really care what someone had for breakfast.”)  It strikes me that the Twitter archive is being treated like a novelty, promising to be an amazing treasure trove when new research methods are developed.

Maybe it’s all these years of running t-tests (developed 1908), but I’m skeptical that these Tweets are going to tell us something that we can’t quite imagine.  Robust methods develop slowly, and are validated over time.  We’ll probably still be doing text mining, linguistic and sentiment analysis, and content analysis 50 years from now.  One area that is improving rapidly, however, is the identification of individuals in large data sets.  The Netflix dataset was identified by Narayanan and Shmatikov.  Acquisti and Gross demonstrated they were able to guess people’s social security numbers from public data.  And old-fashioned detective work by Michael Zimmer identified the T3 Facebook dataset.  Of the future, we know this: It will be easier to connect you to your archived Twitter identity.

So here’s the thing.  Why won’t Twitter make the archiving a simple, opt-in process?  Or at least allow people to opt out?  Twitter obviously knows that giving user data to a permanent archive is different from sharing an API or allowing a Google spider – they wouldn’t have approached the LoC if this wasn’t the case.  I may be the only voice shouting about this, but this is a big, watershed moment regarding user privacy.  EFF, EPIC, Facebook watchdogs – where are you?  Let’s work with Twitter and make this right.


16
Apr 10

Is it time to cancel your Twitter account?

I was pleased to see that my last post on Twitter and the LoC generated excellent discussion both here in the comments and over in Twitter.   I’ve seen some great defenses of the deal, but unfortunately I’m not buyin’ quite yet.  I thought I’d use this post to quickly raise a few more questions and concerns.

First, a quick review of some of the conversation about the deal.  Zimmer is all over it, raising a number of great open questions, and exploring how private tweets just might end up in the LoC’s archive.  The Atlantic has rounded up opinions, particularly an interesting conversation going on at The Big Money.  Also notable is a BBC interview with Twitter’s general counsel, though it skips over privacy issues.  Now that I think of it, skipping over privacy issues might be the theme of this essay.

One of the central problems with this deal is the set of assumptions around public Tweets.  Particularly, because the Tweets are “already public”, individuals lose all rights to the content.  In my last post, I explored some ways in which content shared in public actually wasn’t public content.  For example, practically obscure public content that is meant for a select audience.  In this post, I want to challenge another assumption that people make about public content: that it lives forever.

If there’s one thing that social media has taught us, it is that if you post anything to the web, it stays there forever.  Of course, this is empirically false.  Companies go out of business, databases corrupt, servers crash, indexes get expunged, identifiers get mixed up, and even with the best intentions and good backups, data are lost.  Think about the Google search results for your name.  Are they the same as they were 1, 3, or 5 years ago?  While it is likely that you could tell me tons about new results that have come online over that time period, could you tell me about the ones that have gone offline?

So let’s just take a second and put the assumption that the internet is a giant cache to bed.  The internet is dynamic, fragile, and designed to lose things.  The internet has probably forgotten more about you than it remembers.  The next question generally brought up is “What about Google!”  If you want an answer to that question, send out a Tweet and then delete it.  Wait a few days and search for it.  The Tweet is gone, because Google isn’t in the business of sending you to 404s.  Thank the market for that one.  After we knock down the Google straw man, the next assumption generally covers the suspicious “other” person who is stalking you and creating a giant portfolio of everything you do.  I hate to pop everyone’s bubble, but unless you’re a really, really significant public figure, this person doesn’t exist for you.

So why is it that we all assume that the content we share publicly will be around forever?  I think this is a classic case of selection on the dependent variable.  When we Google ourselves, we are confronted with what’s there as opposed to what’s not there.  The stuff that goes away gets forgotten, and we concentrate on things that we see or remember (like a persistent page about us that we don’t like).  In reality, our online identities decay, decay being a stochastic process.  The internet is actually quite bad at remembering.

The Library of Congress, on the other hand, is quite good at remembering.  Magnificently good at it, most likely the best in the world.  And that is what’s troubling.  Up until Twitter sent its archives over to the Library of Congress, Twitter users could realistically expect they could make things go away.  They could delete Tweets.  They could change their account name.  They could remove their account.  Without consulting their users, privacy advocates, rights organizations, or any other voices of reason, Twitter has summarily taken these very real privacy remedies away from their users.

This gets me to what is so frustrating about Twitter’s move: a frighteningly cavalier attitude towards shipping around the data of tens of millions of consumers.  Twitter has literally passed the personal information of millions of users to a permanent, public archive without so much as pre-notification, consultation, or the opportunity for debate.  And even though it appears legal for the LoC to have the data, big questions remain regarding whether Twitter has actually violated its own contract with users.  How can I meaningfully own my content after it has been shipped to a government archive?

In all my years of using Twitter, the idea of canceling my account has never even vaguely crossed my mind.  Until last Wednesday, that is.

Update: American Prospect has a great interview with Martha Anderson of the Library of Congress.  Regarding the deal:

The agreement has been signed, but we still have a lot of technical details to work out — how we’ll technically transfer it, and when.

Regarding opt-out:

You know, I don’t know. I think that’s a question for Twitter. There’s several questions about that which they are still working out. We asked them to deal with the users; the library doesn’t want to mediate that.

Regarding user information:

I think that’s one of the big issues for us to understand in terms of privacy. And there’s a lot of work going on, especially over at [the National Institutes of Health] about how to anonymize data and still make it useful. We’re really big on partnering with people to learn what they’re learning, so I think that’s an area we’ll look into. In serving it, what can we do to make it useful to research but not identify personal information?


14
Apr 10

Twitter and the Library of Congress

I’m currently at the CHI conference, which is commanding all of my attention, but the news about Twitter and the Library of Congress is too big to ignore (see also Zimmer, RWW).  Quoting the LoC:

Have you ever sent out a “tweet” on the popular Twitter social media service? Congratulations: Your 140 characters or less will now be housed in the Library of Congress.

According to Biz Stone, Twitter will begin transferring all of their public tweets, after a six-month embargo, to a permanent, public archive at the Library of Congress.  Let me say something (probably) unpopular: I’m a little horrified.

If you talk to people about things shared online, you generally run into two assumptions.  The first is that things shared publicly are meant for the general public.  The second is that things shared publicly are meant for posterity.  Both of these assumptions are dangerous.  Some of my recent work has identified that people do share privately in public, and that individuals do engage in the grooming (i.e. removal) of content shared publicly.  danah’s found this.  So have lots of others.  If there’s anything we should know by now about social media, it is that a deterministic, one-size-fits-all approach to privacy is a bad approach to privacy.

This is what makes Twitter’s “gift” troubling.  It assumes that all content shared publicly is truly public and for posterity.  Let’s consider some edge cases.  Bob has two Twitter accounts, one for work and a personal account.  Both are public, but the only way people find out about his personal account is that he tells people the obscure handle.  Bob wants to be practically obscure – private in public – without going to all the trouble of setting up complicated privacy controls.  So what happens, two years from now, when Bob accidentally discloses his handle in the wrong context, and he needs to remove some Tweets?

There’s probably a certain class of reader that looks at Bob and says, well, Bob’s out of luck.  There’s Google cache and third party tools and a whole host of other ways tweets are preserved.  The difference I’d argue is that these tools have certain properties – they react to API calls, they decay, etc. – that make them qualitatively different from a professionally managed archive.  Through the creation of a permanent, public, third-party archive, Twitter changes the privacy-management strategies that are going to be available to users in the future.  This is critical, because if Bob can’t trust his down-the-road privacy management strategy, Bob might share less today.

This is a great opportunity to plug the work of Helen Nissenbaum, whose most recent book Privacy in Context extends the argument for privacy as contextual integrity.  Nissenbaum argues that disclosures have contextual expectations, and that shifting these expectations constitutes a meaningful violation of privacy and freedom.  Even though the tweets are public, it is a fallacy to assume that digital content shared in public was created with an understanding that the content would end up in a third-party, government-managed archive.  Facebook’s helped us demonstrate again and again that privacy is both qualitative and quantitative.

Practically, there are some questions that Twitter needs to address about this move.  First, Twitter’s terms of service specifies that:

You retain your rights to any Content you submit, post or display on or through the Services. By submitting, posting or displaying Content on or through the Services, you grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed).

The way I read this is that as long as your content is on Twitter, Twitter can do what they want with it.  Fine.  But what if you remove your content from Twitter?  Wouldn’t Twitter’s licensing of your content to the LoC also expire?  Twitter needs to address exactly how we can pull our content out of the archive when we want.  Michael Zimmer thinks that Twitter users won’t have the ability to remove tweets from LoC, so how will Twitter rectify this in the terms of service?

A broader question is why Twitter didn’t just build this as an opt-in service.  Or even, less preferably, an opt-out service.  Is the collection so important that it is worth compromising user privacy?  I’ve got a feeling that there are certain assumptions around “public” content and the feel-good vibe of the Library of Congress that led to a lack of critical thinking about the implications of this move.  It’s time for Twitter to start sharing more information, opening up an earnest conversation about this move.


29
Mar 10

Facebook Again to Test Privacy Boundaries

I’ve been paying attention to the discussion regarding Facebook’s proposed changes to the privacy policy (so has Michael Zimmer, TechCrunch, RWW and VentureBeat).   The most controversial is a proposal for Facebook to automatically share personal information with third party websites.  The mechanics go something like this: If you’re logged in to Facebook, and you visit a third-party site that has an established relationship with Facebook, Facebook will provide the website with your General Information, which is:

“your and your friends’ names, profile pictures, gender, user IDs, connections, and any content shared using the Everyone privacy setting.”

How would this work in practice?  Let’s imagine that CNN and Facebook team up.  If you’re logged into Facebook and visit CNN, the website would be able to welcome you by your full name, display gender-relevant content, show you recommendations from the people in your network who also visit CNN, and so on.  Going a little further, if you share your interest information, CNN might be able to dynamically display stories that match your interests.

The level of disclosure proposed in this new policy is similar (or even identical) to the information disclosure required for use of a Facebook app.  The critical difference in the new policy is that while applications require an opt-in, it appears that this new process will require an opt-out.  Facebook spokesperson Barry Schnitt:

“The opt-out hasn’t been built yet. We just want people to know they’ll be able to opt out. We’ve made that commitment. There will be an opt-out right when the user gets to the site, and there will be some opt-out functionality on Facebook. But as to where the button will be or how it will look, I don’t know, because they don’t exist right now.”

In theory, there will be two opt-outs.  The first will be the hypothetical button that Schnitt talks about.  The second will be to log out of Facebook and remove the Facebook cookie.  In reality, though, if you’re a Facebook user, you can never really opt-out, because any time a Facebook friend visits a third party site Facebook will share some of your information with that site.

Although it is a good sign that Facebook has gone on record regarding privacy control, the previous comment reveals Facebook’s cavalier attitude towards privacy.  Quite literally, they’re talking about pushing identity information of 400 million people around, yet privacy is treated as an afterthought – something they’ll figure out later.  When will companies like Facebook and Google start bringing privacy teams in at the beginning of the design process, rather than at the end?

Shifting topics a little bit, I see this move as notable because it marks Facebook’s first foray into large-scale warehoused behavioral targeting.  Targeting companies like Doubleclick (owned by Google) routinely mine our travels around the web, allowing third-party consumers to generate targeted recommendations based on our habits.  Because this happens behind the scenes, we’re less likely to notice it (which doesn’t make it any less troubling).  Facebook’s move stands to confront us with behavioral targeting, and they should consider the boundary they’re confronting.  It may not seem to be a big thing to have a third party website welcome you by your first and last name, but it is a paradigm shift on the web.

TechCrunch argues that it is time to sharpen the pitchforks, in preparation for the major backlash against the service.  Let me explain why this is frustrating.  In my opinion, the role of the privacy team is to navigate the necessary tension between our freedoms to disclose and how companies can ethically and morally profit from our data.  Facebook’s failures with Beacon or Google’s failure with Buzz are not “wins” for privacy; rather, they are losses for companies, consumers, and the market.

This brings me back to what is troubling about the “sharpening pitchforks” mentality.  It doesn’t and shouldn’t have to be this way.  Compared to Doubleclick, Facebook’s move really isn’t any more troubling – if the system is implemented properly.  And if the system is implemented properly, it could be a win – for consumers, for Facebook, and for third parties.  So how can Facebook navigate this challenge?  Let’s start with research, sensible design, and a different style of rollout than the traditional ask-for-forgiveness-later approach Facebook seems to believe in.

At Facebook’s current size and scale, they can’t afford to get this wrong.  Through research, testing, and a willingness to put the customer first, Facebook could navigate the challenges of this new feature.  But make no mistake, more than anyone, they are in the bull’s-eye right now.  And if Facebook does decide to play cavalier with privacy, the mobs TechCrunch describes will be waiting.


16
Feb 10

What Google Could Learn From Goffman

In the week since Google introduced Buzz, the most interesting thing about the fiasco has been watching the company.  For an organization as risk-averse and PR-aware as Google, a public failure offers insight that can’t be gleaned from watching daily operations.  As Google attempts to fix the problems and move the conversation onward, I thought I might reflect on some of the teachable elements of this event.

First, a little bit of back story.  As part of my fellowship at the School of Information and Library Science, I teach a course about social network sites.  Each week, I sit down with my students to discuss the social, legal, ethical and privacy implications of social network sites, among other things.  Potentially noteworthy is that my course doesn’t spend a lot of time on social network science – graph theory, quantitative analysis of networks, etc.  Rather, we concern ourselves with the interaction of people with social technology at large scale.

In our readings and discussions, we’re often challenged to think about how people present themselves in technology.  When you create a profile in a social network site, or share a stream of Tweets, you’re essentially creating a representation of an identity.  As we’ve seen time and time again in Facebook, we run into problems when identities collide during “context collapse” – when people from a different segment of your life view an identity you’ve constructed for your friends.

Taken one way, it could be argued that this problem of separate identities reveals some sort of fundamental character flaw: “Why aren’t you the same person to everyone?”  As Google CEO Eric Schmidt pointed out, “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.”  It is the intersection of technology and philosophies like Schmidt’s that is causing companies like Google and Facebook to stumble again and again, creating “privacy nightmares.”

Many of the readings in my class are influenced by Erving Goffman’s theories of identity and interaction.  Goffman, the legendary Chicago-school sociologist and former ASA president, elaborates in rich detail the process of social interaction in his books The Presentation of Self in Everyday Life, Behavior in Public Places, and Interaction Ritual.  In essence, Goffman argues that identity and interaction are performative, a concept that maps very well onto social network sites.  By “creating” identities, we’re not living dual lives, but rather engaging in a well-established performance of identity that lets us share the proper “front” in context.  We act differently on LinkedIn and Facebook because these sites have contextual norms, not because we’re duplicitous.

At the beginning of each semester of my class, I tell my students that they’re going to leave with a skillset that helps them negotiate human interaction with social technology.  I’ve sat up at night, pondering the value of such a skillset.  More than anything, the Buzz fiasco has driven home the point that we need interdisciplinary information professionals that can work with teams in negotiating the social implications of their tools.  These are the students I’m working with, and I wonder how Buzz would have rolled differently if their voices were brought to the table.

The builders of social technologies are challenged to manage the relationship between technical affordance and what is, for lack of a better term, human inertia.  That is, the tendency for people to act like people.  As Google Buzz engineers attempted to reconfigure our notions of a social group (work/friends/romantic/etc. was collapsed to “most frequently contacted”), they ran smack into human inertia.  Even though Google’s algorithms have likely figured out a more efficient way for us to group the people we know, it was simply too much to ask us to configure ourselves to the technology.

By fabricating new social groupings, Google ran head-on into Facebook’s biggest problem – that of context collapse.  When we merge social groups together, we are challenged to manage our disclosures across these groups, which have different norms of propriety.  How is it possible that Google didn’t see the potential problems of such context collapse at scale?  I’d like to offer a potential answer.

If you read a history of Silicon Valley (such as Katie Hafner’s or Michael Hiltzik’s), you’ll notice a theme of interconnection.  Silicon Valley’s tech economy is a dense series of highly entrepreneurial networks, where employment is characterized by acceptance of failure and short tenures.  The work of AnnaLee Saxenian reveals this trait as being fundamental in the Valley’s success; ideas are gestated frequently, teams assemble rapidly through the uncharacteristically large networks of oft-moving tech employees.  As good as this is for innovation, it is bad for the development of a social networking site.

Working in Silicon Valley is a classical embeddedness problem.  If you work in the Valley, it is likely that many of the people you know share similar traits.  They work at the same company as you, think about similar problems, went to similar schools.  Such homophily is beneficial for allowing entrepreneurial teams to assemble quickly, but it is bad for finding heterogeneous opinions.  Consider the case-in-point of the Google Buzz test – it was rolled out initially to Google’s 20,000 employees.  These employees – similar on many traits, richly compensated, cognizant of privacy – are different in key ways from the rest of the Buzz ecosystem.  Perhaps the homophily of the test base accounts for how devastating edge-cases weren’t designed for, or perhaps groupthink shouted such possibilities down.  Either way, this is an important lesson about the pervasive problems of homophily when designing privacy systems.

While involving interdisciplinary information professionals like the ones I train in the design process would be a good step forward, it is easier said than done.  Just as Silicon Valley engineers collide with human inertia, the Valley has its own inertia of bigger, better, and faster.  Introducing the human perspective into such a culture is an ongoing and challenging problem (see the work on Values in Design).  Right now, the market (and the opinion-sphere, to a lesser extent) regulates and acts as the proxy for human problems with systems.  I’d like to think that by introducing informed, professional voices to the discussion, we can move beyond this reactionary approach to privacy.  Perhaps Buzz is the case that moves this discussion forward.

Image used under CC-BY-ND, original source.


18
Jun 09

Zimmer on the Facebook Dataset

Michael Zimmer has released a new critique of the “Facebook Dataset” – and it is well worth reading.

Recall that last fall, a group of researchers affiliated with the Berkman Center for Internet & Society at Harvard University released a dataset of Facebook profile information from an entire cohort (the class of 2009) of college students from “an anonymous, northeastern American university.” While the researchers took good faith steps to preserve the anonymity of the source of the data (and, presumably, the privacy of the subjects), I quickly narrowed it down to 7 possible universities, and then with only a little more effort, identified the source (with some confidence) as Harvard College. All this without ever even downloading or looking at the actual data.

Download the draft of Michael’s paper.


6
Apr 09

NY Mag asks “Does Facebook Own You?”

New York Magazine leads with an interesting piece on data ownership and online social networks by Vanessa Grigoriadis.  I’ve got a quote in there, which builds on some writing I did last month.

This is part of who I am now—somebody who knows that her nursery-school tormentor wasn’t a bully without a heart. It will get logged into my profile, and that profile will become part of the “social graph,” which is a map of every known human relationship in the universe. Filling it in is Facebook’s big vision, a typically modest one for Silicon Valley. It’s too complex for a computer scientist to build. Just as our free calls to GOOG-411 helped Google build its voice-recognition technology, we are creating the graph for Facebook, and I’m not sure that we can take ourselves out once we’ve put ourselves on there. We have changed the nature of the graph by our very presence, which facilitates connections between our disparate groups of friends, who now know each other. “If you leave Facebook, you can remove data objects, like photographs, but it’s a complete impossibility that you can control all of your data,” says Fred Stutzman, a teaching fellow studying social networks at the University of North Carolina at Chapel Hill. “Facebook can’t promise it, and no one can promise it. You can’t remove yourself from the site because the site has, essentially, been shaped by you.”

Check the full article.