Unit Structures Fred Stutzman’s thoughts about information, social networks and technology.

Posted
Sep 29 2008, 1:58 pm

Categories
Thoughts

Facebook Datasets and Private Chrome

Via the Berkman Center, news of a Facebook dataset now available to the general public.  I haven’t written up the necessary research statement to access the data, but the publicly-available codebook provides insight into the set.  According to the codebook, the data is scrubbed, with personally-identifying data removed.

The “non-identifiability” of such a dataset is up for debate.  A friend network can be thought of as a fingerprint: it is unlikely that any two networks are exactly alike, so individuals may be identifiable in the dataset post hoc (for friend-network verification, see Zinman & Donath, 2007).  Further, the authors of the dataset plan to release student “Favorite” data in 2011, which may make identification easier still.  According to the authors, the collection of the dataset was approved by the IRB, Facebook and the individual college.  The dissemination of the dataset appears to be approved by the IRB.
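To make the fingerprint idea concrete, here is a minimal sketch on a made-up, six-person friendship graph. It is only an illustration: the edge list, the function names, and the choice of "signature" (a person's friend count plus the friend counts of their friends) are my own assumptions, not the structure of the released dataset or the method in Zinman & Donath. The point is that someone who knows only this much about a target can already single out any node whose signature is unique.

```python
from collections import Counter, defaultdict

# Hypothetical anonymized friendship edges; node IDs carry no identity.
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6)]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def degree_signature(adj, node):
    """Return (own friend count, sorted friend counts of each friend)."""
    return (len(adj[node]), tuple(sorted(len(adj[n]) for n in adj[node])))


def unique_nodes(edges):
    """List nodes whose signature is shared by no other node in the graph."""
    adj = build_adjacency(edges)
    sigs = {node: degree_signature(adj, node) for node in adj}
    counts = Counter(sigs.values())
    return sorted(node for node, sig in sigs.items() if counts[sig] == 1)


if __name__ == "__main__":
    print(unique_nodes(EDGES))
```

In this toy graph, nodes 3 and 4 have signatures nobody else shares, so they could be picked out without any names attached. A real friend network is far larger and richer, which tends to make signatures like this more distinctive, not less.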

In other news:

danah boyd recently gave a talk, “Understanding Socio-Technical Phenomena in a Web2.0 Era” at the opening symposium for MSR New England.  The video is available in a WMV stream.

Via W.H., Iron - a version of Google Chrome with all Google reporting stripped out.  In theory, this will also disable the auto-update functionality, a feature I was never comfortable with.

Citation:
Zinman, A. and Donath, J.  (2007).  Is Britney Spears spam?  In Fourth Conference on Email and Anti-Spam, Mountain View, CA.


6 Comments

Posted by
Andrew
30 September 2008 @ 6pm

Hi Fred -

Do you have a view on the merits of the Dataverse Network Project?

Regards
Andrew


[...] research purposes, which I know a lot of people who will find this quite valuable (and thanks to Fred Stutzman for bringing it to my [...]


Posted by
Michael Zimmer
30 September 2008 @ 11pm

The non-identifiability of this dataset is indeed up for debate:


Posted by
Eszter
2 October 2008 @ 8am

I think it’s hard to imagine that some of this anonymity wouldn’t be breached with some of the participants in the sample. For one thing, some nationalities are only represented by one person. Another issue is that the particular list of majors makes it quite easy to guess which specific school was used to draw the sample. Put those two pieces of information together and I can imagine all sorts of identities becoming rather obvious to at least some people.


Posted by
Michael Zimmer
4 October 2008 @ 1pm

Eszter is right on both counts. I’ve already figured out the school based on the majors listed (click the link on my name to the left). It would have taken the researchers little effort to make the majors generic.

Do you know if the IRB records are publicly available? I’m very interested in how the research was presented and how the IRB ruled on issues like this.


[...] Facebook Datasets and Private Chrome, Fred Stutzman, Unit [...]

