Facebook Datasets and Private Chrome
Via the Berkman Center, news of a Facebook dataset now available to the general public. I haven’t written up the research statement required to access the data, but the publicly available codebook provides insight into the set. According to the codebook, the data has been scrubbed of personally identifying information.
The “non-identifiability” of such a dataset is up for debate. A friend network can be thought of as a fingerprint: it is unlikely that any two networks will be exactly alike, so individuals could be identified in the dataset post hoc (for friend-network verification, see Zinman & Donath, 2007). Further, the authors of the dataset plan to release student “Favorite” data in 2011, which will provide additional information that may lead to identification. According to the authors, the collection of the dataset was approved by the IRB, Facebook and the individual college. The dissemination of the dataset appears to be approved by the IRB.
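To make the fingerprint idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy edge list, the opaque IDs, and the function names are made up, and none of it reflects the actual dataset's format or the method in Zinman & Donath. It treats a person's "fingerprint" as their own friend count plus the sorted friend counts of their friends, then checks which people have a fingerprint nobody else shares.

```python
from collections import Counter

# Hypothetical "scrubbed" friendship graph: opaque IDs only, no names.
EDGES = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"),
    ("d", "e"),
    ("e", "f"), ("e", "g"),
]


def build_adjacency(edges):
    """Turn an undirected edge list into a dict of node -> set of friends."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def fingerprint(adj, node):
    """Structural fingerprint: own friend count, plus the sorted
    friend counts of each friend. No names or attributes are used."""
    friends = adj[node]
    return (len(friends), tuple(sorted(len(adj[f]) for f in friends)))


def unique_nodes(adj):
    """Nodes whose fingerprint is shared by no other node."""
    counts = Counter(fingerprint(adj, n) for n in adj)
    return sorted(n for n in adj if counts[fingerprint(adj, n)] == 1)


def candidates(adj, observed):
    """Nodes matching a fingerprint learned from outside the dataset."""
    return sorted(n for n in adj if fingerprint(adj, n) == observed)


adj = build_adjacency(EDGES)
print(unique_nodes(adj))
print(candidates(adj, (3, (1, 1, 2))))
```

Even in this seven-person toy graph, three people (a, d and e) have a fingerprint that belongs to them alone, so someone who knows only the shape of a target's friend list could pick them out without any names. Real networks are far larger and messier, and a real attack would need richer features than friend counts, but the scrubbed IDs by themselves do little to prevent this kind of matching.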
In other news:
danah boyd recently gave a talk, “Understanding Socio-Technical Phenomena in a Web2.0 Era,” at the opening symposium for MSR New England. The video is available as a WMV stream.
Via W.H.: Iron, a version of Google Chrome with all Google reporting stripped out. In theory, this also disables the auto-update functionality, a feature I was never comfortable with.
Citation:
Zinman, A. and Donath, J. (2007). Is Britney Spears spam? In Fourth Conference on Email and Anti-Spam, Mountain View, CA.
