If you recall, a few years ago Facebook forced all users to select a gender if they wanted to continue using the site. This move generated a little controversy – some individuals didn’t feel comfortable sharing the information or fitting into a gender classification. Facebook responded:
However, we’ve gotten feedback from translators and users in other countries that translations wind up being too confusing when people have not specified a sex on their profiles. People who haven’t selected what sex they are frequently get defaulted to the wrong sex entirely in Mini-Feed stories. For this reason, we’ve decided to request that all Facebook users fill out this information on their profile.
Just today, I discovered (via the R Bloggers news feed) a video on the use of R in corporations like Google and Facebook. The representative of the Facebook data team talked about some exploratory data analysis they did in 2007. The finding? “If a user comes on more than once and is willing to give Facebook a very basic piece of information – their gender – that seems to be the strongest predictor of whether they will stay on the site.”
I’m not looking to stir up any controversy. Rather, I think it is an interesting example of analytics-based development, of research informing design. Of course, the challenge of translating research into practice is immense. Are there critical differences between individuals who share their gender and those who don’t? Did a forced gender-selection process invalidate the predictive model? Was the controversy over gender selection worth the predicted benefit? Perhaps Facebook’s 500 million users owe more to gender selection than we can imagine.
Anyway, the video has some age on it, but I did enjoy hearing about Facebook’s use of R (the other analytic examples provided are cited in the “Maintained Relationships on Facebook” report, plus there are a few ICWSM papers, I believe). You can find the full video here (doesn’t look like embed is supported).
Update: Please see the response from Cameron Marlow, Facebook Data Team lead, in the comments. Cameron provides great context for this finding.