Thoughts


4
Nov 09

SETI Interview

This week, Peggy Orenstein and I appeared on SETI’s (yes, that SETI!) public radio program to talk about Freedom.  Other guests on the program include Ray Kurzweil and Stephen Wolfram.

Link to the show (direct MP3 download)

The segment on Freedom is about 30 minutes in.  I enjoyed hearing Peggy talk about her work on the article, and her use of Freedom.


26
Oct 09

NY Times Magazine

Sunday’s New York Times Magazine (!!!) featured an article by Peggy Orenstein on the virtues of Freedom.  She writes:

Not long ago, I started an experiment in self-binding: intentionally creating an obstacle to behavior I was helpless to control, much the way Ulysses lashed himself to his ship’s mast to avoid succumbing to the Sirens’ song.

And that is why I need the mast. It came in the form of an app called Freedom, which blocks your Internet access for up to eight hours at a stretch. The only way to get back online is to reboot your computer, which — though not as foolproof as, say, removing the modem entirely and overnighting it to yourself (another strategy I’ve contemplated) — is cumbersome and humiliating enough to be an effective deterrent. The program was developed by Fred Stutzman, a graduate student in information and library science, whose own failsafe self-binding technique — writing at a cafe without Internet access — came undone when the place went wireless. “We’re moving toward this era where we’ll never be able to escape from the cloud,” he told me. “I realized the only way to fight back was at an individual, personal level.”

Orenstein goes on to write:

It could be that sometimes our greatest freedom may be to choose freedom from freedom. I am still surprised by the relief that floods me whenever I bind myself from going online, when I have no option but to ignore the incessant tweets and e-mail messages and videos and news links and even the legitimate research.

I’m not wishing the Internet away. It has become so integral to my work — to my life — that I honestly can’t recall what I did without it. But it has allowed us to reflexively indulge every passing interest, to expect answers to every fleeting question, to believe that if we search long enough, surf a little further, we can hit the dry land of knowing “everything that happens” and that such knowledge is both possible and desirable. In the end, though, there is just more sea, and as alluring as we can find the perpetual pursuit of little thoughts, the net result may only be to prevent us from forming the big ones.

First things first, it is a tremendous honor (and a little surreal) to be featured prominently in the New York Times Magazine.  My goal in designing software is to solve complex problems through simple design, and it is heartening to know that I’ve helped people become more productive and accomplish their goals.

The point Orenstein makes is that our inundation with little-k knowledge – extensively afforded by the Internet – stands in the way of producing big-K knowledge – our books, dissertations, and large-scale projects.  By stepping away and freeing one’s self from the stream – by finding freedom in Freedom – we are able to focus anew on Knowledge.

The privileging of Knowledge over knowledge is essentially a value-enforcing process (see Foucault’s lecture Truth and Power or the long-form Archaeology of Knowledge), one that is troubling as we privilege certain forms of literacy and marginalize others (e.g. Wikipedia, digital literacies, etc).  But there remains a simple fact that many of us have got to “get stuff done” – and many a longer-form project has been distracted and derailed by YouTube, Facebook, and other smaller knowledges.

The article draws on one of my theories regarding productivity and machines of work.  The history of machines is dominated by unitary, task-focused devices (see Cohen’s Social History of American Technology for a review of discourses surrounding early technologies).  Even though industrial devices were technically collaborative, the focus of use was primarily individual and task-focused.  Fast forwarding to the creation of knowledge industries, we see a long lineage of restriction and task-focusing (i.e. working on computer systems with limited programs, no access to the internet).

Following Latour’s interpretation of Machines in Science in Action, the meaning we attribute to devices is socially constructed and situated.  The problem we face with “computers” is their many “constructions” and “situations.”  Our computers exist as boundaries between work and personal culture, hot and cold media, work and enjoyment, social and contractual obligation.  Our expectation that the computer remain a device of work is a discursive construction, and our perspective that it is a failing to be distracted by computers is a statement of values.

So why does Freedom work?  At a very practical level, yes, it removes the distraction of others, YouTube, Facebook, and so on.  But it also reshapes the device, reconfiguring our expectations of the device.  Is a computer running Freedom still a computer?  And that, I believe, is the power.  When the expectations are reconfigured and the device is reappropriated, we can approach it on new terms.


15
Oct 09

AOIR Wrapup

I spent the majority of last week in Milwaukee, WI attending the 10th annual Association of Internet Researchers conference.  This was my first time attending AOIR, and it was a great experience.  As an interdisciplinary researcher, I enjoyed seeing the diversity of methods and theories being applied to internet research.  Congratulations to the organizers for running an excellent conference.  IR11 will be in Sweden, and IR12 will be in Seattle, WA – hosted by UW’s iSchool.

I was busy at AOIR, attending the doctoral colloquium (my last as a student), and giving two talks.  The first talk was a paper, entitled “Boundary Regulation in Social Media.”  This is work Woody Hartzog and I collaborated on, in which we interviewed people who maintain multiple, separate profiles on social network sites.  We were interested in learning about how these folks use multiple profiles as a boundary regulation strategy.  This research highlights a number of issues many of us are experiencing with context merging – as our friends, family members, coworkers and past friends merge into a single social network.  I’ve posted the slides below:

The second talk came out of some research-in-progress exploring the adoption of social networks by older users.  Over the past year, the largest growth sector in social networking has been the 35+ demographic.  In our research, we talked to people in their 40s, 50s, 60s and 70s about their use of social network sites.  This was fascinating research, and learning about some of the challenges of reconnection via SNS after 30, 40 or even 50 years was particularly interesting.  As I mentioned, this is research in progress – which I’m conducting with Cheryl Thompson and Valeda Stull – and it involves a mix of methods and components.  Our next step is to implement a survey that builds on our findings.  You can see the slides below:

As for general themes for the conference, I admit I stuck pretty close to the social media/networking/privacy talks, but some general observations:  First, context is big.  Lots of researchers are looking at the effect of context on disclosure behavior.  Context has big practical implications for social networking sites, and both of my qualitative studies reveal that the big sites have lots of work to do to address these needs.

Second, mass reconnection is an interesting byproduct of the “democratization” of social networks (as Amanda Lenhart described.)  Reconnection was absolutely driving the use of social networks among the older users we interviewed.  Before SNS, reconnection was a costly and inefficient process; I think we can argue that the particular affordances of SNS (search, articulated networks) facilitate reconnection, and that reconnection is going to drive use of SNS for some time.  Of course, we must remember that the SNS is just the medium for reconnection, and that the “infrastructure” of reconnection relies on increasing broadband adoption, cheaper computers, and increasing technical literacy.

This goes without saying, but Twitter was a hot topic at AOIR.  I particularly enjoyed the work of danah boyd and Alice Marwick, who conducted a series of studies on Twitter this summer at Microsoft Research.  Alice’s talk was focused on the production of celebrity and microcelebrity in Twitter, and was just fascinating.  danah expanded this research, exploring how “teens are Twittering.”  Indeed, teens are a marginal segment of Twitter, with sparse networks, but what was particularly interesting is how they were using celebrity to create vectors for conversation, bridging networks and building relationships.  Very impressive research from both.

Of course, there were lots of great presentations at AOIR, too many to mention here.  I would like to particularly thank Heather Attig, Amanda Lenhart and Sarita Yardi, who were my co-panelists on the Late Adopters panel.  I really liked how this panel came together, bridging a variety of research questions and methods to provide insight into this phenomenon.  I’ll leave you with the slides from Amanda’s presentation, which provide some brand new topline data on Adult SNS use (47%).

Oh yes, if you want copies of the papers, I’ll be happy to email them to you. I’m not posting them right now because some are either under review, or being revised for journal submission. Just drop me a line.


5
Aug 09

Teens Don’t Tweet, or, How to Read a Web Panel

In the past few months, we’ve seen a number of studies of dubious methodology make sweeping generalizations about Twitter.  Examples include the Twitter gender study; another study asserted that because only one out of every five teens tweets, teens don’t use the service (wouldn’t you like one out of every five teens to use your product?).  Nielsen joins this discussion by stating that “Teens Don’t Tweet”, based on findings from their online panel.  They assert that “In June 2009, only 16 percent of Twitter.com website users were under the age of 25. Bear in mind persons under 25 make up nearly one quarter of the active US Internet universe, which means that Twitter.com effectively under-indexes on the youth market by 36 percent.”  Oh noes!

[Figure: twitter_by_age]

As these data will undoubtedly be reported breathlessly elsewhere, I thought it might be useful to step back and explore some of the issues with the methodology and conclusions.  So first, a note about the methodology.  The Nielsen NetView panel contains an impressive 250,000 users.  Metering software located on client machines records the websites visited by panel members.  A vast majority of the panel is recruited online; the panel is “calibrated” (weighted) against gold-standard sampling methods (Random Digit Dialing, etc.).

Survey weighting is a standard, fairly uncontroversial process.  It is commonly used and is thought of as preferable to census-type approaches that may systematically under-represent some populations.  However, reliable survey weighting gets tricky when the population is small.  Since teens are a notoriously hard-to-reach population, we generally see inflated standard errors around weighted teen respondents in a population survey.  Nielsen does not report standard errors, and the makeup of their panel is confidential, so it is impossible to know how much error there is around the estimate of use.  If the panel is like other panels, though, there may be more error around estimates for young people than for a high-response population, such as adults.  We’re very familiar with margins of error (the things you see in political polls, where error is reported as plus or minus 3 percent, etc).  An inflated error means the margin is larger, meaning that the estimate may vary by a larger amount.

This is not to put down Nielsen.  With 250,000 members, the panel likely has good coverage of young people.  Since my purpose is to use this example to critique web panels, we must point out two other issues.  First, bigger is not necessarily better if the sample is convenience driven.  Nielsen’s panel is very large, but simply because it is large doesn’t mean it is representative.  In fact, Nielsen is likely more interested in the larger size for better sparse-market coverage, as opposed to statistical reliability.  Second, the particular nature of recruitment into the main panel introduces selection bias.  If people aren’t selected randomly, then there may be characteristics of the population that covary with the variables of interest.  This is an omnipresent issue with polling, but it must be noted.

So when we read a web poll of this particular nature, what are the critical questions we should be asking?  First, we should be concerned about cell size (the number of respondents) for a hard-to-reach population.  If young users are underrepresented, the standard errors on the estimates can be quite large (which may push an estimate around by +/- 10 percent).  We should also question the method of recruitment; if the majority of the panel comes in via the web, then who gets left out?  Since this poll is designed to represent online users, it seems likely that heavy web users are participants (my guess).  But what if Twitter users actually aren’t like heavy web users?  There are a whole host of other questions we should ask regarding polls (response rate, sampling frame, etc) that generally aren’t answered in online polls.

It is important to understand the potential methodological issues when reading research.  Nielsen’s methods are standard for the industry, and they acknowledge the drawbacks and limitations.  In my opinion, the major problem isn’t the methods component, it is Nielsen’s spinning/presentation of its results.  In the Nielsen study (and the previous Participatory Media Network study), the findings focus on lack of teen use of Twitter.  However, the findings reported by Nielsen cover the following age ranges: 2-24, 25-54 and 55+.  The critical category, 2-24, covers a wide range of users – incredibly young children, adolescents, teens and adults.  The grand mean reported by Nielsen is affected by variation inside the different age categories.  Using census data, we can look at age breakdown over the ranges 2-24.  According to the census, there are 80MM Americans under age 25 (0-24).  There are approximately 15-16MM Americans in each of the age ranges 0-4, 5-9, 10-14, 15-19, and 20-24.  Therefore, each category is weighted pretty much equally.  So let’s (hypothetically) assume that no one age 0-9 uses Twitter, 5% of people age 10-14, 35% of people age 15-19, and 40% of people age 20-24 use Twitter.  To calculate the grand mean we would weight the percentages and then sum.  Such a formulation would give us 16% use for the demographic age 0-24 (0+0+.01+.07+.08).
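The weighting arithmetic above can be sketched in a few lines of Python. The usage rates are the hypothetical ones from the paragraph, and the equal weights rest on the assumption that each five-year band holds roughly the same number of people; none of these figures come from Nielsen.

```python
# Hypothetical usage rates per age band (from the example above, not real data).
usage_by_band = {
    "0-4": 0.00,
    "5-9": 0.00,
    "10-14": 0.05,
    "15-19": 0.35,
    "20-24": 0.40,
}

# Assumption: each band holds about the same number of people (~15-16MM),
# so every band gets an equal weight of 1/5.
weight = 1 / len(usage_by_band)

# Weight each band's rate, then sum to get the grand mean for ages 0-24.
grand_mean = sum(weight * rate for rate in usage_by_band.values())
print(f"Grand mean for ages 0-24: {grand_mean:.0%}")
```

The point of the sketch is that a 16% grand mean is compatible with 35-40% use among older teens and young adults, once the bands that cannot plausibly use the service are averaged in.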

The second problem with Nielsen’s presentation is the comparison range.  Comparing the age ranges 2-24 and 25-54 is not fair on a number of levels.  The first category can really only meaningfully cover age 13-24, while the 25-54 age range meaningfully covers 30 years.  If we weight the estimates (16%/64%) by volume coverage (1:3 ratio), then the category volume for older users would be ~21% (I didn’t bother to weight by census, just an estimate).  And what if we compared just teens against an adult category – we might even find that teens Twitter more than adults.  Keep in mind, with all the advantages afforded to older users (no Internet restrictions, etc) there are major differences between older users and teen/young users in their capacity to partake in online community.

My analysis is simplistic and speculative, but in certain configurations, “young people” could plausibly use Twitter at higher rates than “adults.”  I don’t have a guess regarding what is right, but my gut tells me that if Nielsen were more upfront regarding their sampling, and less misleading with their infographics, we’d have a different story.  And that story would not be as catchy and headline-grabbing as “Teens Don’t Tweet.”


23
Jul 09

Newsweek on Facebook@5

Newsweek has a special section on Facebook at 5 years, featuring a number of interesting articles and videos.  Nicole Ellison and I are interviewed for the lead piece.

To date, no one has come up with a reliable algorithm for “coolness,” (which might explain why Facebook fought the buzz behind Twitter by, um, copying it). But for academics like Stutzman and others increasingly turning their attention to social networks, there’s a name for what happens when everyone joins the same site at the same time, perhaps rendering it uncool: “context collapse.” That’s the term used to describe a series of awkward events like when your boss or parents friend you, or someone posts a picture of you that you don’t want your colleagues seeing, or when an elementary school bully from your past starts commenting on your status updates. As these activities cascade, social media research has shown that people begin to shy away from their online persona and begin aggressively limiting the information that appears about themselves. Not surprisingly, users begin to stress out about their tangled social scenes and abandon the network all together. “What needs to happen—and what’s going to happen—is that there needs to be more granular privacy settings,” says Nicole Ellison, who researches and teaches on social media at Michigan State University. “So I can share a status update, but one I only want to go to my high school friends.”

As a side note, this has to be one of the fastest primary-research-to-Newsweek jumps ever.  We were interviewing dual-boundary maintainers in the morning, talking to Newsweek in the evening, and it was on the web the next day.  We’re also very close to wrapping up our dual-boundary research, and it has been fascinating.  Looking forward to presenting our conference paper at AOIR.


24
Jun 09

The Great Wall of Facebook

Fred Vogelstein has an interesting article in the new edition of Wired, previewing Facebook’s full-on assault on Google for targeted advertising territory.  The article makes news, and includes some great (and painfully ironic) quotes from Mark Zuckerberg in which he accuses Google of contributing to the surveillance society (Pot, Kettle, Black).  The article reads like a preview for the Super Bowl, with notoriously tight-lipped executives tossing bombs back and forth.  Congrats to Vogelstein for successfully stoking the ire of these monoliths.

The fundamental conflict of the article lies in the comparison of the advertising products offered by the two companies.  Google’s product, targeted text ads, is the single most successful product on the Internet.  The tiny, unobtrusive ads have fueled Google’s dominance in multiple markets; today, 90% of Google’s revenue comes from AdSense.  Facebook’s product is nascent – it is the concept that advertising works better when it is socially mediated.  That is, we are more likely to click on ads, content, and links when the content is funneled through our friends.  This theory is sensible, but to date, Facebook’s concept remains vaporware, with a majority of their revenue coming through traditional targeted text and banner campaigns.

Framed by Zuckerberg, the contrast between Facebook and Google is personal vs. impersonal.  Of Google he states: “You have a bunch of machines and algorithms going out and crawling the Web and bringing information back.  That only gets stuff that is publicly available to everyone. And it doesn’t give people the control that they need to be really comfortable.”  Vogelstein writes:

Facebook CEO Mark Zuckerberg envisions a more personalized, humanized Web, where our network of friends, colleagues, peers, and family is our primary source of information, just as it is offline. In Zuckerberg’s vision, users will query this “social graph” to find a doctor, the best camera, or someone to hire—rather than tapping the cold mathematics of a Google search. It is a complete rethinking of how we navigate the online world, one that places Facebook right at the center. In other words, right where Google is now.

Personal vs. impersonal.  Wouldn’t you rather get a doctor recommendation from ten of your friends than a text link?  The value of peer recommendations has driven many communities, including countless bulletin boards and fora, sites like epinions and Yelp, and members-only specialist communities.  The fundamental problem with monetization in Facebook’s case lies with norms that govern the exchange of advice, particularly that the advice be truthful and unbiased.  If we are to trust advice, we must know that external agents aren’t corrupting or influencing the transmission of advice.  We can get advice from Facebook regarding doctors, but we won’t trust the advice if Facebook pays our friends to recommend certain doctors.

Facebook’s grand vision involves a wholly-contained world of social information that is brokered out through the web.  With enough critical mass, it is argued, most of our common information needs can be answered by our social networks.  As with most technological main effect hypotheses, the formulation is generally suspect.  Researchers of social support argue that support is more effectively derived from certain actors, that support is contextual, etc.  In a traditional model, where the people around you are the primary producers of information, your personal support network is crucial.  With the advent of the Internet, however, most of us no longer exist in a traditional model where the people around us are our only support vector (1).

The reality is that Google, and other search engines, have restructured expectations regarding everyday information seeking.  It is no longer good enough to simply get recommendations from a personal network when there is a vast quantity of electronic information available at one’s fingertips.  You can certainly get doctor recommendations from your friends, but the online search for information about the doctor is now a natural part of the information seeking process.  In this sense, Facebook is complementary, providing an important but not all-encompassing factor in our decision making process.  The argument that individuals will move their information seeking to a social network, and away from the mechanistic site Google, simply assumes too much.  Google has already won by making itself an integral part of our everyday information seeking processes.

If Facebook (a proxy for “socially mediated search”) is a complementary and useful part of everyday information seeking, we must consider the relevance of information we get from the site.  We generally assess relevance in information systems through “recall” and “precision.”  In Facebook, recall is strictly bound to our known social world – the people who we have connected with.  Therefore, precision is a function of how well the various others producing results match our needs.  If you have 500 friends, spaced across a variety of age ranges, is it safe to assume that information you get from the network will actually be all that relevant?  Our core social networks are generally homophilous, but our core social networks are very small.  Expand past a certain network size and it becomes likely the interests and experience of your “friends” will vary significantly from yours.
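For readers unfamiliar with the two measures, here is a minimal sketch. The friend names and both sets are made up for illustration: precision is the share of returned answers that are relevant, and recall is the share of all relevant answers that were returned.

```python
def precision_recall(returned: set, relevant: set) -> tuple[float, float]:
    """Return (precision, recall) for one query."""
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: four friends answer a question, two of them usefully,
# while a third useful source ("eve") is not in the network at all.
returned = {"ann", "bob", "cat", "dan"}
relevant = {"ann", "bob", "eve"}

p, r = precision_recall(returned, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case recall can never reach 1.0, because the network simply does not contain every relevant source, which is the bound described above.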

Facebook could address this problem with friend lists, the privacy feature that compels individuals to place their friends in groups.  Perhaps friend lists could be converted to interest groups (People whose book recommendations I trust), but the mechanics of such a process would require a good bit of intervention on behalf of the user.  The participation gap is also problematic – if the people who you really trust for book recommendations are not heavy users of Facebook, then it is unlikely you’ll have your information needs addressed.

Facebook could develop algorithms that look for similarity between question askers and answerers – if I ask for a book recommendation, perhaps Facebook could weight responses from people who share my stated book tastes.  This compels participation and broadcast of information, one of Michael Zimmer’s new laws of social networking.
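One way to picture such weighting is an overlap score between stated tastes. This is only an illustrative sketch: the function names, the data, and the choice of Jaccard similarity are assumptions made for the example, not anything Facebook has described.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of stated tastes, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_answers(asker_tastes: set, answers: dict) -> list:
    """Sort (score, friend, recommendation) tuples, most similar friend first.

    `answers` maps a friend's name to (their tastes, their recommendation).
    """
    scored = [
        (jaccard(asker_tastes, tastes), friend, rec)
        for friend, (tastes, rec) in answers.items()
    ]
    return sorted(scored, reverse=True)

# Hypothetical data.
my_tastes = {"sci-fi", "history", "essays"}
answers = {
    "ann": ({"sci-fi", "history"}, "Book A"),
    "bob": ({"romance", "thrillers"}, "Book B"),
}
ranked = rank_answers(my_tastes, answers)
for score, friend, rec in ranked:
    print(f"{friend}: {rec} (similarity {score:.2f})")
```

Note that the score is only as good as the tastes people have actually broadcast, which is why this approach compels participation.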

Although the debate framed by Vogelstein and Zuckerberg is Facebook vs. Google, there is actually very little opportunity for Facebook to significantly edge into Google’s core market – targeted text-link ads.  Text link ads are served as a by-product of information search, which is an integral part of our everyday information seeking processes.  Facebook is likely to emerge as a complement to search, and in some areas it may perform better than search, but search will remain relevant.  The challenge to Facebook is to find a way to monetize their value areas without being in contravention of social norms.  The challenge to Google is to get access to the wealth of personal data Facebook is collecting (and no, Google Friend Connect and all of their other terrifically lame social products will not solve this problem).  For the consumer, the battle between Google and Facebook is a win-win, with the obvious exception of privacy matters.

(1) Those with “impoverished life-worlds” – those with limited access to information and resources – are unlikely to incorporate search engines or social networks into their everyday information search processes.


4
Jun 09

Rethinking Twitter and Gender Differences

On Monday, the Harvard Business School posted a “conversation starter” study on gender differences in Twitter use.  The authors found that “men have 15% more followers than women” and “an average man is almost twice as likely to follow another man than a woman.”  The authors suggest, without empirical data, that men find the content produced by women less compelling “because of a lack of photo sharing.”  Is everyone else offended by this base characterization?

As it happens, the study has serious flaws.  I’d like to point those out, and then suggest an alternative method for addressing these questions.  Let’s start by talking methods.  This study is a survey; using a random sample of 300,000 Twitter users, the authors attempt to draw population-level inferences about “friending” behavior in Twitter.

When conducting a population survey, researchers collect a sample and attempt to use that sample to draw inferences about a population.  The difference between the “sampled” population value and the “true” population value is known as error.  Survey error (MSE) has two components: sampling error and non-sampling error.  We are most familiar with sampling error; it is the difference between the “sample” value and the “true” value attributable to the sample selection.  Non-sampling error comprises all other error not attributable to sampling, such as data entry error, instrument error, etc.

For the purpose of this analysis, we are going to focus primarily on sampling error.  At the study sample size of 300,000, there is very little sampling error in an infinite population.  While we generally associate a large sample size with better quality data because of this small sampling error, there are two caveats.  First, above a certain sample size, say 20,000, there is little marginal gain in the addition of sample.  The difference between an n of 500 and an n of 1000 is vast, but the difference between an n of 20,000 and an n of 40,000 is much smaller due to the properties of the normal distribution.
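The diminishing return is easy to see with the usual margin-of-error formula for a proportion. This sketch assumes simple random sampling, a 95% confidence level (z = 1.96), and the worst-case proportion of 0.5; a real survey design would also account for design effects.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a 95% confidence interval for a proportion.

    Assumes a simple random sample from a very large population.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (500, 1_000, 20_000, 40_000, 300_000):
    print(f"n={n:>7,}: +/- {margin_of_error(n):.2%}")
```

Doubling from 500 to 1,000 trims the margin from roughly 4.4% to 3.1%, while doubling from 20,000 to 40,000 only moves it from about 0.7% to 0.5%.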

On paper, a larger n is always better; here is the second caveat.  When dealing with very large samples, confidence intervals used to determine significance are smaller – meaning even the most minute differences become “significant.”  Furthermore, discovery of influential data is more difficult, as those data may be sufficient in number (i.e., a pattern emerges in influential data) to influence the distribution. As any Twitter user with a public profile knows, there are certainly some “patterns” that emerge in follower behavior.
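To make the second caveat concrete, here is a hedged sketch using a standard pooled two-proportion z statistic. The proportions and sample sizes are invented for illustration; the same one-point gap is tested at two different sample sizes.

```python
import math

def two_proportion_z(p1: float, p2: float, n1: int, n2: int) -> float:
    """z statistic for the difference between two sample proportions (pooled SE)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Same one-point gap (51% vs 50%), two hypothetical sample sizes per group.
z_small = two_proportion_z(0.51, 0.50, 1_000, 1_000)
z_large = two_proportion_z(0.51, 0.50, 150_000, 150_000)
print(f"n=1,000 per group:   z={z_small:.2f}")
print(f"n=150,000 per group: z={z_large:.2f}")
```

With 1,000 per group the statistic sits well below the 1.96 threshold; with 150,000 per group the identical gap clears it easily, even though its practical size has not changed.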

Let us revisit the purpose of the survey, which is to use a sample to draw inferences about a population with as little total error as possible.  The goal is not to achieve significant differences on wild hypotheses; it is to collect good data that represents a population.  To achieve this goal, survey designers expend a lot of effort understanding their populations, defining their sample, and working to achieve high data quality (while keeping costs under control).

Let’s say that I wanted to know the 2008 income of everyone over 18 born in my city.  So I go down to city hall, I ask for the names of everyone who was born in my city before 1991.  I then take this very large list, and cross-reference it with my magical 2008 tax records, and produce a wonderful study.  Can you spot some problems with the data?  At first, you might point out that not everyone over 18 born in my city earns an income.  Ok, that’s fine – I want to know that.  Now here’s the real problem: my city started keeping records in 1830, meaning well over half of the people in my sample are dead, and they report no income.  Now I’ve got some highly influential data that actually looks “normal” due to attrition.

Let’s consider what we know about Twitter.  If we believe Nielsen, about 60% of people who create Twitter accounts abandon them within a month.  And if we believe the fair and balanced news organization Fox News, Twitter has a spam problem (Ok, anyone who has a public profile knows that).   What might these trends tell us about our population?  First, there will be a large cluster of inactive (attrition) users.  Second, there will likely be a large cluster of users who do not follow anyone, or follow a very small number of people (characteristic of attrited users).  Finally, since following is non-reciprocal, these attrited users (and active users) likely have their follower numbers inflated by Twitter spammers.

What do the HBS numbers tell us?  The authors find that the mean number of tweets/user is 26, but the median is 1 and 75th percentile is 4 tweets.  This indicates a highly non-normal distribution (it most likely approximates a bimodal distribution): there are a large number of users with 0 or 1 tweets (50% of the sample – and 75% of the sample have 4 or fewer tweets).  This indicates that a large portion of the sample is inactive.  (Of course, a number of these accounts could be “follower” accounts (i.e. people who do not post but follow), but I would argue this would constitute a small portion of the population). This provides good support for my first point.
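To see how a handful of heavy users can produce these summary statistics, here is a toy sample. The 20 values are constructed for illustration so that they reproduce the reported mean, median, and 75th percentile; they are not the HBS data.

```python
import statistics

# Toy sample of 20 accounts, built so its summary statistics match the ones
# reported above (mean 26, median 1, 75th percentile 4). Not the HBS data.
tweets = [0] * 6 + [1] * 5 + [2] * 2 + [3] + [4] * 2 + [10, 40, 100, 350]

mean = statistics.mean(tweets)
median = statistics.median(tweets)
q1, q2, q3 = statistics.quantiles(tweets, n=4)  # quartile cut points
share_inactive = sum(t <= 1 for t in tweets) / len(tweets)

print(f"mean={mean}, median={median}, 75th percentile={q3}")
print(f"share with 0 or 1 tweets: {share_inactive:.0%}")
```

Four accounts out of twenty carry almost the entire mean here, which is why the mean says little about the typical account.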

My second point, non-follower data, is not addressed by the study.  They do not present information regarding the percentage of users who do not follow back, instead presenting an odds ratio that would hide the distribution of followers.  I would guess that at least 40% of the sample does not follow any user (or follows only “suggested” users).  My third point, that more people would be followed, seems to be upheld, as 80% of the sample has at least one follower.  There is likely some spam inflation there, and information about the distribution would tell us a lot.

As we can see, all signs point to low data quality, which casts all of the hypotheses and findings in serious doubt.  Just because a sample is large, and significance can be easily achieved, it doesn’t mean that data quality is good.  Unfortunately, it appears that the Harvard authors have made the error I describe in my income study – yes, they’ve collected a lot of people, but they failed to see who had died.  What good is an inference about a population if it is heavily influenced by bad data?  Don’t we actually want to know what real users are doing?

Beyond these data quality problems, there is also an issue with the gender classification; the authors rely on a corpus of names to predict gender of users.  As each name is a prediction, there is an error component associated with each name classification.  This error component must be taken into account as a function of the total variance component – meaning all of the things that looked significant may not actually be significant.

Since this is a “discussion,” I’d like to propose a method to re-run the study with better data quality (but larger standard errors).  The two main problems that will be addressed are compensation for attrition and gender classification.  To deal with Twitter attrition, let us first define it.  If we follow Nielsen’s numbers, a Twitter user who has posted at least once more than 30 days ago and at least once within the last 30 days has a decent chance of being an active user.  We may want to make this criterion more lenient – perhaps just requiring one post in the last 30 days.  Either way, we must define a criterion to decide who is an active user (and this definition must be informed by data and theory).
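The two candidate definitions can be written down as a small filter.  This is a minimal sketch: the function name `is_active`, its parameters, and the example timestamps are all hypothetical, and the 30-day window is simply the figure discussed above.

```python
from datetime import datetime, timedelta


def is_active(post_times, now, window_days=30, require_older_post=True):
    """Return True if the account meets the activity criterion.

    Strict version: at least one post inside the window AND at least one
    post older than the window.  Lenient version (require_older_post=False):
    at least one post inside the window.
    """
    cutoff = now - timedelta(days=window_days)
    has_recent = any(cutoff <= t <= now for t in post_times)
    has_older = any(t < cutoff for t in post_times)
    return has_recent and (has_older or not require_older_post)


# Hypothetical accounts, evaluated as of an arbitrary sampling date.
now = datetime(2009, 6, 1)
returning_user = [datetime(2009, 3, 1), datetime(2009, 5, 20)]
new_user = [datetime(2009, 5, 20)]

print(is_active(returning_user, now))                        # strict rule
print(is_active(new_user, now))                              # strict rule
print(is_active(new_user, now, require_older_post=False))    # lenient rule
```

The strict rule excludes accounts created within the window, which is one reason the lenient version may be preferable.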

The problem with gender is a little more difficult.  I don’t spend a lot of time in the TREC community so I’m not sure how good automated techniques are, so I’m going to propose human classification.  The most efficient way to do this is with Mechanical Turk.  Turkers could be shown a profile and asked to decide the gender of the profile owner; you’d repeat with a different rater to get an estimate of reliability.  Your guess is as good as mine about agreement – I’m generally skeptical of ethnicity ratings by third parties, but I tend to think that gender can be reasonably assessed.  Update: @yardi brings up a good point regarding brand/persona/promotional/shared accounts.  My (too simple) answer is exclusion.  If we’re truly interested in this gender question, then non-gendered accounts fit an a priori exclusion criterion.  My gut instinct is that in a population sample, we would see low incidence of these accounts, and they could be analyzed separately to see how they would affect our data.
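One common way to summarize agreement between two raters is Cohen’s kappa, which discounts the agreement expected by chance.  The sketch below is one possible reliability check, not a prescribed one, and the two rating lists are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented ratings for ten profiles ("F" = female, "M" = male).
first_pass = ["F", "F", "M", "M", "F", "M", "M", "F", "M", "F"]
second_pass = ["F", "F", "M", "M", "F", "M", "F", "F", "M", "M"]
print("kappa:", cohens_kappa(first_pass, second_pass))
```

Profiles on which the two raters disagree could be sent to a third rater or set aside for separate analysis.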

So the study would be simple – collect a first-stage sample of profiles and assess if they meet the activity criteria (this can be done automatically).  Then run a second-stage random sample on the eligibles and send them to Mechanical Turk.  You could send 3000 profiles to MTurk and have them assessed, with a goal of ending up with 2400 profiles, giving you +/-2% at p<.05.  Of course, all of the “friend lists” would also have to be gender coded, so if you have an average of 10 friends you’re looking at 24,000 extra codings (minus overlap).  If we include overlap and say we’ll have 25,000 unique profiles, and each profile has to be rated twice, at $.01 a HIT we’re looking at a total price of $500.00.  Of course, if we pull our sample back we can reduce this cost substantially.
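The arithmetic above can be checked in a few lines.  This sketch assumes the usual normal-approximation margin of error for a proportion at the worst case p = 0.5; the function names are mine, and the inputs are the figures from the paragraph above.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a 95% normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def coding_cost(unique_profiles, ratings_per_profile=2, price_per_hit=0.01):
    """Total Mechanical Turk cost in dollars for coding every profile."""
    return unique_profiles * ratings_per_profile * price_per_hit


print(f"margin of error at n=2400: +/-{margin_of_error(2400):.1%}")
print(f"cost for 25,000 profiles rated twice: ${coding_cost(25_000):.2f}")
```

Because the margin shrinks only with the square root of n, halving the sample to 1,200 profiles widens it to roughly +/-2.8% while cutting the coding bill roughly in half.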

There are a couple of questions: First, we can’t really say how much better humans will perform at gender-coding until we run a comparison to the machine-coded results.  My gut is that humans will perform at a higher level of accuracy, but there is still a variance component with the classification.  We also don’t know what kind of bias we introduce by cutting out “follower” profiles.  I don’t know how many of these unique profiles would show up in a population survey, but it is an open question.  And what about the findings, how would they change?  My gut is that a lot of these “stunning” findings would go away, and we’d see greater gender homogeny in “following” behavior.  “Follower” behavior would still be influenced by spam, so it might be useful to assign a spam attribute to profiles to be used as a covariate (you could have MTers code them, run them through spamassassin to get a naive score, or simply use standard techniques to find influential data).

The important takeaways from this discussion are that “bigger” is not always better with social data, that data should be looked at critically before running analysis (using existing information and theory), and that data that wildly contravenes existing findings should always be re-run to produce robust estimates.

Final note: The authors state that “On a typical online social network, most of the activity is focused around women – men follow content produced by women they do and do not know, and women follow content produced by women they know.”  Mike Thelwall’s (2008) large-scale analysis of MySpace friending behaviors found that while females tend to friend females, there was not a significant gender effect for males.  In Mayer and Puller’s (2008) analysis of Facebook, they found that same gender was a significant predictor of friendship (in a potentially overfitted model).  Overall, studies commonly find gender differences regarding SNS/internet use; females are generally found to use communicative tools with greater intensity (e.g. Joinson, 2008; Lenhart & Madden, 2007; Jones et al., 2009).

References:

Joinson, A. N.  (2008).  Looking at, looking up or keeping up with people?: motives and use of facebook.  In CHI ’08: Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, New York, NY, USA, 2008 (pp. 1027-1036).  ACM.

Jones, S., Johnson-Yale, C., Millermaier, S., and Perez, F. S.  (2009).  U.S. College Students’ Internet Use: Race, Gender and Digital Divides.  Journal of Computer-Mediated Communication, 14(2), 244-264.

Lenhart, A. and Madden, M.  (April 18, 2007).  Teens, Privacy and Online Social Networks: How teens manage their online identities and personal information in the age of MySpace.  Pew Internet and American Life Project.  Retrieved March 9, 2008 from http://www.pewinternet.org/PPF/r/211/report_display.asp.

Mayer, A. and Puller, S. L.  (2008).  The old boy (and girl) network: Social network formation on university campuses.  Journal of Public Economics, 92(1-2), 329-347.

Thelwall, M.  (2008).  Social networks, gender and friending: An analysis of MySpace member profiles.  Journal of the American Society for Information Science and Technology, 59(8), 1321-1330.