A few days ago, I stumbled upon a post from the blog Business Insider that asked “Why Is Twitter More Popular With Black People Than White People?” Drawing on data from Edison Research, the writer proposed a number of explanations for why “black people represent 25% of Twitter users, roughly twice their share of the population in general.” This factoid has now been reported by the New York Times, the San Francisco Chronicle, The Atlantic, as well as a number of prominent blogs. It’s also going viral in the Twittersphere.
I’m loath to trust bloggers to get survey data right, so I requested a copy of the report from Edison Research (available here). At first glance, the data looks good: the research was conducted by Arbitron and employs a landline/mobile random digit dialing (RDD) frame, with about 1,750 people age 12 and older interviewed. “National probability” studies of this sort are generally considered valid for population estimates.
Without getting into too much detail, a study’s validity is dependent on the sampling method and sample size (among many other things). In terms of method, RDD is not a true equal-probability-of-selection method, but both industry and academia consider it “good enough” when the sample is weighted to known totals. As for size, a sample of 1750 people allows us to make claims about a large population with a margin of error of about plus or minus 3 percent.
Let’s cut to the chase: Where did the Edison Research interpretation go wrong? In the report, Tom Webster states:
The percentage of Twitter users who are African-American currently stands at roughly 25%, which is approximately double the percentage of African-Americans in the current U.S. population. Indeed, many of the “trending topics” on Twitter on a typical day are reflective of African-American culture, memes and topics.
From this, we are to believe that of all Twitter users, 25% are African-American. This is surprising not only given current population estimates, but also because Twitter is a global service. Let’s explore how Edison got to this 25 percent number (conveniently rounded up from 24 percent).
In the phone interview, Edison asked all respondents 12+ (n=1750) if they “currently ever use[d] Twitter.” 7% of respondents said yes, approximately 123 people. Of those 123, Edison then asked how often they used Twitter. 85% of those respondents (105 people) indicated they used Twitter at least once a month, and were thus recoded as “Monthly Twitter Users.” Herein lies the problem: It was from these 105 individuals (not the 1750 total respondents) that Edison based its estimates of Twitter use.
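To make the shrinking sample concrete, here is a minimal sketch that retraces the arithmetic above. The percentages come from the report as quoted in this post; the last step, which treats the 24 percent figure as an unweighted share of the 105 monthly users, is my assumption and not something the report states.

```python
import math

TOTAL_RESPONDENTS = 1750   # respondents age 12+ (from the post)
PCT_EVER_USE = 7           # percent who "currently ever use" Twitter
PCT_MONTHLY = 85           # percent of those who use it at least monthly
PCT_AFRICAN_AMERICAN = 24  # reported share among monthly users, before rounding to 25

# 7% of 1750 is 122.5; the post reports this as "approximately 123".
ever_use = math.ceil(TOTAL_RESPONDENTS * PCT_EVER_USE / 100)

# 85% of 123 is 104.55, i.e. about 105 monthly users.
monthly_users = round(ever_use * PCT_MONTHLY / 100)

# Assumption: treat the 24% as an unweighted share of the monthly users.
african_american_monthly = round(monthly_users * PCT_AFRICAN_AMERICAN / 100)

print(ever_use, monthly_users, african_american_monthly)
print(f"Monthly users are {monthly_users / TOTAL_RESPONDENTS:.1%} of the full sample")
```

Under that assumption, the headline number rests on roughly two dozen respondents.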
Let’s return to sampling error. Because random samples are asymptotically efficient, a sample of 1750 can speak to a population of hundreds of millions almost as well as a sample of 2000, 3000, or even 5000. But a sample of 105 people speaking to the very large userbase (self-reported at 100 million) of Twitter? Not so efficient. The margins of error are approximately +/- 10% at an alpha of .05, and +/- 12.5% at an alpha of .01. And these margins assume true equal probability of selection and no nonresponse bias. With weighting for proportionality, it is almost certain these margins increase substantially (1).
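For anyone who wants to check these margins, here is a rough sketch using the standard normal-approximation formula for a proportion, with the conservative worst case of p = 0.5. It assumes simple random sampling and no weighting, which is more generous than the real survey design; the function name `margin_of_error` is my own.

```python
from statistics import NormalDist


def margin_of_error(n, alpha=0.05, p=0.5):
    """Half-width of a normal-approximation interval for a proportion.

    Assumes simple random sampling; p=0.5 gives the widest (most
    conservative) margin for a given sample size.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z * (p * (1 - p) / n) ** 0.5


# Compare the Twitter subsample with the full sample and a larger one.
for n in (105, 1750, 5000):
    for alpha in (0.05, 0.01):
        moe = margin_of_error(n, alpha)
        print(f"n={n:>4}  alpha={alpha}: +/- {moe:.1%}")
```

Note that the size of the population never enters the formula. For populations in the millions, the margin is driven by the number of people interviewed, which is why 105 respondents are the binding constraint here.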
Let’s explore what this means practically. First, Edison Research can’t speak to all Twitter users, because all Twitter users weren’t potentially included in the sample. Edison can, however, speak to USA Twitter use, from its sample of 105 monthly users. If we assume that only 5 million Twitter users in the USA use the service every month, Edison is still using 105 people to speak about these 5 million people (the margins of error don’t change). Unfortunately, this is highly unreliable.
The American Community Survey finds that approximately 13.1% of the US population self-identifies as Black or African American. At an alpha of .05, the range of potentially true estimates of African-American Twitter use in the US is actually anywhere from 14% to 34%. At an alpha of .01, the range runs from about 11% to almost 38%, which includes the 13.1% population figure, so we fail to reject the null hypothesis that the difference is attributable to sampling error or random effects. If we then include weights in our estimates of error (likely the case because Edison’s sample over-represents people under 24), the growth in error causes us to fail to reject the null hypothesis at the .05 level as well. We just can’t trust that the demographics of Twitter actually do vary from current population estimates.
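The comparison above can be sketched as a simple interval check. This uses the same conservative p = 0.5 margin as the figures in this post, centered on the 24 percent estimate. The design effect of 1.5 in the last case is a purely hypothetical value I picked to illustrate what weighting does to the interval; Edison does not report one.

```python
from statistics import NormalDist

ESTIMATE = 0.24          # reported share of monthly Twitter users
POPULATION_SHARE = 0.131 # American Community Survey figure cited in the post
N_MONTHLY_USERS = 105


def conservative_interval(estimate, n, alpha, design_effect=1.0):
    """Interval using the worst-case (p=0.5) margin, as the post does.

    design_effect > 1 shrinks the effective sample size to mimic the
    variance inflation caused by weighting.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    effective_n = n / design_effect
    half_width = z * (0.25 / effective_n) ** 0.5
    return estimate - half_width, estimate + half_width


cases = [
    (0.05, 1.0),  # unweighted, alpha = .05
    (0.01, 1.0),  # unweighted, alpha = .01
    (0.05, 1.5),  # hypothetical design effect from weighting
]
for alpha, deff in cases:
    low, high = conservative_interval(ESTIMATE, N_MONTHLY_USERS, alpha, deff)
    covered = low <= POPULATION_SHARE <= high
    print(f"alpha={alpha}, deff={deff}: {low:.1%} to {high:.1%}, "
          f"contains 13.1%: {covered}")
```

With no weighting, the population share falls just outside the interval at .05 and inside it at .01. Once even a modest design effect is applied, it falls inside at .05 as well.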
Is Twitter “disproportionately” African American, White, Hispanic, or Green? The simple fact is that from this data, we can’t say so with confidence. If Edison had been a little more forthcoming with their sample sizes, it might be more likely that the blogger/journalist who reported these data would have sensed something wrong. But I wouldn’t bank on it, because it seems like Edison Research was pushing this spin from the get-go.
A final note: as I was researching/considering this piece, it was interesting to see the “spin” being placed on this “fact” around the blogosphere. Of course, you had your standard racist comments/tweets of the “there goes the neighborhood” variety, but there also appeared to be a large swath of users who were heralding this as a point of pride. Before you examine my subconscious racist motives for examining this question, please just know I like getting surveys right. And if Edison wanted to get this right, they could start by giving us a topline cross-tab of ethnicity, Twitter use, and the respective margins of error.
Ugh, footnotes on a blog!
1. Research consistently demonstrates a negatively correlated relationship between age and nonresponse; young users are more likely to under-respond, increasing their odds of being weighted in a population (and increasing their margins of error). Research is mixed on the relationship between ethnicity and nonresponse.








Indeed, a bogus survey. As Twitter does not mask the identity of users, why not sample Twitter users?
You definitely like to get your surveys right. That’s a really nice job on getting the facts straight there. I looked at the graph before anything else, and I’m always very skeptical of statistics to begin with. Whites and African Americans being more than three quarters of the percentage of Twitter users just doesn’t seem to add up. I think the main thing you can usually assume in racial statistics for America is that whites will be the dominant percentage, and then it will be either African American or Hispanic as the next largest percentage. And of course, like you are proving, you have to check how the study behind these statistics was conducted and who the individuals were.
Fred you are my academahero.
It does appear from other sources (not just from this particular small study) that Black people are highly represented on Twitter: http://www.quantcast.com/twitter.com#demographics But same with Facebook:
http://www.quantcast.com/facebook.com#demographics And same with Myspace: http://www.quantcast.com/myspace.com
The thing is, how much of that is simply because minorities have had higher birth rates and are thus “overrepresented” among the younger population.
great post, thanks!
I have to applaud you for investigating that, but you are a bit harsh: the result demands to be investigated further (if anything, compare phone results with web-based sources of data to sort out representation, adoption, and engagement, if only because relations between races are being redefined in the USA), but these results are more than anecdotal evidence: teaching a culture of statistics also includes frowning on the binary “before the magic threshold bad-bad-bad journo-spin; beyond, perfect scientifically valid result” attitude.
The Children ratio assumption is interesting, but shouldn’t we see far more Hispanics too? (European here, so I have no idea of the actual ratios vs. official, IT access and child ratios.) I’m not sure whether Latino stars have used twitter, or whether Hispanics use cell-phone, public libraries. At least, that’s a natural experiment that doesn’t demand more funds to investigate.
One important claim that I haven’t heard about (maybe I’m filtering too much): hurrah! Non-geek celebrities, mobile-compatible UX and public libraries do help democratic empowerment through IT.
@Bertil – We’re not arguing over a p-value of .053 here, we’re talking about a survey of 100 people claiming to represent 100 million. Nothing you can say will make me trust these estimates.
And to that extent, let me be clear – I have no qualms with the general trend of the survey. The point of this exercise is to demonstrate problems with the interpretation of poll results, and highlight some of the mechanics behind why the results are generally unconvincing. There are lots of priors that point to differential access/use, and I don’t contest them.
[...] same topic. Fred Stutzman had some interesting things to say about this in his blog post ‘On Twitter and Ethnicity‘. [...]
Good points – ran across your blog after reading this article ‘For many blacks, Twitter enables a vibrant online life’ http://www.newsobserver.com/2010/05/15/483244/for-many-blacks-twitter-permits.html
Maybe a better sample size would’ve given different numbers?
[...] That survey has been widely cited as an explanation for the popularity of blacktags, but as Fred Stutzman later pointed out, Edison's survey had a high margin of error, and thus didn't really tell us much about how many [...]
i appreciate fred’s post and analysis of the study, but one has to ask what motivated the study to begin with and why it matters. it has the appearance of using “empirical data” to bolster divisive attitudes so pervasive in the united states