In January, I published a post that contained the findings of a study I did on Facebook. That post, entitled “Student Life on the Facebook,” was widely read – it may be the reason you follow my blog today. Since then, I’ve posted other findings from my studies in blog form. I’ve presented these findings at conferences and in journals, and I’m currently writing them up into a comprehensive journal article.
Shortly after Student Life on the Facebook was posted, someone added it to Wikipedia’s entry about Facebook. A few times a day, someone clicked through from Wikipedia to my blog, and found my research on Facebook.
On August 16, the registered Wikipedia editor L1AM, leaving the comments “tweaking”, “cleanup” and “thining (sic) out links”, edited the Facebook Wikipedia entry, removing the link to my research. On that day, L1AM removed over 25 links to a variety of sources, from personal weblogs to mainstream media sources. L1AM also removed a number of paragraphs from the entry, contributing to a one-day edit in which 15% of the article was removed [1].
As Wikipedia was a very small fraction of my traffic, I didn’t notice that Wikipedia wasn’t linking to me until a few days ago. When I checked the entry and saw the edits, I was frankly surprised. While a number of blog entries remained, including a humorous piece on Facebook Etiquette by CollegeHumor.com, my work was deleted. No longer would people researching the Facebook via Wikipedia stumble onto my research (and publications).
While L1AM did not cite a particular reason for the deletion, I wanted to explore the potential rationale behind his decision. My particular case may be somewhat unique, but the notion of impartial academics posting real research data to blogs is hardly novel. Consider how many times a day a blog post passes through your newsreader with empirical statistical data – this shows that blogs are becoming a method of research dissemination.
Wikipedia presents a clear set of guidelines as to what is considered a reliable source. As you might imagine, this covers a wide variety of source areas; since we’re concentrating on blogs, here is the blog (self-published source) policy:
A self-published source is a published source that has not been subject to any form of independent fact-checking, or where no one stands between the writer and the act of publication. It includes personal websites, and books published by vanity presses. Anyone can create a website or pay to have a book published, and then claim to be an expert in a certain field. For that reason, self-published books, personal websites, and blogs are largely not acceptable as sources.
Exceptions to this may occur when well-known professional researchers self-publish within their fields of expertise or when well-known professional journalists publish their own material. In some cases, these may be acceptable as sources, so long as their work has been previously printed in credible third-party publications and they are writing under their own names and not under pseudonyms.
However, editors should exercise caution for two reasons: first, if the information on the professional researcher’s blog (or self-published equivalent) is really worth reporting, someone else will have done so; secondly, the information has been self-published, which means it has not been subject to any independent form of fact-checking.
In general it is preferable to wait until other sources have had time to review or comment on self-published sources.
Reports by anonymous individuals, or those without a track record of publication to judge their reliability, do not warrant citation at all, until such time as it is clear that the report has gained cachet, in which case it can be noted as a POV.
I’ll parse this a little. Blogs are generally not considered an acceptable source for Wikipedia entries, unless:
- The blog is written by a well-known, professional researcher writing within his or her field of expertise, as long as their work (though not the work in question) has been previously published by credible, third-party publications.
- The blog is written under the researcher’s own name, and not a pseudonym
- Other sources have reviewed or commented on the blog topic
Wikipedia’s policy is broad and general, as it should be – but the generality presents difficult confounds, especially for early-stage academics. The initial test of well-known – what does this mean in academia? And who is the judge of this? Furthermore, what is the notion of peer-review in the blogosphere? This is a critical question. Every day, hundreds of people receive my posts in their feedreaders, and it is only once in a while someone calls me out as a fool. Since I am not being called out, have I passed peer review? Or is peer-review operationalized only when a blogger with a higher Technorati rank links to me?
I feel it is critical to at least ponder these issues because the fact stands that young academics will be blogging research in the future. They will do it to share early-stage findings, to find new colleagues, to expand their networks – at the same time, sharing valuable, usable data. Will Wikipedia turn a dark eye to this growing corpus of valuable research?
Of course, my article wasn’t deleted because it was blog research. The editor L1AM left other blogs, which suggests his purge was oriented towards stuff he didn’t like, get, or feel was worthwhile. Looking at L1AM’s edits, it is remarkable how much of Facebook’s history was edited out that day. If you’ll allow that my research was valuable to the Wikipedia entry, we can clearly see a flaw in this editing process.
The flaws I see are twofold. First and foremost, it is poor editorship. L1AM decided to hack up the Facebook article one day, and he didn’t leave much in terms of justification – he just did it. According to Wikipedia policies and philosophies, a change of the magnitude he made should have been accompanied by a good deal of discussion, commenting, and documentation. None of this occurred.
The second flaw is a little more nuanced, and it deals with how the community regulates the editing process. Wikipedia articles grow over time. I believe that for a majority of articles, the majority of content is added up front, with new additions and deletions spread out over time, creating the logarithmic long-tail we’re used to seeing. One can think of this process as a homing in, or “getting right” – edits should become smaller and smaller as time progresses.
When L1AM removed 15% of the Facebook article, he made a massive-scale edit. The edits clustered around L1AM’s edits were much, much smaller – single word or line changes. If we are to assume that the 15% of the article that L1AM removed had been vetted by the community, how would the community respond to this significant loss? As it happens, the community simply went on, and L1AM’s changes were not challenged. This leads to my question – is there a quantitative metric we can use to detect failures in Wikipedia’s process? If the editorial process allows 15% deletions on strong, established articles, does this mean the editorial process has failed? Shouldn’t edits be getting smaller over time?
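One rough way to make the "edits should be getting smaller" question concrete is to look at the relative size change between consecutive revisions and flag the ones that shrink the article by more than some cutoff. The sketch below is only an illustration: the function names, the 10% threshold, and the line counts in the usage note are all assumptions of mine, not anything Wikipedia defines or any data from the actual Facebook article.

```python
def relative_edit_sizes(line_counts):
    """Relative size change between each pair of consecutive revisions.

    line_counts is a chronological list of article sizes (here, in lines).
    A value of -0.15 means that revision removed 15% of the previous one.
    """
    changes = []
    for previous, current in zip(line_counts, line_counts[1:]):
        if previous == 0:
            # Avoid dividing by zero for an empty starting revision.
            changes.append(0.0)
        else:
            changes.append((current - previous) / previous)
    return changes


def flag_large_deletions(line_counts, threshold=0.10):
    """Indexes (into line_counts) of revisions that shrank the article
    by more than `threshold`. The threshold is a hypothetical cutoff."""
    changes = relative_edit_sizes(line_counts)
    return [i + 1 for i, change in enumerate(changes) if change < -threshold]
```

With made-up counts such as `[200, 201, 200, 170, 171]`, `flag_large_deletions` picks out the fourth revision, the one that dropped from 200 to 170 lines, while the surrounding one-line edits pass unnoticed. Whether a flagged revision is a failure of process or a legitimate cleanup is still a judgment call that a number alone cannot make.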
Of course, this is interesting to think about in the wake of Citizendium. While I’m spending good wage-earning years of my life being poor to get a degree that says I’m an expert, I’m definitely one who believes in the validity of crowd work. Terrell Russell has a good middle ground on the expertise in Wikipedia issue – though nothing short of the Citizendium solution would deal with the breakdown in the editing process observed on the Facebook article.
Of course, there are a couple things to note. The Facebook article might be an outlier on Wikipedia due to its popularity. The fluctuations and lack of editorial control may simply be from the fact that popular articles break Wikipedia’s model – as opposed to the model being inherently flawed. The communities around less popular articles may demonstrate more of the classic, long-tail characteristics. I’d love to see some data on this.
Getting back to the original topic at hand, I’d also love to see how many blog citations there are on Wikipedia. I can’t think of any good way to extract a rough ballpark. I would assume that blog appearance on Wikipedia clusters around emergent phenomena – the Facebook being a prime example.
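A very crude starting point might be to scan an article's wikitext for external links and count the ones that look like blogs. The sketch below is a heuristic under loud assumptions: the list of hosted-blog domains is a short, incomplete sample I picked for illustration, the "blog in the host or path" rule will produce both false positives and misses, and the names `BLOG_HOST_SUFFIXES`, `looks_like_blog`, and `count_blog_links` are hypothetical. It would give a ballpark at best.

```python
import re
from urllib.parse import urlparse

# Hypothetical, incomplete sample of hosted-blog domains (illustrative only).
BLOG_HOST_SUFFIXES = ("blogspot.com", "typepad.com", "livejournal.com", "wordpress.com")

# Matches http(s) URLs, stopping at whitespace and common wikitext delimiters.
URL_PATTERN = re.compile(r'https?://[^\s\[\]|<>"]+')


def looks_like_blog(url):
    """Heuristic guess at whether a URL points to a blog."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Rule 1: the host is on, or is a subdomain of, a known blog platform.
    if any(host == s or host.endswith("." + s) for s in BLOG_HOST_SUFFIXES):
        return True
    # Rule 2 (assumption): self-hosted blogs often say "blog" in host or path.
    return "blog" in host or "/blog" in parsed.path.lower()


def count_blog_links(wikitext):
    """Count distinct external links in the text that look like blogs."""
    urls = set(URL_PATTERN.findall(wikitext))
    return sum(1 for url in urls if looks_like_blog(url))
```

Run over the text of many articles, a count like this could at least show whether blog links really do cluster around emergent topics, though hand-checking a sample would be needed before trusting it.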
As we go forward, many more of us will be sharing valuable, good research on a variety of topics via our blogs. Sure, that research will eventually go to print and die a lonely death inside a non-accessible digital library – but should Wikipedia keep its head in the sand to the research until that point? While traditional scholars may argue one side, the realities of information needs and flows, especially about emergent phenomena, may create an imperative for posting early research to blogs. Will Wikipedia change its policies to adapt accordingly?
[1] Based on pre- and post-edit line counts of content in Wikipedia Facebook article. I’m not sure this is a great metric, so I am open to better suggestions.
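For what it's worth, here is a sketch of the line-count metric next to one possible alternative. The first function is the simple pre/post count described in the footnote; the second uses a line-level diff (Python's standard `difflib`) to count how many of the original lines were actually deleted or replaced, which matters when an edit removes old material and adds new material at the same time. Both function names are my own, the inputs are assumed to be the plain text of two revisions, and neither is a standard Wikipedia measure.

```python
import difflib


def content_lines(text):
    """Non-blank lines of a revision, stripped of surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def net_line_change(before, after):
    """Fraction of content lines lost, from pre- and post-edit line counts.

    Negative values mean the article grew.
    """
    n_before = len(content_lines(before))
    n_after = len(content_lines(after))
    if n_before == 0:
        return 0.0
    return (n_before - n_after) / n_before


def deleted_line_fraction(before, after):
    """Fraction of the original lines deleted or replaced, per a line diff."""
    old = content_lines(before)
    new = content_lines(after)
    if not old:
        return 0.0
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed = sum(
        i2 - i1
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag in ("delete", "replace")
    )
    return removed / len(old)
```

For a pure deletion the two numbers agree. They diverge when an editor swaps content: replacing two of four lines and adding a fifth shows up as growth in the net count, while the diff-based figure still reports that half of the original lines are gone.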