Posts Tagged: information


25
Aug 06

Natural A-Lists, or How Digg is Like the Blogosphere

The news website Digg is an extremely popular Web 2.0 application, rivaling the more “traditional” news outlet Slashdot. Digg, as opposed to Slashdot and most other sites, operates in an editor-less fashion; stories are submitted to be Dugg, and stories that are Dugg often enough are promoted to the front page. A story sent to the front page of Digg gets a remarkable amount of traffic, and the story submitter gets karma for having a story promoted. While Digg’s model, in its simplicity, is less than revolutionary, the open, egalitarian approach to a community news site has proven attractive to many.

From time to time, a story will get promoted to Digg’s front page lamenting the “downfall of Digg.” The general complaint is that Digg is no longer egalitarian, and that cartels of power users control what is raised to the front page of Digg. The power users, it is argued, band together to Digg each other’s stories, and they cyclically enjoy the karma and traffic provided by their success in promotion.

Without question, this occurs. Diggers do band together and form cartels, somewhat limiting access to story promotion. However, what if this behavior were purely a function of the network, and not something more sinister?

As I’ve previously explored, the problem with the blogosphere is discovery. With 65 million blogs out there, it is impossible to sift through them all to find good content. As a result, we rely on natural screens that emerge in the network – or, we rely on those we know. For example, there are probably 1000 blogs out there just like Steve Rubel’s Micropersuasion. However, the reason Rubel gets traffic and A-list status is because of our lack of initiative in discovery. We look around and see Rubel’s blog linked frequently, and listed on the sidebars of blogs we trust – this ‘link capital’ increases our likelihood to start reading that particular blog.

Indeed, we probably could go out and use a discovery process to find blogs like Rubel’s, but why spend the time? In addition, the shared conversation that can be had between two readers of Rubel’s blog is valuable – almost as valuable as if we all were getting the theoretically best content at all times (assuming we ‘discovered’ all blogs like Rubel’s).

However, our laziness and unwillingness to filter all blogs to find the best content is only half of what it takes to create an A-List. The second half of the equation is the fact that A-listers are just like us. A-List bloggers don’t spend all day going through all blogs to find the best content. As Rubel wrote in his Underground Blogosphere piece, it is evident that many bloggers do this for him – filtering up links so it appears that he actually spends all day surfing cool websites. Not the case at all! A-List bloggers operate just like we do – so their linking behavior mirrors ours.

Hence, the A-List is naturally occurring. We only have so much time to process content, and the sheer volume of content means that we, nobodies and A-listers alike, have to rely on the natural hierarchies born in the network. Indeed, the A-list exists because of our inability to cope with the size of the blogosphere – not because of any evil cartels.

Back to Digg, where we see the same thing occurring. Thousands of stories are submitted to Digg each day, more than any one person could read. As a result, Digg users rely on a coping mechanism to deal with the volume of stories submitted: the establishment of friend pairings in the network. When you friend people in Digg, they immediately act as a content filter for you. Digg is very much like the blogosphere in that you friend your friends (the people you know) and celebrities (Kevin Rose, Digg A-List). Look at the sidebar of your blog…if you’re a traditional blogger, you’ve got some links to people you know, and some A-list blogs you read. It is the same thing in Digg.

The assumption that Digg is purely egalitarian falls apart just as any assumption that the blogosphere is egalitarian does. A-lists are created because we simply don’t have time to negotiate all the content around us – so we link to those we know, and those we know to have good content (the A-List). In essence, the A-lists that occur are purely natural, and something we need in order to find commonality in the network. If the critics of Digg truly wanted to break the A-lists, they would need to convince everyone on the service to screen all of the stories. Since we are time-limited, we can’t do that – so the A-lists will always emerge.

Blogs and sites like Digg create an illusion that networks are flat. In a perfect world, where we all had the time to screen content, the networks would be much more flat. However, since that is not the case, A-lists emerge, and they play a valuable role in the network as a point of betweenness centrality. Sure, A-listers could change this a little by foisting upon themselves a responsibility to link out a little more, but fundamentally, A-listers are just like the rest of us.


18
Aug 06

Metcalfe Responds and Defends His Law

Last month, I wrote a post entitled “The Network Effect Multiplier, or, Metcalfe’s Flaw”. That post cited a preprint of an IEEE Spectrum article by Briscoe, Odlyzko and Tilly that pointed out a key problem with Metcalfe’s law – that the value of a network does not grow with the square of its size, but rather in proportion to n log(n). This paper generated substantial buzz, as a lot of the logic of Metcalfe’s law underlies how we value web applications, particularly socially-enabled web applications.

Metcalfe, in a guest post to colleague Mike Hirshland’s blog, responds to the article. It’s a very interesting read, and another wonderful example of how blogs enable conversation. Metcalfe first clarifies his purpose in creating his law.

As I wrote a decade ago, Metcalfe’s Law is a vision thing. It is applicable mostly to smaller networks approaching “critical mass.” And it is undone numerically by the difficulty in quantifying concepts like “connected” and “value.”

This is valuable, as it questions the applicability of Metcalfe to large networks. However, as Metcalfe originally used this law to describe the telecommunication network, I’m confused by his definition of smaller. Nevertheless, he’s absolutely correct about the difficulty in quantifying value – I wrestled with this exact concept when I was analyzing the law. Metcalfe goes on to state:

While they’re at it, my law’s critics should look at whether the value of a network actually starts going down after some size. Who hasn’t received way too much email or way too many hits from a Google search? There may be diseconomies of network scale that eventually drive values down with increasing size. So, if V=A*N^2, it could be that A (for “affinity,” value per connection) is also a function of N and heads down after some network size, overwhelming N^2. Somebody should look at that and take another crack at my poor old law.


Affinity, or value per connection, is exactly what I was addressing in my analysis. Metcalfe’s original model was built on the assumption that value was binary – people using a telecommunications network, or an ethernet network, can only experience two states of value – full or none. However, in a social network, value is nuanced and conditional. Of course, in A*N^2, the assumption is that A is constant through the network, which is not the case. Nevertheless, I’m heartened to see this, and I feel that it validates my previous work.

Metcalfe goes on to explain how this notion of affinity can be applied to social networks – and the long tail in general.

Social networks form around what might be called affinities. For each affinity, there is a critical mass size given by N=C/A, as above. If the number of people sharing an affinity is above this critical mass, then their social network may form, otherwise not. As Internet access gets cheaper and the tools for exploiting affinities get better, many more social networks will become viable.

Let me leave as an exercise for the reader to develop the formulas for how Amazon’s Long Tail grows to the right as the combination of Moore’s and Metcalfe’s Laws biennially halves the critical-mass size of book audiences. Book buying generally shrinks with time, but I’m guessing that Amazon’s per book critical masses, its N=C/As, have been shrinking faster.

Similar formulas could quantify how Moore’s and Metcalfe’s Laws have also driven down the critical mass sizes (N=C/A) of Internet-enabled social networks and extended their Long Tail to the right. Looking more closely, I see that Metcalfe’s Law recurses. Just being on the Internet has some increasing value that may be described by my law. But then there’s the value of being in a particular social network through the Internet. It’s V~N^2 all over again. Down a level, N is now the number of people in a particular social network, which has its own C, A, V, and critical mass N.

Of course the cost (C*N) of getting connected in a social network has been going down thanks to the proliferation of the Internet and its decreasing price. The value (A*N^2) of particular social networks has been growing with broadband and mobile Internet access. Emerging software tools expedite the viral growth and ease of communication among network members, also boosting the value of underlying connectivity.

This is quite interesting. Down the long tail, we see ever-smaller critical masses, and these critical masses have been enabled by the simplicity of connecting.
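
To make the critical-mass arithmetic in Metcalfe’s quote concrete, here is a minimal Python sketch. It takes the two expressions from the quote, cost C*N and value A*N^2, and finds the break-even size N=C/A where the two are equal. The function names and the sample numbers are my own illustrative assumptions, not anything Metcalfe published.

```python
def critical_mass(cost_per_member: float, affinity: float) -> float:
    """Network size N where value A*N^2 equals cost C*N, i.e. N = C/A."""
    if affinity <= 0:
        raise ValueError("affinity must be positive")
    return cost_per_member / affinity


def net_value(n: float, cost_per_member: float, affinity: float) -> float:
    """Value minus cost for a network of n members: A*N^2 - C*N."""
    return affinity * n ** 2 - cost_per_member * n


# Hypothetical numbers: a fixed connection cost and three levels of affinity.
COST = 100.0
for affinity in (0.5, 1.0, 2.0):
    n_star = critical_mass(COST, affinity)
    print(f"A={affinity}: critical mass N={n_star:.0f}")
```

Under these assumptions, doubling the affinity (or halving the cost of connecting) halves the critical mass, which is the mechanism Metcalfe points to when he says more small networks become viable.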

VCMike has this very interesting post, and Om Malik is also following the conversation. Good brain food for a Friday morning.


27
Jul 06

The Scale-Free, Underground Blogosphere

I’ve been tracking the comments on a post Steve Rubel made today entitled “The Underground Blogosphere”. In it, Rubel describes the daily avalanche of email “pitches” he receives from bloggers sending him links. On cue, a number of bloggers complained, assuming Rubel addressed them directly. In fact, Rubel’s post isn’t an attack or sinister in any way – he is simply publicly coping with his status in the blogosphere. I think we may be able to learn a few things from Steve’s post about identity and the nature of the blogosphere.

First, a little background. Rubel’s blog is currently ranked 59 in the Technorati index. To put this in perspective, David Sifry’s last estimate of Technorati’s index size is 37.3 million blogs. Indeed, Rubel and his blog are in very rarefied air – he sits atop Mount Everest in the blogosphere. In achieving this very respectable and noteworthy goal, Rubel has also achieved an interesting place in the network of the blogosphere.

As Barabasi and Watts have shown, large networks, such as the blogosphere, tend to display hub and spoke characteristics. That is, large amounts of traffic tend to flow through central hubs, whereas lesser traffic flows through the spokes in the network. Indeed, this is just like our nation’s air transportation network – places like Hartsfield, O’Hare, LAX and the NYC airports are the hubs; those hubs beget smaller hubs like Pittsburgh and Dallas, and so on down until we get to the regional airport near your home that doesn’t even have an instrument approach. For any number of reasons, networks cluster and distribute traffic unevenly. The patterns that emerge look like a power law, which is why Barabasi describes these networks as scale-free (see Shirky for a more robust explanation).

The reason I mention these enormously complex, fancy models is to simply prove to you something you already know – that bloggers like Rubel are the “hubs” in the network of the blogosphere. As a result, traffic naturally flows to Rubel – and to all of the other “top” bloggers in the network. Right now, as an example, my referencing of Rubel’s post is reinforcing his position in the network.

So here is my first point of contention with his claim that the “underground blogosphere” is very large. As the blogosphere is scale-free, the traffic that hubs see doesn’t scale linearly (or log linearly) through the network. If Rubel receives 100 pitches in a day, it is not a safe assumption that the 1000th Technorati blog receives 98 pitches a day, and the 10,000th receives 90 (and so on, reflecting a power law based on 37MM blogs). In fact, due to Rubel’s position in the network, the amount of pitch traffic he sees may be vastly disproportionate to the rest of the blogosphere.
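
A rough Python sketch shows how quickly pitch traffic could fall away from a hub. It assumes, purely for illustration, that daily pitches follow a simple power law in Technorati rank, calibrated so that rank 59 receives 100 pitches; the exponent `alpha` and the function name are hypothetical, since nobody has measured the real curve.

```python
def pitches_at_rank(rank: int, hub_rank: int = 59,
                    hub_pitches: float = 100.0, alpha: float = 1.0) -> float:
    """Pitches per day if pitches are proportional to rank^(-alpha).

    Calibrated so that the blog at hub_rank receives hub_pitches.
    Both the power-law form and alpha are assumptions for illustration.
    """
    if rank < 1:
        raise ValueError("rank must be at least 1")
    return hub_pitches * (rank / hub_rank) ** (-alpha)


for rank in (59, 1_000, 10_000):
    print(f"rank {rank:>6}: about {pitches_at_rank(rank):.1f} pitches/day")
```

With these assumed parameters the 1000th blog sees a small fraction of the hub’s pitches rather than nearly all of them, which is the kind of disproportion the paragraph above argues for.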

Many of the links in a scale-free network point to the hubs. Indeed, many of the links going from hubs point to other hubs (there are 90 JFK-LAX flights a day, and only 10 JFK-RDU flights a day, as an example). We see this in the blogosphere when A-list bloggers only link to each other, and so on – a rich-get-richer effect. While Rubel’s cohort likely includes bloggers from all parts of the blogosphere, his sample is disproportionately skewed towards A-listers who share his experience. This cohort also sees a large “underground blogosphere.” What’s more, since the traffic in the underground network is largely unidirectional (non-reciprocated and flowing from low-ranking to high-ranking blogs), this network isn’t reinforced (imagine if all the planes flew from RDU to JFK, and only one returned).

However, even if Rubel’s claims are off, there’s a larger issue here – how bloggers connect. In a blogosphere of 37M blogs, we’ve only got time to evaluate an absolutely minuscule part of the blogosphere. Indeed, the long tail of bloggers has its audience, but the problem is discovery. The blogosphere doubly rewards links brokered through A-list blogs; first, they have passed the editorial screening of the blogger (Rubel in this case), and second, they open up a blog to a new audience who may share common interests. Therefore, it is natural that people would attempt to persuade Rubel of their post’s worth; they aren’t really trying to gain Rubel as a fan as much as they are attempting to get 0.01% of his fanbase to discover them – a traditional long-tail approach.

If emailing a blogger is ultimately about steering a small number of fans to your site, what does this tell us about blogging, or peer-production in general? My Facebook research continually makes me think about why we do anything online. Why do we invest the time to create things like blogs, social network profiles, webpages? It is clearly so we can be heard, so that we can have the affirmation of an audience. Why do you blog? Do you blog because you want to improve your writing, be known to the technoscenti, get a better job, promote a political cause? We all blog for reasons, and those reasons are always personal. However, there’s nothing wrong with that; the folks who tried to explain away why they emailed Rubel (in the comment thread) amused me. We want audience; we want power-brokers to give us approval. And there’s absolutely nothing wrong with that, because that’s how the real world works.

I’ve given Rubel’s post a very even read. While I disagree with his size estimate, I do think he is onto something. A marketer can never stop being a sociologist – as such, marketers are keen observers of interesting phenomena. 100 emails pitching links a day? That’s an interesting phenomenon, but why do we need a conversation about it?

In the thread, a number of commenters reported how they found these “notes from the underground” to be useful. I’m certainly one of these people. I get a few emails a week from people reading my blog; some are marketers pitching products, some are other bloggers going off-the-record, and others are people who just didn’t feel 100% comfortable leaving a comment (or didn’t feel like making the effort to step through all of Blogger’s 9 steps). As I don’t get a lot of these, I’m able to look through these emails and see what is what, and respond accordingly. Through this process, I’ve managed to make some very meaningful contacts. I would hate to think that Rubel’s post would have a muzzling effect.

The fact of the matter is that Rubel sees so much traffic he isn’t able to distinguish between what is good and what is chaff. Indeed, that could be personally frustrating, but it comes with the territory. One doesn’t get into the top 100 without a tremendous amount of personal marketing; it only stands to reason that others want his place. Indeed, one day they will have it. Rubel wants a conversation about the underground blogosphere; in a sense, I’m participating right now. I’d hate to see the echo-chamber emerge and start calling for the end of such practices (which Rubel clearly hasn’t done). The blogosphere is about conversation, whether that be over blog posts, comments or emails.


12
Jul 06

The Network Effect Multiplier, or, Metcalfe’s Flaw

In valuing a social technology, it is impossible to avoid the enhancement in value the network provides. Stated quite simply, social technologies benefit from an economy that awards value to the service as more people join the service. This, of course, is the network effect; a network gains value as more people join the network.

The classic example of network effect is illustrated in Metcalfe’s law. In valuing a telecommunications network, Metcalfe speculated that the value of the network increases in proportion to the square of the number of participants in the network. As a simple illustration, consider a telephone network with three participants. When the fourth participant joins that network, the value of the network increases for all participants in the network. Odlyzko and Tilly, in 2005, further refined Metcalfe when they argued that the value of a network doesn’t actually increase quadratically, but in proportion to n log(n). To illustrate this refinement, each addition to the network adds value, but in reality, more value is added when my relative joins than when someone in Dubai joins. My patterns of connectivity cluster around my relatives, hence Odlyzko’s and Tilly’s refinement.
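
For a sense of scale, the short Python sketch below compares the two growth curves. Metcalfe’s law is commonly stated as value growing with the square of the number of participants, and the Odlyzko and Tilly refinement as n log(n); the function names and the constant `affinity` factor (set to 1) are assumptions of mine for illustration.

```python
import math


def pairwise_connections(n: int) -> int:
    """Number of distinct pairs among n participants: n*(n-1)/2."""
    return n * (n - 1) // 2


def metcalfe_value(n: int, affinity: float = 1.0) -> float:
    """Value growing with the square of network size."""
    return affinity * n ** 2


def nlogn_value(n: int, affinity: float = 1.0) -> float:
    """Value growing as n*log(n), the slower-growing refinement."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return affinity * n * math.log(n)


# The telephone example: a fourth participant doubles the possible pairs.
print(pairwise_connections(3), pairwise_connections(4))

for n in (10, 1_000, 1_000_000):
    ratio = metcalfe_value(n) / nlogn_value(n)
    print(f"n={n}: n^2 is about {ratio:.0f}x larger than n*log(n)")
```

The gap between the two estimates widens as the network grows, which is why the choice of formula matters most when valuing very large networks.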

Metcalfe’s law provides the groundwork for a substantial amount of applied network effect theory. One of the foremost applications of Metcalfe is to internet technology, particularly social network technologies. One can easily see the lineage that validates the application of Metcalfe – the internet is indeed a telecommunications network, and the value we perceive from additional entrants is visceral.

Getting to the point, Metcalfe’s law holds fundamentally, but the application of Metcalfe to a wide range of social internet technologies proves remarkably flawed. To illustrate: Metcalfe’s law is built on a core assumption that entrants to the network have a limited set of options. In the 1980s, when someone joined a telephony network, they had but two options – use the phone or not use the phone. This binary calculation assured that whenever the phone was taken advantage of, the user was getting full value. Ethernet is another telling example; when you plug your computer into an ethernet network, the only options you’ve got are to accept packets or not accept packets. Either way, when you use the network, you’re getting full value.

This notion of “full value” makes the mathematics of network value calculation quite appealing. If everyone on the network gets the same value from using the technology (everyone has the same options – i.e. call or not call on the phone), then valuing the network is absolutely possible. When using Metcalfe (or Reed, or Odlyzko and Tilly’s refinement) to value a network, the core assumption is that the value we derive from the network is binary – this works for things like ethernet and telephony, but the mathematics prove to be overly crude when applied to social network technologies.

I’ll try to illustrate a comparison. Indeed, Myspace’s network provides two options to you – you can either join or not join the network. If we wanted to apply Metcalfe to Myspace, this is where we’d stop. However, the value in Myspace is much more nuanced than simply being on the network; you can take value from the many things you can do on the network. The network offers a myriad of associations, including friending, grouping, messaging, browsing, stalking – actions that create a compound value that is unique for each network entrant. Indeed, each new entrant to Myspace offers others in the network the chance to create these relationships, but these many types of relationship create a value continuum – which is different than a value binary.

Therefore, the fundamental flaw in applying Metcalfe to social technology is its inherent lack of nuance and granularity. When people join the network, they are given more options than simply connecting; the network is worth the sum of associations and actions that are allowed in the network. We must instead think of network value in terms of a network effect multiplier, as the actual value a network adds to an application is under the direct control of the application designers.

Consider flickr. Flickr is a socially-enabled application built around photographs. Stripped of flickr’s social tools, the service would provide a core value to its users – it would be a very high quality image host and archive. This core value is the “real” economic value of the product; the claim holds up because flickr users have proven willing to pay for the service. Indeed, the core value provided by flickr is important, but the core value alone is not flickr’s total value. Enter the network, and network effects. Flickr is a socially-enabled tool, allowing users to connect around photographs. The social actions that can be taken in flickr are fairly limited; comments, page views, connections, groups and pools – these are fairly “commodity” social tools (in a sense, all of the social actions are native to the users as they have been previously pioneered). This is lightweight social networking, with very low barriers to entry; the network effect is light as well. To understand the final value of flickr, we multiply the core value by the network effect value (the network effect multiplier).

As a contrast, consider Myspace. Myspace’s core value is quite low. When you log onto Myspace you get a profile, a message box that doesn’t interoperate with the rest of the world, some limited image hosting, etc. However, the network effects of Myspace are tremendous. The size of the network and myriad uses of the network create a network effect multiplier that is much greater than flickr’s. However, since the final value of the network is a function of the core value and the network effect, we see a balancing function.
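
The flickr and Myspace comparison can be sketched in a few lines of Python. The only relationship taken from the argument above is that final value equals core value times the network effect multiplier; the way the multiplier is built here (one plus the sum of per-action weights), the class name, and every number are hypothetical placeholders chosen to show the balancing effect.

```python
from dataclasses import dataclass, field


@dataclass
class SocialService:
    name: str
    core_value: float  # stand-alone (non-social) value, arbitrary units
    action_weights: dict = field(default_factory=dict)  # assumed value per social action

    def network_effect_multiplier(self) -> float:
        # Assumption: a service with no social actions has a multiplier of 1.
        return 1.0 + sum(self.action_weights.values())

    def total_value(self) -> float:
        return self.core_value * self.network_effect_multiplier()


# High core value, lightweight social layer.
flickr = SocialService("flickr", core_value=10.0,
                       action_weights={"comments": 0.2, "groups": 0.3})

# Low core value, heavy social layer.
myspace = SocialService("Myspace", core_value=2.0,
                        action_weights={"friending": 3.0, "messaging": 2.0,
                                        "browsing": 1.5})

for service in (flickr, myspace):
    print(service.name, service.network_effect_multiplier(), service.total_value())
```

With these made-up inputs the two services end up with similar totals by very different routes, which is the balancing function in miniature.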

This balancing function is the key to valuing social technologies. The core value is the raw economic value the service provides to the user. The telephone was useless without the network; however, a service like flickr, or even Myspace would provide value stand-alone. With the telephone, you only had two options when using the network – call or not-call. In flickr and Myspace, you literally have millions of ways to use the network, each with a different value outcome. The network adds value to flickr and Myspace, but the value it adds is distinctly more nuanced than what Metcalfe proposed – and the value the network adds is in the hands of the designers.

As social networking becomes commoditized, as more and more sites make social a part of their experience, the value-add of embracing social will need to be quantified. Metcalfe’s theory is absolutely valid in context, but the applications to social technology lack the nuance that will be required to quantify cash outlays. The good news is that quantifying the value of the network isn’t overly complex. We start with the core value of the service (the non-social value proposition), and the network effect multiplier. As the network effect multiplier is contingent on the site’s design, this can certainly be quantified. Network-enabling a product does not produce a binary value-add; some sites will add lightweight social networking to enhance a core value tool (flickr), whereas others will derive almost all of their value from the network (Myspace). The key to understanding this is knowing that the value provided by the network is variable, and the outcome value of the service is contingent on the core value and the network effect multiplier.

In thinking this through, I’ve tried to focus on the value of the network. However, divorcing the actions you can take in the network from the network’s value kept leading me back to my initial train of thought. The value of a social technology’s network simply can’t be divorced from what you do in the network – the actions you can take are deeply nuanced. We’ve matured from the binary assumption of communicate/not-communicate that network effect theory is built upon. Of course, Metcalfe’s core theory still holds for things like telephony and ethernet networks. However, humans are not computers; our actions, and the derived actions of network participation have variable values. The compound effect of our actions is the network effect multiplier. As we develop socially-enabled applications, consideration of this network effect multiplier will prove useful in determining the value of our labor.

I’m going to officially call this an idea in progress. I’m really struck by the potential value of articulating this properly. At this stage, I’m just at the beginning of doing so. Thoughts, comments and feedback are certainly welcome.


5
Apr 06

It’s time to design for RSS.

Like a lot of people I know, I’m addicted to my newsreader (RSS reader). RSS integrates so perfectly into my life, in terms of time-savings and information needs, I’ve noticed it’s actually sort of given me a new worldview. In this new worldview, I view everything through a sort of information-time cost benefit analysis. Put pretty simply, there’s a ton of stuff out there that I want to know, but I consistently value that gained knowledge against time spent accruing it.

The perfect example is a sports score, because it’s a relatively meaningless piece of information (I say this as a sports fan). The Yankees play 162 games, most of which are not on TV in my market, so I often forget to check the score of the previous night’s game. To check the score requires three effortless clicks from Yahoo.com’s front page, but I can only remember to do this every few days. I want the information, but the time-cost and work required to get it, on a daily basis, just don’t work out in my case. If I could go in to Yahoo, and subscribe to Yankees scores in RSS, I’d be happy as a clam. Let’s explore this a little.

Things like sports scores (or stock closing prices, as another example) are little bits of information that are temporal, and require significant effort to accrue. Here’s where RSS steps in. RSS eliminates the temporal nature and significant effort; your newsreader picks up that load. And since things like sports scores are tiny bits of information, they fit well into the micro-chunk model of RSS. The only problem is I can’t get a sports score delivered to my RSS reader. Sure, I can filter Google news and get all items matching Yankees and score, and that will have the information, but it will also have a lot of information that I don’t want. Filtering is a good first step, but it’s not the answer we need if we want to microchunk.

As RSS proliferates, we’re going to need to start designing for RSS, rather than leaving it in the afterthought role it currently occupies. Right now RSS, for most sites, is a filtered database dump; even in that primitive format, think how powerful it is. Now imagine a site that designs for RSS, letting you do truly custom, fine-grained RSS operations – things like getting sports scores sent to your RSS reader. If a site were able to do this in a user-friendly, low barrier-to-entry format, it would stand to become the clearinghouse of information on the net. RSS is a pervasive, disruptive technology, and there’s no question in my mind that adoption will eventually tick up to the point it becomes a mass-market tool. The site that most understands the information value of RSSing small, valuable bits of data could stand to become a significant news authority in Web 2.0 and beyond. It’s time to design for RSS.
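
As a rough idea of what a micro-chunked feed might look like, here is a small Python sketch that builds an RSS 2.0 document with one item per final score, using only the standard library. The function name, the `example.com` URLs, and the sample game are all hypothetical; no real site exposes this feed.

```python
import xml.etree.ElementTree as ET


def build_score_feed(team: str, games: list, site_url: str = "https://example.com") -> str:
    """Return an RSS 2.0 document (as a string) with one item per finished game."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link and description on the channel.
    ET.SubElement(channel, "title").text = f"{team} scores"
    ET.SubElement(channel, "link").text = f"{site_url}/scores/{team.lower()}"
    ET.SubElement(channel, "description").text = f"Final scores for the {team}, one item per game"
    for game in games:
        item = ET.SubElement(channel, "item")
        # The whole payload fits in the title: the score is the story.
        ET.SubElement(item, "title").text = (
            f"{game['away']} {game['away_runs']}, {game['home']} {game['home_runs']}"
        )
        ET.SubElement(item, "guid", isPermaLink="false").text = game["id"]
        ET.SubElement(item, "pubDate").text = game["finished"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical game data for illustration only.
sample_games = [{
    "id": "example-game-001",
    "away": "Yankees", "away_runs": 4,
    "home": "Athletics", "home_runs": 3,
    "finished": "Wed, 05 Apr 2006 05:10:00 GMT",
}]
print(build_score_feed("Yankees", sample_games))
```

The design choice worth noting is that each item carries nothing but the score, so a newsreader subscriber gets exactly the small piece of information they asked for and none of the surrounding coverage.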


21
Mar 06

Identity and the Web: Information Science Must Pay Attention

I’ve come across a number of articles in prominent publications regarding the problems encountered by individuals when potential employers Google them. In a BusinessWeek article entitled You Are What You Post: Bosses are using Google to peer into places job interviews can’t take them, the author highlights a number of horror stories regarding job seekers and their “Google resume.” The New York Daily News ran a similar story, What a tangled Web we weave: Being Googled can jeopardize your job search. In both stories, we see the huge information problem being faced by individuals in their relationship with search engines.

A few months ago, Terrell and I sat down at a whiteboard and began diagramming this problem. The result of our work was claimID. An individual has a relationship with the information about them online; when they Google their name, they are the only possible arbiter of truth as to what is really about them. This is further complicated by the fact that the individual often has no control over what is displayed about them, or in what order the information is presented. At the same time, full-text search only returns matches; things that are about us, but don’t directly mention our names, fail to exist for the people searching for us.

These problems are discrete, but they interrelate variably – with compounding effects. I’ll explore them a little; primarily, they shake out to be problems of disambiguation, aboutness, authorship and presentation.

  1. Disambiguation. We share names. In technology, this is referred to as a namespace problem. When two things share a name, it is impossible for a machine to differentiate the two. The developers of the internet solved the namespace problem by fiat; they simply made a rule that nothing can share a name. As humans, we can’t really retrofit a namespace solution, so this problem is essentially unsolvable. As long as there is free text on the net referencing a name, there will always be a question of who that name references. There are “solutions” to the namespace problem; federation of identity is the logical step. Regardless, adoption and use of these tools will be outweighed by the simple fact that the namespace problem will always persist.
  2. Aboutness. Aboutness is a fancy term that, well, just sort of means what a thing is about. When we search for someone, we are essentially asking a search engine to show us things “about” the person; unfortunately, these are the types of queries that search engines handle worst. As with asking a search engine to describe how a vintage wine tastes, the true answers to questions of identity are incredibly complex. The search engine simply relies on brute force, and just returns anything that matches your name. However, think about all the things on the web that are about individuals – the schools they went to, the towns they lived in. This is just a start. Think about more subtle things – articles in a newspaper that refer to projects an individual worked on (but don’t mention the individual’s name), or a flattering blog post by a friend that just uses an individual’s first name. It is very easy to argue that all of these things, all of these information nodes on the web, make up a person’s identity. A search engine figuring this out is obviously beyond the scope of any technology we’d ever be comfortable with, so holistic answers to questions of aboutness are very difficult.
  3. Authorship. Another extremely simple concept, authorship gets complicated from two perspectives in the context of identity search. The more traditional understanding applies first: who wrote the things about an individual online? If your name shows up on a forum posting attached to a handle, how would an outside observer know who wrote this? The namespace problem forces “handles”, and while we might try to use the same handle across the net, chances are we’ve got any number of different logins. The simple fact is that names can show up anyplace, attached to any handle – and for the most part, only the person whose name it is can make that authorship disambiguation. Moving on to a more conceptual understanding, our identity is also made up of what we author. Of course, my blog is about me – even when I’m not ostensibly writing about myself. The case of a newspaper reporter is probably the most telling – when that reporter writes about someone else, they are adding more evidence to their identity as a writer. With handles and famously nonexistent bylines, authorship becomes much more complicated than it needs to be on the internet.
  4. Presentation. The simplest, and perhaps most frustrating, concept. When someone searches for an individual, it is Google that gets to decide how the results are presented. A great article mentioning that person could end up on page 1 or page 10 of the results – it is really a crapshoot. Imagine if we approached our resumes this way. When an employer looks at a job candidate, it seems to be a fair assumption that the employer googles them – and it is scary to think how that is weighted against the actual paper resume. When I was going through my Google interviews, I answered many a question regarding stuff found out via, well, Google.

In designing claimID, we took the stance that only the individual can speak for what is about them online; therefore, the process of identity sharing must be taken on by the individual. In a sense, I really think that claimID is a nice solution; our tool is extremely simple to use, built for the “rest of us” – the average individual who wishes to speak for their identity online. We explicitly didn’t build a tool to wow A-list bloggers; the focus in claimID has always been service, with as little navel-gazing as possible. The audience for a tool like claimID is so huge, it simply must be built to do one thing, and to do it very well.

However, as I read the articles about this “growing problem” of employers Googling potential employees, the prevailing approach seems to be throwing one’s hands up in the air, and it strikes me that we may be facing one of the biggest problems of the net’s future. Working on a college campus, I see tens of thousands of students putting their identity online every day – be it in MySpace, forum postings, email listservs or other services. This is our future – a good deal of us will live publicly, and we’re simply going to need tools to cope with what is about us online – especially the things we can’t control. I imagine how stressful it is for a person who has something embarrassing about them online to take on a job search – search engine results could literally affect the course of their lives, and their mental and emotional state.

If there’s anything that surprised me about the claimID beta, it is the scope of the international coverage we’re getting in the blogs. We’ve let a fair number of users in (think very low thousands), and Technorati shows 170 posts about claimID, many in foreign languages. This proves, beyond a doubt, that identity problems are faced by the whole internet. People are searching for a solution, and while claimID goes a long way, this problem is bigger than our solution. It is imperative that information science takes note, and that the greater community starts working to assist people. We have the choice of either throwing our hands up, or doing something about it. Indeed, there are problems we may never be able to solve (namespace, etc.), but creative thinking and discussion will lead to solutions that can benefit us all.

It feels a little weird to be essentially inviting competition to join our space, but the simple fact is that this is a real problem that only gets more important over time. It is also inspiring because the problem is such a classic information science problem. Those of us who’ve joined IS departments did so because we want to develop solutions for the issues that emerge in the ongoing relationship between humans and computers. That we could join together and start solving this problem is deeply inspiring, but even more so, absolutely essential.