February, 2008


14
Feb 08

Your house, now searchable

A few days ago, Google expanded its street view program, adding a bunch of new metro areas. I was pleased and a little freaked out to find both my hometown and current residence included in the maps. This has given me some new perspective, which I’ll share today.

First and foremost, the streetview maps are really interesting. The technology and integration are very cool, and the maps are useful. I’ve used them to pre-navigate around cities, and it’s fun to get a street level view of cool parts of Manhattan or SF. Please don’t accuse me of not appreciating the maps.

As the mapping program scales out nationwide (as it inevitably will), I wonder how people will negotiate the loss of personal privacy implicit in being streetmapped. It’s certainly one thing to have your address online, and it’s another to have multiple, zoomable views of your house pop up when you Google yourself.

Of course, the streetview data is public. There’s no law preventing anyone from taking a picture from a public street and putting it on a map. But as we’ve seen again and again, privacy is both quantitative and qualitative; Google isn’t breaking any laws by posting this data online, but one can certainly argue they are pushing the boundaries of our senses of privacy.

Employing Altman’s theorization of privacy as sets of boundaries, or danah’s notions of publics, we see there is a privacy negotiation in “living public.” I live on a public street; I expect people to drive down my street and see my house; some of these people will know it is my house, some won’t. This process of disclosure informs my privacy expectations, and if I’m not OK with it I move out to the country and live on a mile-long private road.

Implicit in the disclosure process is also a “finding” process. Up until last week, if you wanted to find someone, you had to locate their address in the white pages and then drive down their street. Certainly a high bar for the non-stalker types. Now, the finding process has been shortened by one step: all you need is an address.

This change in the finding process forces us to remap our privacy expectations. One’s domicile is no more than a click away; entire cities at a time are forced to live publicly because of Google’s decision. As this program fans out to lower-density areas, I wonder if there will be any significant pushback.

Having one’s house streetmapped also affects one’s relationship with Google. When you search your name in Google and find less-than-desirable results, it’s likely you’ve shrugged it off because that is a small tradeoff for the aid Google affords you. Google has significant agency in your online identity, but it’s not a big deal because most of us don’t care about our online identity all that much yet.

With streetview, Google has gained significant agency in your offline identity. Your house is now searchable by anyone; others may peer into your windows, zoom in and out, and explore your house from multiple perspectives. Is this simply another tradeoff we’ll make so we can gawk at the houses of others? And to put it more bluntly, has Google gone mad with power?

In one fell swoop, Google has taken millions of people and made them searchable. Sure, most people won’t notice, and many don’t have the technical skills to try and fight this invasion of privacy. I wonder how it will affect these people’s perceptions of Google. Is Google still the friendly search engine now that it has your house on file? Does it matter? Google’s in the ad business, not the perception business.

Even Facebook, for all its creepiness, doesn’t encroach on this real-life boundary. This is a new form of disclosure, and I hope it will start a discussion on how much information about a person a corporation can disclose. There are so many other databases out there Google could buy and make public (credit reports, arrest records, magazine records, etc.); if this deeply visceral disclosure doesn’t give us pause, what will?


8
Feb 08

The subjective computer has found us

For the past few days, I’ve been thinking about the information products and byproducts of social computing. Products may be thought of as things we create with intent: our Facebook profile, our home page. Byproducts, by contrast, are the things we create with limited intent: our attention data, the traces we leave in server logs, the software products that appropriate our agency.

From a volume standpoint, the data byproducts we produce significantly outweigh our pure data products. Maybe we’ve got 15 profiles on social networks, but Google’s got gigs of our email, search logs, and click streams. Following Irwin Altman’s notion of privacy as boundaries, it’s easy to see how we delineate between these two data sets, even though they’re identical at the binary level: one we see, and one we don’t.

At SGFoo, I participated in a number of discussions around data byproducts and the social graph. Leveraging your explicit connections (a data product) and attention or network data (byproducts), service providers could expose all sorts of novel information to you. I tend to agree; the jumble of connections and intentions and algorithms can likely tell me all sorts of new and interesting things.

In a post danah boyd wrote a few days ago, she cautioned against where such objectively computational approaches lead us, arguing that the negative effects of such systems may outweigh the perceived gain. I tend to agree; the leaders of the social computing space possess an alarming antipathy towards privacy, especially when weighed against the benefits of derived, latent knowledge. Of course, this is the ideology of Google or competitors; in the graph, we’re all just documents with linkages, our behaviors subject to MapReduce. The privacy advocate stands in the way of progress, the natural state of industry.

Drawing back to the initial distinction I posed, the product and byproduct, I wonder if there isn’t a self-regulation implicit in the system. Perhaps norms or other cultural processes will make taboo the “reveal” implicit in surfacing computed data byproducts. It’s creepy when a computer tries to figure you out, it’s creepier when a computer tries to figure you and your friends out, and perhaps the creepiness of all of this makes leveraging such knowledge in social processes taboo. We may be able to compute it, but we may not actually want the information because the objective boundary is crossed.

In 1996 Sherry Turkle proposed that we were looking for the subjective computer, one that became a place of identity reflection and expansion. At the time, it was alarming to think of a computer to which we bared our souls. Of course, 1996 was a different time for computers: we weren’t hyperconnected, massive data stores like Google were nascent, the notion of sharing one’s real identity online was anything but pervasive. These conditions established a sense of mastery over what one was sharing; the computer could become your second self because, well, you didn’t have to worry about a creepy Facebook app sharing your deep political opinions with your friends without your knowledge.

Do we still seek the subjective computer? I’d argue that, in 2008, the subjective computer seeks us. Since Turkle wrote Life on the Screen, we’ve placed much emphasis on using objective measures to uncover subjective knowledge. Rather than the computer being the device you pour your heart out to, it has become an intelligent proxy. At the same time, there no longer exists the monolith computer; the computer is simply the networked device, routing you to the best places for disclosure and community.

In 2008, we find ourselves in a unique situation where the things we say and the things we don’t say become central parts of our computer disclosure. It’s no longer simply about our blog post, it’s about who we’ve looked at or talked to. Our machines have frameworks for computing both the intentful and ephemeral things we disclose, our data products and byproducts.

Where does this leave us? When we reached out to the subjective computer, it was a powerful tool that one could master and appropriate for specific purposes. Social interaction, identity play – these were affordances of the device. Now computers master us, leveraging our data to fit us into modeled interactions, exercising tremendous power through selective disclosure, and offering us freedom through a participation process that is essentially repressive.

As I alluded to earlier, it is unlikely that we’ll ever become comfortable with the spaces of complete disclosure. There’s always going to be a difference between our shared and mined data, and there will always be social rules standing in the way of leveraging data a person or system has collected about another. This is not to say that the boundaries won’t be tested, or that they aren’t already stretched to frightening levels. Beacon didn’t work because we were uncomfortable with the removal of boundaries, and I’d argue that we’re going to continue to feel this way in similar situations.

It is now time to push back against the devices and networks that seek to master us. It is time to return to places where we exert control, where our data isn’t an asset, and where our mastery over the device sets us free. Horribly naive? Perhaps, but I also might be right. The arms race of analytics may fail simply because we’re not comfortable with the “reveal”. The true loss here, however, is the sense of freedom we once had when the subjective computer was our agent. As we now live in fear of the computer, we’ve lost the ability to seek freedom in it; I think one day we’ll want that back.


7
Feb 08

Facebook API Data Sharing

Via Slashdot, news that the Facebook Platform is falling under increased scrutiny for questionable privacy practices. The issue at hand is developer access to profile information as shared via the API. I’ll see if I can provide a high-level overview.

When you add a Facebook application, you allow the application developers access to your profile. Your profile information is queryable via the Facebook platform API. This means that the data in your profile is passed to application developers via structured methods. An example of such a method is users.getInfo. If you’ve added an application, the developer can make a users.getInfo call with your Facebook ID. In response to that call Facebook sends the developer the information from your profile – your name, networks, favorite books and movies, etc. Other calls such as photos.get and friends.get make your photos or friends lists queryable by application developers.
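To make that flow concrete, here is a minimal sketch of the request and response pattern. It is not the real Facebook API: `MockPlatform`, `add_application`, and `users_get_info` are hypothetical names for an in-memory stand-in, and the profile data is made up. The only thing it illustrates is the rule described above, that an application can query a profile once the user has added it.

```python
from dataclasses import dataclass, field


@dataclass
class MockPlatform:
    """Hypothetical in-memory stand-in for a social platform (not Facebook's API)."""
    profiles: dict = field(default_factory=dict)  # user_id -> profile dict
    installs: dict = field(default_factory=dict)  # app_id -> set of user_ids

    def add_application(self, user_id, app_id):
        # Adding an app is the moment the user grants it profile access.
        self.installs.setdefault(app_id, set()).add(user_id)

    def users_get_info(self, app_id, user_id, fields):
        # Only apps the user has added may query the profile.
        if user_id not in self.installs.get(app_id, set()):
            raise PermissionError("user has not added this application")
        profile = self.profiles[user_id]
        return {name: profile[name] for name in fields if name in profile}


platform = MockPlatform()
platform.profiles[42] = {
    "name": "Alice Example",
    "networks": ["Example University"],
    "books": ["Life on the Screen"],
}
platform.add_application(42, "quiz_app")

# The developer asks for specific fields and gets them back as structured data.
info = platform.users_get_info("quiz_app", 42, ["name", "books"])
print(info)
```

Note that nothing in this sketch limits what the developer does with `info` after the call returns; that gap is the subject of the next two paragraphs.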

Just so we’re clear, Facebook sends your information only to third parties that you’ve approved (you read the terms of service, right?). It is as if the third party were able to view and save your profile, photos or friends lists. To prevent problems, Facebook regulates third-party behavior through its developer terms of service. The terms of service state that only certain types of your profile data are storable; if the developer possesses (i.e. downloads) data that is not explicitly storable, they agree to delete this information within 24 hours. That is, the company must, under the terms of service agreement, expunge the data that is not storable within a day of collecting it.

Notably, the storable data is very limited. You may store a user ID, or a photo ID, but you may not store a name, favorite book or picture. The only mechanism that regulates this is the terms of service agreement; if a company decides to store the data longer than 24 hours, there’s no technical or DRM-type mechanism that will enforce data destruction. The privacy equation relies only on good faith between Facebook and the third party.
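As a rough illustration of what honoring that agreement might look like on the developer's side, here is a minimal sketch of a purge routine. The field names, the `STORABLE_FIELDS` set, and the `purge_expired` helper are all assumptions made for the example; the actual terms of service define the real list of storable data. The point is that this logic runs only if the developer chooses to write and run it.

```python
from datetime import datetime, timedelta

# Assumption for illustration: only IDs may be kept indefinitely.
STORABLE_FIELDS = {"user_id", "photo_id"}
MAX_AGE = timedelta(hours=24)


def purge_expired(record, fetched_at, now):
    """Return the part of `record` a compliant developer may still hold at `now`."""
    if now - fetched_at < MAX_AGE:
        # Inside the 24-hour window, the full record may still be cached.
        return dict(record)
    # After the window, everything except the storable fields must be gone.
    return {key: value for key, value in record.items() if key in STORABLE_FIELDS}


fetched = datetime(2008, 2, 7, 9, 0)
record = {"user_id": 42, "name": "Alice Example", "books": ["Life on the Screen"]}

fresh = purge_expired(record, fetched, fetched + timedelta(hours=1))
stale = purge_expired(record, fetched, fetched + timedelta(hours=25))
print(fresh)
print(stale)
```

A developer who never calls something like `purge_expired`, or who keeps a backup made inside the window, ends up holding the data indefinitely, and nothing on the platform side would notice.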

Facebook has relied on this storage agreement since the beginning of the API. We’re hearing of it today because of a recent study that found that Facebook applications don’t need as much information as they’re being given. There are clearly larger questions, especially when one considers the scale of Facebook applications. The largest applications have over 2 million daily users. They almost certainly have install bases in the tens of millions. This means that theoretically, tens of millions of profiles could have been downloaded and stored, in violation of the terms of service.

What are the incentives for storing profile information? As a researcher, I can think of hundreds of reasons. Using a small set of 100,000 profiles from across the US (a small application), one could build a valuable marketing database. Even if personally identifiable data were removed, I’d still be able to get great value from the set using probabilistic techniques.

The reality? Likely, most of the applications you’ve added haven’t stored your profile data in violation of the terms of service. Certainly, an app storing your data couldn’t do anything above-board with it (Facebook would quickly and successfully sue). But in reality? With backup tapes, less-than-ethical application developers, or even those who just fail to read the terms of service – yes, it’s likely that some data is stored somewhere. Just as your profile is probably in a browser cache somewhere, it’s likely an app or two has stored your info. Will it be used against you? Will you become part of a black-market database? Who knows.

Now that people are taking a look at the privacy assumptions of the Facebook platform, perhaps it’s time to start a dialogue around how to solve the problems of SNS APIs. OAuth is one heckuva step forward. However, with the power application developers exert in the Facebook ecosystem, I won’t hold my breath that the all-you-can-eat data stream is going to be turned off any time soon.


7
Feb 08

Major steps forward for OpenID

There’s big news from the OpenID foundation today: Google, IBM, Microsoft, VeriSign, and Yahoo! have joined the foundation’s board. This is obviously a major step forward for OpenID, but it’s also good for the entire open identity movement; the major players are seeing the value in consumer choice and control. At ClaimID, we’ve been advancing these themes since 2005, so it’s especially rewarding to see this news. From the OpenID foundation announcement:

By bringing on these companies and their resources, the OpenID Foundation will now be able to better serve the needs of the entire OpenID community. In 2008, we can expect to see a larger focus on making OpenID even more accessible to a mainstream audience, the development of a World-wide trademark usage policy (much like the Jabber Foundation and Mozilla have done), and a larger international focus on working with the OpenID communities in Asia and Europe. Awesome!

Congratulations go out to OpenID foundation chairman Scott Kveton, and all others involved in the foundation who’ve worked on this initiative. Scott’s blogged the coverage of the announcement if you’d like some more insight. Again, congrats to the OpenID foundation for this huge achievement – today is a very big day for OpenID and open identity work.

Cross-posted to the ClaimID blog.


6
Feb 08

The Future of Social Software

Last weekend, I spent a few days at O’Reilly HQ for the Social Graph Foo Camp. This was a very interesting experience; I was challenged as both a researcher and practitioner. What I saw made me very hopeful – people agreeing on methods and protocols, solving real problems. Realistically speaking, a camp like SGFoo (or IIW) pushes this work ahead 6 months in the span of just a few days. It’s hard to overstate the power of connections, conversations, late nights and lots of coffee and Red Bull.

As it happens, before I went to SGFoo I’d been reading a bunch of stuff on qualitative research methods. Methods books, cases, studies… my brain was very keyed-in to a type of observation that is almost annoyingly analytical. It was hard to shake this perspective as I participated in discussions this weekend. It’s certainly informed some of the thought I’ll share today.

Watching the discussions last weekend was a little like watching the future of social software unfold in realtime. Granted, market leaders will continue to be the vanguard of the movement, but the pathways and patterns these companies will use were the crux of the discussion at SGFoo. There were a number of advocates for the human perspective and user studies, but the real emphasis was on fast development, prototyping, and seeing what works in the wild. This particular approach has been the hallmark of Web 2.0 development strategies, and I doubt we’re going back any time soon.

Yesterday, danah boyd wrote an interesting piece entitled “just because we can, doesn’t mean we should.” In it, boyd challenges the assumptions of privacy and audience that go into the design of social software, arguing that the desire to live publicly is a notion of privilege, available to a select few. It’s hard to disagree. The ideologies that inform Beacon or the initial News Feed are hardly mass-market, and there are countless other exemplars out there.

As a relative outsider to the Valley scene, I found myself being challenged by the assumptions of these new technologies. For a simple example, consider a portable friends list. The idea is that when you sign on to a new service, you can upload or authorize your friends list and find all of your friends who use that service. Theoretically, this vast, barren new space becomes a rich, social space with the click of a button.
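Mechanically, the "click of a button" amounts to intersecting the list you bring with the accounts the service already has. Here is a minimal sketch under stated assumptions: contacts are matched by email address, and `find_friends` and the sample data are hypothetical, not any particular service's implementation.

```python
def find_friends(imported_contacts, service_users):
    """Return usernames of existing accounts whose email appears in the imported list.

    imported_contacts: iterable of email strings brought from another service.
    service_users: dict mapping email -> username on the new service.
    """
    # Normalize so that case and stray whitespace don't hide a match.
    wanted = {contact.strip().lower() for contact in imported_contacts}
    return sorted(
        username
        for email, username in service_users.items()
        if email.strip().lower() in wanted
    )


service_users = {
    "alice@example.com": "alice",
    "bob@example.com": "bobby",
    "carol@example.com": "carol",
}
imported = [" Alice@Example.com", "dave@example.com", "carol@example.com"]

matches = find_friends(imported, service_users)
print(matches)
```

The match is all-or-nothing: every contact comes across regardless of tie strength or context, and that is the assumption worth examining.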

Stepping back for a second, let’s consider the assumptions of this technology. As we’ve seen with Facebook, our networks grow to be very large, a collection of “friends” of varying tie strengths and varying contexts (work, school, family, etc). Furthermore, the process of joining a new social community is one of boundary negotiation and sense-making. That is, you’ve got to learn to crawl before you walk; norms and acceptable behaviors are negotiated over time. When someone signs on to Twitter, friends everyone, and then dumps all their RSS feeds into Twitter, you cringe. They haven’t figured out the norms. Now imagine that, every time you sign on to a new service, you’re forced to learn the norms in realtime, in front of an audience of hundreds of your friends.

The problem is that these assumptions actually aren’t problems in Silicon Valley. If your day job is to design social software, it’s likely you’ve internalized the rules of community; you’re a native. Even if you didn’t know Twitter, you’d figure that dumping your RSS streams into Twitter would be bad form, unless you saw everyone else doing it. The social software power user can easily move between sites; she is also incentivized to discover and master new communities.

With regard to friend networks in the Valley, there’s incredible density in work-friend networks, and likely even family networks. In the Valley, you want to be friends with coworkers, competitors, famous-types; your network is a proxy of your stature. Finding everyone you know on a site is a means to a primarily economic, secondarily social end. Of course, this is hardly a Valley-only phenomenon, but the difference is these assumptions are being written into software for all of us.

This post shouldn’t be taken as an attack on technology or the work anyone is doing; it is good work and it will go forward. Rather, this post should challenge the implementer to look critically upon the assumptions that go into the technology being implemented. Rather than making your average user add a friend list on day one (to increase your userbase), make the addition of users a game in which the user selects the context-appropriate friends and learns the norms of the system. Think about Facebook before and after they introduced privacy to NewsFeeds; such a simple change in assumptions can vastly affect perceptions and experience.

The work showcased at SGFoo represents the future of mediated social interaction, even if only in the rules, pragmas and assumptions. One thing is clear: This stuff ain’t going away, and it ain’t just for Valley-types anymore. I would argue that research, testing and social thought complement Web 2.0 development models, and perhaps they offer us a way forward as this stuff goes mainstream. These are exciting times.