Posts Tagged: search


11
Sep 08

Search, Lifestreaming and Outsourcing

Google’s Marissa Mayer has written a post about the future of search.  There are a lot of non-trivial problems in some of the scenarios she describes, but I see others – such as location-assisted search – as very useful next steps.  The real point here is that our metaphors of search will change; right now, we use text to sum up our anomalous knowledge state, but in the future our location, our relative position in a social network, or even everyday analytics like the outside temperature may guide and inform our searches.  The real next steps in search are the integration and vectoring of search using such data.  To experience this, do a location-based search in Google Maps on the iPhone – this is a very early snippet of the future.

The WaPo writes about Lifestreaming (or more appropriately, Datastreaming).  This article focuses on everyday data collection and the tools we use to collect and share such data.  I see Datastreaming as the vanguard of ubiquitous computing.  That is, ubicomp isn’t Bell’s SenseCam, but rather the collection of streams we choose to share (as well as those recorded about us).  Server logs, surveillance cameras, datastreams, lifestreams – these are the “streams” we should be building ubicomp applications to use and support (rather than the traditional paradigm of us integrating ubicomp into our lives).  Chris Messina, featured in the article, delivers another fantastic blog entry, providing a little more background on the article.

Finally, Andy Baio recounts turning to Mechanical Turk to analyze Girl Talk’s new album.  Turking research is an emergent trend – Brynn Evans recently ran a study, and Ed Chi’s group had a CHI paper on MT methods.  I’m sure there are plenty more examples.


6
Jun 08

Searching Twitter Better

Update: See Backtweets.com.

My experience watching the percolation of Freedom throughout the web was instructive – a chunk of viral traffic is moving from blogs to Twitter. If you’re not monitoring your blog/company/brand in Twitter, you probably should.

There are two major Twitter search services, Tweetscan and Summize. I’ve adopted Tweetscan – it is blazing fast and seems to have a larger corpus (i.e. more data) than Summize. Both offer RSS, so you can easily set up searches and stick them in your newsreader.

There is a major drawback to these services when it comes to searching for links. As URL shortening is very common in Twitter, and there are hundreds of URL shortening services, it is often impossible to search exhaustively for links to your domain. Unless you search for all shortened versions of your page (that is, your link as shortened by TinyUrl, Snurl, MooUrl, and so on), you’re not going to find all of the conversation.

This problem is solvable. For a few minutes I thought about building a bookmark that would compute shortened URLs and search all of them in Tweetscan/Summize. However, this approach is horribly inefficient, and I didn’t want to subject my el cheapo hosting service to the load if it went viral. Instead, the Twitter search services need to post-process the URLs they find and build an index of the canonical URLs. This would allow me to search a URL and find all of the URLs that eventually point to my domain, regardless of the link-shortened context.

The upside of a service building such an index is that I’d be able to find all links into my blog in one search, rather than individually searching each permalink. If Tweetscan had a post-processed index of all links pointing to permalinks inside of Unit Structures, I’d be able to find all of these links by searching on my domain.
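
The post-processing idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Tweetscan or Summize work: the `resolve` callable is hypothetical (a real one would follow HTTP redirects), and the short links, the `myblog.example` domain, and the tweet IDs are all made up.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_canonical_index(tweets, resolve):
    """Map each canonical domain to the tweets that link to it.

    tweets  -- iterable of (tweet_id, [urls found in the tweet])
    resolve -- hypothetical callable that returns the final URL a link
               points to (or the input unchanged if nothing redirects)
    """
    index = defaultdict(list)
    cache = {}  # resolve each distinct URL only once
    for tweet_id, urls in tweets:
        for url in urls:
            if url not in cache:
                cache[url] = resolve(url)
            canonical = cache[url]
            domain = urlparse(canonical).netloc.lower()
            index[domain].append((tweet_id, canonical))
    return index


# Stand-in for real redirect resolution; every URL here is invented.
FAKE_REDIRECTS = {
    "http://tinyurl.example/abc": "http://myblog.example/2008/06/post-a/",
    "http://snurl.example/xyz": "http://myblog.example/2008/05/post-b/",
}


def fake_resolve(url):
    return FAKE_REDIRECTS.get(url, url)


tweets = [
    (1, ["http://tinyurl.example/abc"]),
    (2, ["http://snurl.example/xyz"]),
    (3, ["http://example.org/unrelated"]),
]
index = build_canonical_index(tweets, fake_resolve)

# One lookup by domain finds both tweets, whichever shortener was used.
hits = index["myblog.example"]
```

The point of the design is that resolution happens once, at indexing time, inside the search service, rather than once per shortener at query time on the searcher's side.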

In the meantime, has anyone run into viable stopgap solutions for this problem?


20
Apr 06

The Coming Academic Search War

In the midst of writing two literature reviews (and procrastinating by blogging), I’ve been putting Windows Live Academic and Google Scholar head-to-head. I’ve tempered my exuberance a little, as I seem to show a clear preference for the speed and simple UI of Google; plus, Jeff’s criticisms are resounding. However, it is clear that Microsoft has caught Google’s attention, with the Google Scholar team today adding a temporal relevance ranking:

It’s not just a plain sort by date, but rather we try to rank recent papers the way researchers do, by looking at the prominence of the author’s and journal’s previous papers, how many citations it already has, when it was written, and so on.

Sort of nebulous, and I imagine open-access publishers will lose out. I find myself inspired by this growing commercial academic search arms race, so I’m going to pick it up for coverage. I think the differences in the Microsoft and Google products are very telling, for a few reasons.

  1. The core assumptions in the user interface are dramatically different.
  2. The difference in the time each company has invested in its product is noticeable.
  3. Each company seems to have vastly different expectations of the market.

Microsoft clearly acknowledges it is playing catch-up in the search market. In creating an academic search product that has a rich interface, and closely integrating it with its core search offerings, you begin to see Microsoft’s strategy emerging: The young people that comprise the academic search market may be stripped away from Google because the assumptions Google works on may not be universal. This is to say: Microsoft is investing in academic search to undercut Google, and I think it’s a very cunning move.

It’s hard not to like the Google experience, regardless of hang-ups about Google’s policies and practices. Their search is fast, clean, and oftentimes spot-on. If it ain’t broke, as the saying goes, don’t fix it. However, the nature of the web is changing, and the web’s younger users represent a new bloc of searchers that may vote on different principles. While you and I praise simplicity and efficiency, the MySpace generation may actually want different things, as hard as that is to believe. Microsoft has seen an opportunity to engage the young people with a different type of academic search, and this may ultimately lead to conversions in search preference.

Of course, I’m just reading tea leaves here. The more I study the online behavior of the 17-22 year old bloc, though, the more I’m inspired to climb to the top of a mountain and shout: “Wake up people, these are our future collaborators and customers, and they’re operating on a whole different set of assumptions than we are.” The born-digital “myth” isn’t a myth, and when I see a company make a strategic move towards the born digitals in such an important market, it’s really pretty exciting. We’ll crow about how Live Academic isn’t like Google, but I’d give Live Academic more credit if they just didn’t listen to us. Listen to the youth, because they’re the future.

Update 1: That would be an incredibly awkward and dorky thing to scream from the top of a mountain. If I were on top of a mountain I would probably say something much more terse, or maybe just sit quietly and eat a peanut butter sandwich.

Update 2: Born digital in this context refers to those who have “lived online” their entire life. I’m studying archiving now so I should really be more cautious with a term like born digital.


12
Apr 06

The Evolution of Academic Search

Microsoft has announced an academic search product, academic.live.com, and I’m impressed. Blending elements of Google Scholar and CiteSeer, the Live Academic search seems to address the things students want, but currently lack in Google Scholar. These include:

  • Downloadable citations (EndNote and BibTeX – they get major points for BibTeX)
  • Ability to screen the search to only open access journals.
  • Ability to sort results by Relevance, Date, Author, Journal and Conference.
  • Structured abstracts.

I believe a lot of students will find these features very useful. As reported on Techcrunch, Live Academic only supports the sciences, but they plan to add subject areas as development continues.

The simple fact is academic search is being decentralized from the institution. Tools like Google Scholar, CiteSeer, Live Academic and the indispensable CiteULike are giving students new options for approaching academic content. We know the woeful state of publisher-maintained databases, so is it really any surprise that the market is reacting? By creating such a strong offering, Microsoft has realized the market space available in the academic search field. In offering students a full-featured, high quality product, I feel there’s a significant audience that can be taken from Google Scholar (which truly has the feel of a 20% project, as opposed to an area in which Google is significantly investing).

When professors and librarians complain that students only use the web for research, they are missing the point. Students want academic content, and a great number of students want the best academic content. But searching across 15 library databases that look and feel like they were designed in 1995 just doesn’t fit the model of search our students are comfortable with. Microsoft has seen an area of opportunity, and is giving tools to an underserved population. Google and the various LIS vendors are now playing catch-up to Microsoft.

Postscript: What decentralized (i.e. not tied to your library) services do you use for academic purposes? I talked about a few I use (Scholar, CiteULike, CiteSeer), but I wonder if there are any others out there that I’m missing. I’m not really talking about open archives and federated repositories, but things more along the lines of consumer tools that students could really embrace.


1
Apr 06

Faceted Search Interfaces

Facebook recently changed their search function from a faceted interface to a more Google-like free-text interface. After evaluating it, I have come to the conclusion that it may fail to serve the Facebook user base. This has led me to ponder rightness-of-fit in search interfaces, primarily making me think about how important facets are in people-search.

First, what are facets? In the context of the Facebook, facets are the things about us. For example, if you’re single, a graduate student and studying anthropology, those are three facets of your identity. A faceted search interface lets you select from available facets, essentially creating a narrowed result set of the full corpus. The key in faceted search is the searcher has the ability to know all of the facets available for their search. Understanding the corpus empowers the searcher. The searcher has confidence that the result set is exhaustive, which is important in the context of identity search.
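
To make the idea concrete, here is a minimal sketch of a faceted search in Python. It is not Facebook's implementation; the profiles, facet names (`status`, `year`, `major`) and values are invented for illustration. It shows the two things a faceted interface provides: a listing of every value a facet can take, and exact narrowing on the values selected.

```python
from collections import Counter

# Hypothetical profiles; facet names and values are illustrative only.
PROFILES = [
    {"name": "A", "status": "single", "year": "graduate", "major": "anthropology"},
    {"name": "B", "status": "single", "year": "senior", "major": "history"},
    {"name": "C", "status": "in a relationship", "year": "graduate", "major": "anthropology"},
]


def facet_values(profiles, facet):
    """Count every value a facet takes, so the searcher can see all options."""
    return Counter(p[facet] for p in profiles)


def facet_search(profiles, **selected):
    """Keep only profiles whose facets exactly equal every selected value."""
    return [p for p in profiles
            if all(p.get(facet) == value for facet, value in selected.items())]


# The interface can show the full list of majors before any search is run.
options = facet_values(PROFILES, "major")

# Narrow to single graduate students studying anthropology.
matches = facet_search(PROFILES, status="single", year="graduate",
                       major="anthropology")
```

Because every match is an exact comparison against a known value, the searcher can trust that the narrowed set contains everyone who fits and no one who doesn't.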

How people search for each other is not exhaustively documented. In a master’s thesis entitled A Framework for the Development of a Social Linking Theory, Tom Ciszek explores, but doesn’t address, the particulars of identity search. I feel, however, that we can probably match up identity search pretty well to our existing understanding of information retrieval. In identity search, we are either finders or browsers. The finder is looking for an explicit good – a person they’ve met or wish to research. The browser, on the other hand, is looking to explore a subset of all people, whether that be singles looking for other singles, conservatives looking for other conservatives, and so forth.

The thing about facets is that when we search with them, we know we’re getting back all possible matches. In the context of people search, this is very important. In browsing, we’re willing to exhaustively examine a result set, which is traditionally associated with recall; in people search, however, we want both recall, so that we’re sure we’re not missing anyone, and precision, so that everyone in the result set actually belongs there. I imagine that replacing the native faceted interface with a Google-like one will frustrate many people interested in the browsing side of people search. In the pursuit of simplicity, the elegant power and precision of faceted search are lost.

The problems are twofold. First, the taxonomy that the search keys on is no longer in the hands of the users. For example, searching for senior, a term traditionally identified with fourth-year undergraduates, returns only 149 results in the UNC Facebook. How can this be? Well, as it turns out, the Facebook only knows date information, so you actually need to search the year of graduation to find seniors. Not a big deal, right? Well, take that problem and magnify it over the hundreds of different facets we can have in the Facebook (there are 7 or 8 political affiliations, hundreds of majors and minors, etc.), and you realize there’s no way a browser could ever recall the entire taxonomy. In making things simpler, they’ve actually made them significantly more difficult. The second problem is that matches are now fuzzy, so even if you master the taxonomy, precision is a thing of the past. Searching for 2009 liberal returns a result set that matches 2009 and liberal anywhere in the profile, including students who are “very liberal” or who “hate liberal people”. The confidence that comes from being able to narrow down a result set by facet is lost.
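
The second problem is easy to reproduce in a toy example. The sketch below is an assumption about how an "anywhere in the profile" match might behave, not a description of the Facebook's actual search; the profiles and field names (`grad_year`, `politics`, `about`) are invented. It runs the query 2009 liberal as free text and as a pair of facets, then measures how much of the free-text result set is actually wanted.

```python
# Hypothetical profiles: two structured fields plus a free-text blurb.
PROFILES = [
    {"name": "A", "grad_year": 2009, "politics": "liberal", "about": "I like hiking"},
    {"name": "B", "grad_year": 2009, "politics": "very liberal", "about": ""},
    {"name": "C", "grad_year": 2009, "politics": "conservative", "about": "I hate liberal people"},
    {"name": "D", "grad_year": 2008, "politics": "liberal", "about": "class of 2009 fan"},
]


def free_text_search(profiles, query):
    """Match when every query term appears anywhere in the profile text."""
    terms = query.lower().split()
    results = []
    for p in profiles:
        text = " ".join(str(v) for k, v in p.items() if k != "name").lower()
        if all(term in text for term in terms):
            results.append(p["name"])
    return results


def facet_search(profiles, **selected):
    """Match only when each selected facet exactly equals the given value."""
    return [p["name"] for p in profiles
            if all(p.get(facet) == value for facet, value in selected.items())]


fuzzy = free_text_search(PROFILES, "2009 liberal")
exact = facet_search(PROFILES, grad_year=2009, politics="liberal")

# Share of the free-text results that the faceted search would also return.
precision = len(set(fuzzy) & set(exact)) / len(fuzzy)
```

In this made-up corpus the free-text query returns all four profiles while the faceted query returns only the one that is actually a liberal member of the class of 2009, so three of every four free-text results are false positives.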

I admit that how we search for each other is not a known entity. It’s been a long while since I’ve logged on to a dating site, but search in those interfaces always keys on facets (smoking/non-smoking, religion, education, etc.). When we’re trying to find people, we want to be able to join, narrow and explore exhaustively. False positives are frustrating, and the notion that we might miss someone even more so. The Facebook is not ostensibly about dating, but a large part of the behavior of the students is discovery, a process not unlike the precursor to dating. As I’ve said before, as long as the Facebook gives students interesting, satisfying ways to discover each other, they are on the right track. I can’t help but think that choosing simplicity over information needs is a step in the wrong direction.