Update: See Backtweets.com.
My experience watching the percolation of Freedom throughout the web was instructive – a chunk of viral traffic is moving from blogs to Twitter. If you’re not monitoring your blog/company/brand on Twitter, you probably should be.
There are two major Twitter search services, Tweetscan and Summize. I’ve adopted Tweetscan – it is blazing fast and seems to have a larger corpus (i.e. more data) than Summize. Both offer RSS, so you can easily set up searches and stick them in your newsreader.
There is a major drawback to these services when it comes to searching for links. Because URL shortening is very common on Twitter, and there are hundreds of URL shortening services, it is often impossible to search exhaustively for links to your domain. Unless you search for every shortened version of your page (e.g. your link shortened by TinyUrl, Snurl, MooUrl, and so on), you’re not going to find all of the conversation.
This problem is solvable. For a few minutes I thought about building a bookmark that would compute the shortened URLs and search for all of them in Tweetscan/Summize. However, this approach is horribly inefficient, and I didn’t want to subject my el cheapo hosting service to the load if it went viral. Instead, the Twitter search services need to post-process the URLs they find and build an index of the canonical URLs. This would allow me to search for a URL and find all of the shortened URLs that eventually point to my domain, regardless of which shortening service produced them.
The upside of a service building such an index is that I’d be able to find all links into my blog in one search, rather than searching for each permalink individually. If Tweetscan had a post-processed index of all links pointing to permalinks inside Unit Structures, I’d be able to find all of them by searching on my domain.
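To make the idea concrete, here is a rough sketch of what such a post-processed index could look like. It is not how Tweetscan or Summize work; the function name `build_link_index`, the `(tweet_id, text)` input shape, and the pluggable `resolve` callable are all assumptions made up for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Naive URL matcher; it may also grab trailing punctuation in real tweets.
URL_RE = re.compile(r"https?://\S+")


def build_link_index(tweets, resolve):
    """Map each destination host to the tweets that link to it.

    tweets  -- iterable of (tweet_id, text) pairs (hypothetical input shape)
    resolve -- callable that turns a possibly shortened URL into its final URL
    """
    index = defaultdict(list)
    for tweet_id, text in tweets:
        for url in URL_RE.findall(text):
            canonical = resolve(url)
            host = urlsplit(canonical).netloc.lower()
            index[host].append((tweet_id, canonical))
    return index


if __name__ == "__main__":
    # Stand-in resolver: a lookup table instead of real network calls.
    known = {"http://tinyurl.com/abc": "http://example.com/a"}
    tweets = [
        (1, "nice post http://tinyurl.com/abc"),
        (2, "see http://example.com/b"),
    ]
    index = build_link_index(tweets, lambda u: known.get(u, u))
    print(index["example.com"])
```

Because the index is keyed on the destination host, a single lookup on a domain returns every tweet that links into it, whichever shortener was used.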
In the meantime, has anyone run into viable stopgap solutions for this problem?

I’ve thought about this, too. Someone needs to dive into all those TINYURLs, URLTEAs and such and pull out the list of where they’re linking to.
Seems like it would be easy to do, too… I’d be curious where all the twit links go.
Not sure this matches your needs, but this Yahoo Pipe sounds like it has at least the bricks you need:
“Given a Tiny URL, find the original link URL, and then look up on Google web search and Google blogsearch what sites link to the URL.”
http://pipes.yahoo.com/pipes/pipe.info?_id=caa47cb2b123ca36cb4f21256a4eb033
What I assume you can do is use a hack of this one in an embedded loop (Y! demands you have two ‘pipes’ whenever you want to loop anything).
@bertil -
I think that’s sort of what I was thinking. The problem with it is exhaustiveness. If you want to do a post-hoc search, you need to hit 150+ URL services and run 150+ searches on Tweetscan. So much easier for them to build a fulltext index. TinyUrl, which is the most common in the corpus, has an API… it could be batched. The rest probably send 300-series redirects, so you could build the remainder of the corpus with HEAD requests.
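A minimal sketch of the HEAD approach, assuming the shortener answers a HEAD request with a redirect status and a Location header (not every service does). The names `head` and `resolve`, the hop limit, and the timeout are illustrative choices, not anything a particular service documents.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head(url, timeout=5):
    """Send one HEAD request; return (status, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def resolve(url, head=head, max_hops=5):
    """Follow redirects one hop at a time and return the last URL reached."""
    for _ in range(max_hops):
        status, location = head(url)
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return url
```

The `head` parameter is injectable so the redirect-following logic can be exercised without touching the network, and `max_hops` keeps a redirect loop from running forever.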
Seen TwitLinks (http://www.twitlinks.com/)? Seems like it’s doing something like what you’re talking about…
[...] Something I asked for a long time ago. Don’t know why Twitter search still doesn’t do this, perhaps now they will. Great execution, smart defaults, instantly indispensable for anyone monitoring Twitter. Excellent. [...]