Overtagging is not a virtue

Recently I used Flickr to search for some beautiful photos. I tried different tags and did indeed find amazing pictures. What I also noticed was that the photographs Flickr returned didn’t always correspond to the tag I used to find them (try the tag “search”, for instance).

This got me wondering. If you tag an article, photo, blog, et cetera, some tags hit the bull’s-eye and others land in the outer ring. The underlying principle: the more tags you add, the better the chance that the tagged item will be found.

This is true, but in information retrieval there’s always a trade-off between precision and recall. Precision is the share of returned items that are actually relevant; recall is the share of all relevant items that get returned. What you want is a high score on both (get exactly what you want, and a lot of it), but that’s difficult to achieve. As a matter of fact, the more outer-ring tags there are, the more noise you get. If every user adds an abundance of tags, the noise only gets bigger. Tom Gruber used two pictures in a presentation that explain this quite beautifully.

“Noisy” Tagging

“Clear” Tagging

Folksonomies thrive on an abundance of tags, but can there be such a thing as “overtagging”? Is there a zero-sum game in tagging, where higher recall comes at the price of lower precision?
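To make the trade-off a bit more tangible, here is a small Python sketch. Everything in it is made up for illustration: the five photo ids, their tags, and the assumption that photos a, b and c are the ones truly about “sunset”. It only shows how precision and recall shift when the same collection is tagged sparingly versus generously; it says nothing about how Flickr actually searches.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two sets of item ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def search(tagged_items, tag):
    """Return the ids of all items that carry the given tag."""
    return {item for item, tags in tagged_items.items() if tag in tags}


# Hypothetical ground truth: the photos that are really about "sunset".
relevant = {"a", "b", "c"}

# Sparse, "spot on" tagging: only bull's-eye tags.
spot_on = {
    "a": {"sunset"},
    "b": {"sunset", "beach"},
    "c": {"beach"},           # relevant, but the owner left out "sunset"
    "d": {"dog"},
    "e": {"city"},
}

# Generous tagging: outer-ring tags are added as well.
overtagged = {
    "a": {"sunset", "sky", "nature"},
    "b": {"sunset", "beach", "sky"},
    "c": {"beach", "sunset", "holiday"},
    "d": {"dog", "sunset", "park"},      # "sunset" is an outer-ring tag here
    "e": {"city", "sunset", "travel"},   # and here
}

print(precision_recall(search(spot_on, "sunset"), relevant))
print(precision_recall(search(overtagged, "sunset"), relevant))
```

With this toy data the sparse tagging returns only relevant photos but misses one of them, while the generous tagging finds all three relevant photos at the cost of two noisy results.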

Conceptual search engines like Collexis give you the opportunity to score each tag for relevance, putting the user in the driver’s seat when it comes to the weighting factors. I’m not familiar with the algorithm used by Flickr, but whether or not it weighs tags for relevance, I do think that overtagging is not a virtue. If each user tags their items “as spot on as possible”, the whole tagosphere would benefit.
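As a rough idea of what relevance-scored tags could look like, consider the sketch below. It is not the Collexis algorithm, nor Flickr’s; the function `weighted_search`, the 0-to-1 weight scale and the sample data are all assumptions made for this example. Each tag carries a weight chosen by the tagger, results are ranked by that weight, and an optional threshold filters out the outer ring.

```python
def weighted_search(tagged_items, tag, min_weight=0.0):
    """Return (item, weight) pairs for `tag`, highest weight first.

    `tagged_items` maps an item id to a dict of {tag: weight}, where the
    weight (assumed to lie between 0 and 1) says how central the tag is.
    """
    matches = [
        (item, weights[tag])
        for item, weights in tagged_items.items()
        if tag in weights and weights[tag] >= min_weight
    ]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical photos with user-assigned tag weights.
items = {
    "a": {"sunset": 1.0, "sky": 0.4},
    "d": {"dog": 1.0, "sunset": 0.2},    # outer-ring use of "sunset"
    "e": {"city": 0.9, "sunset": 0.3},   # outer-ring use of "sunset"
    "b": {"sunset": 0.8, "beach": 0.7},
}

# Everything tagged "sunset", bull's-eye hits first.
print(weighted_search(items, "sunset"))

# Only the photos where "sunset" is a central tag.
print(weighted_search(items, "sunset", min_weight=0.5))
```

In a scheme like this, outer-ring tags can stay in the system for recall, while the weights let the searcher decide how much noise to accept.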

Does this mean that Flickr should build a taxonomy of tags? No, it doesn’t (that’s old-paradigm thinking); it’s just that too much of something is never a good thing. What it does mean is that there should be a governance structure for the tagosphere that lets it grow as emergently as possible, but not out of bounds.
