The Problem: Too Many Results

July 29th, 2008

Cuil just launched their search engine, which boasts the largest index on the internet with 120 billion pages (121,617,892,992 to be precise, as posted on their home page). While the exact numbers are not always made available, Google, Yahoo! and Microsoft also all have tens of billions of pages in their indexes. Having as comprehensive an index as possible is a fabulous thing, and a very important prerequisite for search, since you can’t find anything if it’s not in there. It does not, however, solve the problem of putting 10 relevant results on page one.

In their paper entitled “Beyond the Commons: Investigating the Value of Personalizing Web Search,” Teevan et al. make the observation that:

“Web queries are very short, and it is unlikely that a two- or three-word query can unambiguously describe a user’s informational goal.”

Ambiguous intent combined with an exploding quantity of content on the internet makes it increasingly difficult to put all of the relevant results on page one while simultaneously eliminating those that are not pertinent.

Very few people venture past the first page of search results to find what they want, so returning hundreds of thousands or even millions of results is of little value to the user. (You could not look past the first 1,000 even if you wanted to.) Even for a particularly motivated user, digging through page after page of results is nothing short of tedious, which is why users quickly either reformulate their query or abandon the search.

The problem is too many results!

The solution to the conundrum is to have a greater understanding of the user’s intent in order to more precisely focus the results. One way to achieve this is to get the user to explicitly specify intent by entering more keywords, although getting people to change behavior is not easy. Another way is to implicitly infer intent through the type of long-term personalization offered by Google, although this too has a number of shortcomings.

The most effective way to resolve this issue is to implicitly infer intent from real-time behavior signals and then immediately re-rank the results, through the use of instantaneous relevancy calculations, so that the most pertinent results are moved to the top while the less relevant are suppressed. Surf Canyon’s Discovery for Search is such a solution. Disambiguating intent “on the fly” not only lets users continue searching the way they already do, but also requires no search histories or profiles. Furthermore, the signals are strong, so the results can be reordered dramatically and the user can actually “see” the process working, creating a more encouraging and perhaps entertaining search experience.
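
To make the idea concrete, here is a minimal sketch of click-driven re-ranking. It is not Surf Canyon’s actual algorithm, which is not described here. It simply assumes that the words in a clicked result that are not already in the query hint at what the user meant, and it moves results sharing those words toward the top. The function names (tokenize, rerank) and the sample “jaguar” results are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rerank(results, clicked_index, query):
    """Reorder results using a single click as the intent signal.

    Assumption for this sketch: terms in the clicked result that are not
    in the query describe the user's intent. Results sharing more of those
    terms rank higher; ties keep their original order.
    """
    query_terms = set(tokenize(query))
    signal = Counter(
        term for term in tokenize(results[clicked_index])
        if term not in query_terms
    )

    def sort_key(item):
        position, text = item
        overlap = sum(signal[term] for term in set(tokenize(text)))
        # Higher overlap first, then original position.
        return (-overlap, position)

    return [text for _, text in sorted(enumerate(results), key=sort_key)]


# Hypothetical result snippets for the ambiguous query "jaguar".
results = [
    "Jaguar cars official site luxury vehicles",
    "Jaguar big cat facts habitat",
    "Jaguar dealership used cars",
    "Jaguar animal rainforest habitat conservation",
    "Jacksonville Jaguars football schedule",
]

# The user clicks the second result (index 1), suggesting the animal sense.
reranked = rerank(results, clicked_index=1, query="jaguar")
for line in reranked:
    print(line)
```

In this toy example a click on the “big cat” result pulls the other animal-related result up to second place and pushes the car and football results down, without any profile or search history. A real system would need far better signals than raw word overlap, but the shape of the computation is the same: observe a behavior, score the remaining results against it, and reorder immediately.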

Tags: Discovery Personalization Reformulation Research