Query ambiguity is an important and interesting problem for search engines. On average, search engine queries are 2.4 words long, which in most cases does not provide enough context to understand the searcher's intent. Most queries are also not grammatically correct sentences or phrases, which prevents search engines from doing any deep processing such as semantic analysis. The most a search engine might do is shallow semantic analysis, as this Yahoo patent does, or bag-of-words word sense disambiguation. But the applicability of these techniques is quite narrow.
Because the search engine cannot completely understand the intent, it picks results that contain the query words and are most popular. This biases the results towards the head intent of the query and completely ignores the tail intent. For example, the query “jaguar” has a head intent of the car and a tail intent of the animal. Without any intervention, the search results will be biased towards the car.
Some more examples of ambiguous queries:
- The query “dublin” has a head intent of the city Dublin in Ireland and a tail intent of the less popular city called Dublin in California.
- The query “CSI” has a head intent of the TV show and a tail intent of the College of Southern Idaho.
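The popularity bias behind these head and tail intents can be illustrated with a minimal sketch. It is not how any real search engine ranks results; the toy index, the popularity scores, and the function name `rank_by_popularity` are all made up for illustration.

```python
# Hypothetical toy index: (title, intent, popularity). All values are invented.
DOCS = [
    ("Jaguar official car site", "car", 95),
    ("Jaguar XF review", "car", 80),
    ("Jaguar dealership locator", "car", 70),
    ("Jaguar (animal) facts", "animal", 40),
    ("Lion habitat guide", "animal", 60),
]


def rank_by_popularity(query, docs, top_k=3):
    """Keep documents whose title contains every query word, most popular first."""
    words = query.lower().split()
    matches = [d for d in docs if all(w in d[0].lower() for w in words)]
    return sorted(matches, key=lambda d: d[2], reverse=True)[:top_k]


top = rank_by_popularity("jaguar", DOCS)
# Intent labels of the results that make it onto the first "page".
intents = [intent for _, intent, _ in top]
```

In this toy data, the page about the animal matches the query but is less popular than the three car pages, so it never reaches the top three. That is the tail intent being crowded out.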
Sometimes it is not even possible to identify the head and tail intents of a query, and there can be more than two possible information needs. Bing reports that over half of the queries issued on search engines have an ambiguous intent. To understand the extent of the problem, see the several possible meanings of the word “Oregon” on Wikipedia, in addition to the head meaning of the state of Oregon. Inferring the wrong meaning of a query and showing those results to the user is grounds for an immediate search engine switch!
How about showing the user all the possible meanings of the query and asking the user to choose? This requires effort from the user (an extra click) and is not the preferred way to disambiguate a query.
A recent blog post by Bing shows several ways of solving this problem. Showing results based on the user's location alleviates the problem somewhat, but does not solve it completely. What if a user in Southern Idaho is searching for the TV show CSI and not the College of Southern Idaho? Showing results based on websites the user recently visited is useful, but it is still a heuristic and does not solve the problem completely. What if the user's intent has changed?
The blog post reports that Bing is now using something called adaptive search, which combines the two techniques with recency and tries to infer a level of confidence in the predicted intent. The confidence level shows how sure the algorithm is of its prediction. With high confidence, the search engine can commit to the predicted intent immediately, while with low confidence it might interleave results for several intents in the result list.
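The confidence idea can be sketched in a few lines. This is only an illustration under stated assumptions, not Bing's actual algorithm: the intent scores, the 0.8 threshold, and the function names `interleave` and `choose_results` are hypothetical.

```python
from itertools import zip_longest


def interleave(*result_lists):
    """Round-robin merge: the first result of each list, then the second, and so on."""
    merged = []
    for group in zip_longest(*result_lists):
        merged.extend(r for r in group if r is not None)
    return merged


def choose_results(intent_scores, results_by_intent, threshold=0.8):
    """Commit to one intent when confident, otherwise mix results of all intents.

    intent_scores maps each intent to a confidence in [0, 1] (assumed to be given);
    threshold is an arbitrary illustrative cut-off.
    """
    best = max(intent_scores, key=intent_scores.get)
    if intent_scores[best] >= threshold:
        return results_by_intent[best]
    # Low confidence: interleave, starting with the most likely intent.
    ordered = sorted(intent_scores, key=intent_scores.get, reverse=True)
    return interleave(*(results_by_intent[i] for i in ordered))


results = {
    "tv_show": ["csi-tv-1", "csi-tv-2"],
    "college": ["csi-college-1", "csi-college-2"],
}
confident = choose_results({"tv_show": 0.9, "college": 0.1}, results)
unsure = choose_results({"tv_show": 0.55, "college": 0.45}, results)
```

With a 0.9 score for the TV show, only TV show results come back. With a 0.55 versus 0.45 split, the two result lists are alternated so that both intents appear near the top.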
A Google patent attacks this problem by introducing the concept of a programmable search engine. A programmable search engine is a search engine that accepts several types of parameters, such as contexts. For example, a store selling digital cameras might have a Google custom search box on its site. When a user performs a search using this search box, the store can pass in information along with the search keywords, like “More Manufacturer Pages” if the search is performed from a page that sells cameras, “More Reviews” if the search is performed from a reviews page, and so on. This narrows down the intent of a query enough for the search engine to show highly relevant pages for the user's information need.
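A rough sketch of the idea of passing a context along with the keywords follows. It does not use any real Google API; the `search` function, the page records, and the way labels are attached to pages are assumptions made purely for illustration.

```python
# Hypothetical page records; the URLs and labels are invented for this sketch.
PAGES = [
    {"url": "example.com/maker/camera-x",
     "text": "camera x specifications",
     "labels": {"More Manufacturer Pages"}},
    {"url": "example.com/reviews/camera-x",
     "text": "camera x hands-on review",
     "labels": {"More Reviews"}},
    {"url": "example.com/blog/camera-x-rumors",
     "text": "camera x rumors",
     "labels": set()},
]


def search(query, pages, context=None):
    """Match on keywords, then move pages carrying the context label to the front."""
    words = query.lower().split()
    hits = [p for p in pages if all(w in p["text"] for w in words)]
    if context is None:
        return [p["url"] for p in hits]
    preferred = [p["url"] for p in hits if context in p["labels"]]
    others = [p["url"] for p in hits if context not in p["labels"]]
    return preferred + others


plain = search("camera x", PAGES)
from_reviews_page = search("camera x", PAGES, context="More Reviews")
```

The same keywords return the same pages in both calls, but the context supplied by the embedding site changes which page comes first. Reordering rather than filtering is a design choice in this sketch, so that unlabeled pages are still reachable.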
Some good ideas from Bing and Google for identifying the correct user intent! They appear to have strong potential, are encouraging, and make good reading for the search engine enthusiast!
Art J. Adams