Search for [jaguar] in Google and you'll see some results for the animal, the car brand, and even a result for the NFL football team in Jacksonville, Florida. Imagine that if you clicked on the result for the football team, the subsequent results you see from Google in the same search session might be influenced by that selection. Those results might be re-ranked using a contextual search model based upon classifications of the web sites you visit.
That's the focus of a patent granted to Google this week. It describes how such a ranking system might work and defines a number of related concepts: how a search session might be defined, how web pages and sites might be classified by a search engine based upon search contexts, how the previous click histories of other searchers might help build that model, and how pages might be re-ranked under that contextual search.
In another example, imagine that someone searches for [mobile phone]. The results they see could include pages selling phones, definitions associated with mobile phones, and news articles about specific phones. Those pages cover a number of different aspects or contexts related to mobile phones. If the intent behind the search is to find a shopping page or site that sells phones, and the searcher clicks upon a result related to that context, the search engine may temporarily modify the other results they see to focus more upon a shopping context.
Contextual Click Models
When you search at Google, the results you see are usually ranked based upon both relevance and importance signals. The relevance signals are used in an information retrieval score that ranks a page based upon how relevant it might be to the query terms used by a searcher. These can include things like whether query terms (or synonyms or closely related terms) appear in the title of a page, in content on the page, and in the anchor text of links pointing to the page from other pages on the same site and from external sites. The importance signals may include an algorithm like PageRank, which tries to score how important a page is based upon the importance of the pages that link to it.
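Google doesn't publish how it blends these two kinds of signals, but a toy Python sketch can show the general shape. The field weights, the `alpha` blending factor, and the names `Page`, `ir_score`, and `combined_score` are all my own assumptions for illustration, not anything from Google or the patent; real IR scoring is far more involved than counting matching words.

```python
from dataclasses import dataclass

# Hypothetical field weights: a query term in the title counts more than in body text.
FIELD_WEIGHTS = {"title": 3.0, "anchor": 2.0, "body": 1.0}


@dataclass
class Page:
    url: str
    fields: dict        # field name ("title", "anchor", "body") -> text
    importance: float   # a PageRank-style importance score, assumed to be in [0, 1]


def ir_score(page: Page, query: str) -> float:
    """Toy relevance score: field-weighted count of query-term occurrences."""
    terms = query.lower().split()
    score = 0.0
    for field, text in page.fields.items():
        words = text.lower().split()
        weight = FIELD_WEIGHTS.get(field, 0.0)
        score += weight * sum(words.count(term) for term in terms)
    return score


def combined_score(page: Page, query: str, alpha: float = 0.7) -> float:
    """Blend relevance and importance; alpha is an assumed mixing weight."""
    return alpha * ir_score(page, query) + (1 - alpha) * page.importance


def rank(pages, query):
    """Order pages by combined score, highest first."""
    return sorted(pages, key=lambda p: combined_score(p, query), reverse=True)


if __name__ == "__main__":
    car_page = Page("cars.example", {"title": "jaguar cars", "body": "new jaguar models"}, 0.2)
    popular = Page("popular.example", {"body": "jaguar"}, 0.9)
    print([p.url for p in rank([popular, car_page], "jaguar")])
    # ['cars.example', 'popular.example']
```

With `alpha=0.7`, a page that mentions the query term in its title can outrank a more "important" page that mentions it only once in its body text.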
When a set of documents is returned to a searcher, it may include a diverse set of pages that cover different search-related contexts. In my [jaguar] search example, the query term has a number of different meanings: a car, a cat, a supercomputer, an NFL football team. There are also results oriented toward shopping, news, reviews, and definitions and facts. Taken together, those results are likely too broad and diverse for a searcher who is probably interested in only one search context related to the query.
The patent tells us that a click model might be used to help focus later searches during that particular session, re-ranking search results to show more pages related to the context we appear to have selected through our clicks on search results. These click models would be based upon statistics associated with search results, and would act as a "relevance" signal. The patent also tells us that "the most popular context is not necessarily the context in which the user is interested."
For instance, if I search for [jaguar], the most popular search results appear to be related to pages involving the car and the cat, and the football team isn't very prominent in those results. As a searcher, I could probably change my query to [jacksonville jaguars] to get many more relevant pages, but people don't always refine their searches like that. For many searches, especially when searchers don't know much about the topic they are searching for, it isn't always easy to figure out how to modify later queries to focus upon the context they're interested in.
Your query, your decision about what to click upon, and your follow-up searches may become part of the statistical "contextual click model" used for future searches by others for the same query. Your own results may be modified based upon such a model, but only for a limited query session. You may be searching for the football team today and decide that you want to learn more about the feline tomorrow; having today's clicks influence the results you see tomorrow wouldn't necessarily be a good idea.
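Here's a minimal sketch of a session-scoped context that is built from clicks and forgotten once a session goes idle. The 30-minute timeout, the `page_contexts` lookup of page classifications, and the `SearchSession` class are hypothetical; I'm not reproducing whatever session definition the patent itself uses.

```python
from collections import Counter

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity before a new session starts; hypothetical


class SearchSession:
    """Tracks the contexts of clicked results for one searcher's session."""

    def __init__(self, timeout: float = SESSION_TIMEOUT):
        self.timeout = timeout
        self.last_activity = None
        self.context_clicks = Counter()

    def _touch(self, now: float) -> None:
        """Forget the old context if the searcher was idle too long, then mark activity."""
        if self.last_activity is not None and now - self.last_activity > self.timeout:
            self.context_clicks.clear()
        self.last_activity = now

    def record_click(self, url: str, page_contexts: dict, now: float) -> None:
        """Count one click toward every context the clicked page is classified under."""
        self._touch(now)
        for context in page_contexts.get(url, []):
            self.context_clicks[context] += 1

    def context_weights(self, now: float) -> dict:
        """Share of in-session clicks per context; empty once the session has expired."""
        self._touch(now)
        total = sum(self.context_clicks.values())
        if total == 0:
            return {}
        return {c: n / total for c, n in self.context_clicks.items()}
```

Asking for the weights a day after clicking on football results returns an empty dict, so yesterday's interest in the team doesn't carry over into today's search about the cat.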
The patent is:
Context sensitive ranking
Invented by Ashutosh Garg and Kedar Dhamdhere
Assigned to Google
US Patent 8,209,331
Granted June 26, 2012
Filed: March 9, 2009
Methods, systems, and apparatus, including computer program products, in which context can be used to rank search results. Context associated with a user session can be identified. A search query received during the user session can be used to identify a contextual click model based upon the context associated with the user session.
A somewhat related approach can be found in the paper Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search, by Taher H. Haveliwala, who wrote the paper while at Stanford University. He was a founder of a company called Kaltix, which was acquired by Google in 2003, and he worked at Google until 2008 on ranking web pages, with a focus upon personalization.
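Haveliwala's core idea is to bias PageRank's random "teleport" step toward pages known to be about a topic, so that each topic gets its own PageRank vector. The sketch below shows that idea with plain power iteration over a tiny link graph; the graph, the damping factor of 0.85, and the fixed iteration count are illustrative choices, and this isn't code from the paper or from Google.

```python
def topic_sensitive_pagerank(links, topic_pages, damping=0.85, iterations=50):
    """
    links: dict mapping each page to the list of pages it links to.
    topic_pages: non-empty set of pages known to be about the topic
    (the paper drew these from Open Directory categories).
    Returns a dict of scores that sums to 1.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    # Teleport only to topic pages, uniformly; this is what makes the ranking topic-specific.
    teleport = {n: (1.0 / len(topic_pages) if n in topic_pages else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            outs = links.get(n, [])
            if outs:
                # Split this page's rank evenly across its outlinks.
                share = damping * rank[n] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling page: send its rank back through the teleport vector.
                for v in nodes:
                    new[v] += damping * rank[n] * teleport[v]
        rank = new
    return rank
```

On a small cycle of pages a → b → c → a, plus a page d linking into c, using {"a"} as the topic set gives a the highest score, while d, which nothing links to and which isn't a topic page, ends up at zero.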
Classifying Pages Based on Topics
One type of page described in the Topic-Sensitive PageRank paper and some similar algorithms is known as a hub page. Hub pages tend to focus upon specific topics and include links to many other pages about those topics. They also tend to receive a high volume of traffic and/or links from many other pages. Because hub pages are focused upon particular topics, they are easy to classify as being about those topics, and a click upon a hub page shown in search results after a query might help indicate the context of a search.
But people click upon results other than hub pages when searching. By looking through search logs and click logs, the search engine might identify a number of search sessions in which people tended to select pages suggesting that they were interested in the same context for their search.
Someone searching for [jaguar] who was interested in the cat might select a Wikipedia page, a page from National Geographic, and other pages about the cat. They might modify their original query within that search session to closely related terms and find other pages that fall within the same context.
Someone else searching for [jaguar] who is interested in the car might visit Jaguar's homepage and other pages about the car. Again, they might modify their search query to a closely related term and click on some other pages in the same search context.
Some of the pages being selected might be hub pages, like the Wikipedia page for the cat. Some of them might not be hub pages, like a specific dealership page for the car. But the clicks from the two different searchers during their search sessions might be aggregated and clustered together, and classified as being within different contexts.
Some pages might not be included within those clusters if clicks on them during previous searches by others don't meet a certain threshold. For example, people searching for [jaguar] might frequently choose the Jacksonville Jaguars home page or the ESPN page about the team during their search sessions, while someone might have clicked upon a cooking webpage once during one of those sessions. The cooking page would be considered an aberration and wouldn't be included in the cluster.
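One simple way to picture this aggregation and thresholding is to count how often pairs of pages are clicked together in the same session, keep only the pairs that clear a minimum count, and group the pages connected by those pairs. This co-click clustering is my own stand-in for illustration; the patent doesn't commit to this particular method, and `min_co_clicks` is a hypothetical threshold.

```python
from collections import Counter
from itertools import combinations


def cluster_pages(sessions, min_co_clicks=2):
    """
    sessions: list of lists of clicked URLs, one list per search session.
    Returns a list of sets; pages never clicked together often enough are dropped.
    """
    # Count how many sessions each pair of pages was clicked together in.
    co_clicks = Counter()
    for clicks in sessions:
        for a, b in combinations(sorted(set(clicks)), 2):
            co_clicks[(a, b)] += 1

    # Union-find over the pairs that clear the threshold.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in co_clicks.items():
        if count >= min_co_clicks:
            parent[find(a)] = find(b)

    # Only pages that appeared in a strong pair ever entered `parent`.
    clusters = {}
    for page in parent:
        clusters.setdefault(find(page), set()).add(page)
    return list(clusters.values())
```

A cooking page clicked in just one session never forms a strong enough pair with the football pages, so it's left out of every cluster.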
The classification of those pages for a search context might be loosely defined by the cluster itself, or assigned by an administrator of the search engine, or keywords might be extracted from those pages to provide a label to classify them with.
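For the keyword option, a bare-bones approach is to label a cluster with the words that appear on the most pages within it. The stopword list and the `top_n` value are arbitrary assumptions, and a real system would likely use something much richer than word counts.

```python
import re
from collections import Counter

# A tiny, arbitrary stopword list for illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on", "with"}


def label_cluster(page_texts, top_n=3):
    """Label a cluster with the words found on the most pages in it (ties broken alphabetically)."""
    doc_freq = Counter()
    for text in page_texts:
        # Count each word at most once per page.
        words = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        doc_freq.update(words)
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]
```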
Some pages or sites might also be considered to have more than one search context. For example, a book page from Amazon.com might be considered to have both a shopping search context and a review search context. The contextual search model might treat those classifications as separate and, when re-ranking future searches during a search session, focus upon one context or the other. Or the model might merge the classifications, so that subsequent search results boost both pages that are shopping related and pages that are review related.
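The difference between the two options comes down to which contexts a session ends up targeting after a click on a multi-context page. In this hypothetical sketch, the "separate" mode picks a single context (a `preferred` one if it's given), while the "merged" mode keeps all of them; the function names and context labels are illustrative, not from the patent.

```python
def target_contexts(clicked_page_contexts, preferred=None, merge=False):
    """Contexts to favor after a click on a page classified under several contexts."""
    contexts = set(clicked_page_contexts)
    if not contexts:
        return set()
    if merge:
        # Merged: boost pages from any of the clicked page's contexts.
        return contexts
    # Separate: focus on one context; fall back to the alphabetically first for determinism.
    return {preferred if preferred in contexts else sorted(contexts)[0]}


def should_boost(page_contexts, targets):
    """A result is boosted if it shares at least one context with the targets."""
    return bool(set(page_contexts) & targets)
```

With a book page classified as both shopping and reviews, a review-only site gets boosted under the merged model but not when the model focuses on shopping alone.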
Someone performs a search and selects a shopping hub page from the search results. They perform a related search in the same search session, and Google might determine that the searcher would prefer to see shopping results instead of results related to news or travel or focusing upon education.
Search results that were ranked based upon information retrieval (IR) scores and importance scores would be boosted if they fit into that shopping context. That doesn't mean that you won't see news or travel or educational pages within search results, but shopping results might be ranked higher than they would be based upon the IR and importance scores alone.
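A multiplicative boost is one easy way to express this kind of re-ranking. The boost factor of 1.5 and the `(url, base_score, contexts)` tuples are assumptions for illustration; the patent doesn't tell us how large any boost would be.

```python
def rerank(results, session_context, boost=1.5):
    """
    results: list of (url, base_score, contexts) tuples, where base_score already
    combines IR and importance scores. Pages in the session's context get a boost.
    """
    def score(result):
        _, base, contexts = result
        return base * boost if session_context in contexts else base

    return [url for url, _, _ in sorted(results, key=score, reverse=True)]
```

Pages outside the shopping context keep their original scores, so they still appear in the results, just below a boosted shopping page that started out with a slightly lower score.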
We don't know if Google has implemented the algorithm described in this patent for context sensitive rankings of pages, or if it's something that they might use in rankings in the future, or may never use. But it still has some lessons for us.
While you may strive to have your page ranked as highly as possible for specific queries in Google, the rankings of that page may change based upon the context of a searcher's session. If you have a Jaguar dealership, a person searching for [jaguar] who is looking for information about the cat one afternoon probably isn't someone you want visiting your page that day anyway.
Both Google and Bing have a number of ways they might re-rank search results other than this context sensitive ranking. For example, on a search using a mobile device, it's not a long stretch to imagine that pages optimized for mobile devices might be boosted in search results.
The page title, URL, and snippet that Google shows for your page should ideally give searchers an indication of the context of your page. If you have a page about the island of Java, and the title of your page is "All About Java," and the meta description for the page is "Everything you ever wanted to know about Java," you aren't helping people understand the context of your page. With that title and description, searchers can't tell if your page is about the Java programming language, the coffee, or the island.
If search clicks may be used to help a search engine understand the context of a search, making it easier for searchers to understand what the context of your page is when it's displayed in search results probably isn't a bad idea. Page titles and meta descriptions shouldn't just be unique and descriptive for the pages they appear upon. They should also be engaging and help persuade searchers to click upon them.
And those clicks may earn you even more clicks.