A recently granted Google patent provides a peek at how the search engine addresses some of the problems it might have as a business, a site owner, and an information provider.
When we work on a web site, we identify issues to address, weighing which fixes might have a positive impact upon how well that site ranks in search results, how much traffic it attracts, and how much revenue and return on investment our efforts might produce, and we try to prioritize those efforts accordingly.
It shouldn’t come as a surprise that Google goes through a similar process.
Google Maps contains an incredible amount of location-related data about places across the globe. That information is collected through sources like telephone company records, region-based directories, pages on the Web, and even verification of business listings provided by the owners of those businesses and other organizations.
With all those different places and different types of data sources, chances are that some of the information is wrong. Where would you even start to try to correct errors and problems?
A patent granted to Google last week describes a system the search engine might use to decide which problems, and which questionable location information, to send to a human evaluator to check. The system uses a cost/benefit analysis to decide which places to manually review first.
For instance, the location of an emergency room probably carries much more benefit than the location of a small restaurant, and would be prioritized above correcting potential problems with the restaurant's location information. Costs factor into those decisions as well. It takes less time, effort, and money to check the location of a small restaurant than that of a small park, where it may not even be known which government agency is responsible for managing the park.
A severity level may also be assigned to a problem, so that correcting the address of a business listing might be considered a more severe problem than fixing “a rude or obscene product or service review.”
The patent applies to data found across the Web, not just in Google Maps. For instance, we’re told in the patent that checking a video for problems costs more than checking a picture, because it might take someone much more time to review the video.
Correcting the internet address of a major online retailer could result in a greater gain to an information system operator than correcting the name of a small local park on a map for a couple of reasons:
(i) correcting the problem associated with the major online retailer may result in higher advertising revenue to the information system operator when information system users click on the internet address to visit the online retailer’s site; and
(ii) more information system users are likely to experience the problem associated with a major online retailer than the problem associated with small local park name, and thus, correcting the problem associated with the major online retailer will have a greater impact on goodwill and user loyalty than correcting the problem associated with the park name on the map.
One of the inventors listed on the patent is Ashutosh Kulshreshtha, the Technical Lead for Localsearch Data Quality at Google. The patent is:
Systems and methods for assignment of human reviewers using probabilistic prioritization
Invented by Gokhan Bakir and Ashutosh Kulshreshtha
Assigned to Google
US Patent 8,214,373
Granted July 3, 2012
Filed: February 18, 2011
The present application discloses systems and methods for using probabilistic prioritization to assign human reviewers to review data stored in or indexed by an information system.
Some embodiments include accessing an index of data items, where individual data items have a corresponding probability f of having a problem, a cost to review the data item, a penalty if a problem associated with the data item is not remedied, and a gain if a problem associated with the data item is remedied; identifying a subset of data items having a corresponding f that is greater than or equal to a decision threshold based on the data item’s corresponding cost, penalty, and gain; and ranking at least a portion of the subset of data items based at least in part on their corresponding cost, f, and gain.
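The claim language above can be turned into a rough sketch. Below is a minimal illustration of that kind of expected-value rule: a data item qualifies for human review when its probability f of having a problem clears a threshold derived from its cost, penalty, and gain, and qualifying items are ranked by expected benefit per unit of review cost. The threshold formula, priority formula, and the numbers are my own assumptions for illustration; the patent does not spell out this exact math.

```python
from dataclasses import dataclass

@dataclass
class DataItem:
    name: str
    f: float       # probability the item has a problem
    cost: float    # cost of a human review
    penalty: float # loss if an existing problem goes unremedied
    gain: float    # benefit if a problem is found and fixed

def worth_reviewing(item: DataItem) -> bool:
    # Review only when f clears a threshold derived from cost, penalty,
    # and gain: the expected value of fixing must exceed the review cost.
    threshold = item.cost / (item.penalty + item.gain)
    return item.f >= threshold

def prioritize(items: list[DataItem]) -> list[DataItem]:
    # Rank reviewable items by expected benefit per unit of review cost.
    candidates = [i for i in items if worth_reviewing(i)]
    return sorted(
        candidates,
        key=lambda i: i.f * (i.penalty + i.gain) / i.cost,
        reverse=True,
    )

# Hypothetical items echoing the patent's examples: an emergency room's
# location matters far more than a restaurant's, and a park is costly to
# verify because the responsible agency may be unknown.
items = [
    DataItem("emergency room address", f=0.3, cost=2.0, penalty=50.0, gain=40.0),
    DataItem("restaurant address",     f=0.3, cost=1.0, penalty=2.0,  gain=3.0),
    DataItem("park boundary",          f=0.1, cost=8.0, penalty=2.0,  gain=2.0),
]
queue = prioritize(items)
```

With these numbers, the park never clears its review threshold, while the emergency room outranks the restaurant in the review queue.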
Google Maps also enables people who use the maps to make edits through Google Map Maker, and according to the Map Maker Getting Started Guide, the edits made there are reviewed before they become final. It’s possible that the moderation and review system described in this patent is part of that process.
It can be easy to forget sometimes that Google has business issues and problems, too. As a portal to a great amount of information on the Web, they’ve had to come up with processes that not only help them fix problems like incorrect information on Google Maps, but also weigh and prioritize those problems.
I enjoy seeing patents like this one that give us some hints at how Google may address some of the business processes they have.
Search for [jaguar] in Google and you’ll see some results for the animal, results for the brand of cars, and even a result for the NFL football team in Jacksonville, Florida. Imagine that if you clicked on the result for the football team, subsequent search results you see from Google in the same search session might be influenced by that selection. Those results might be re-ranked based upon a contextual search model built from classifications of the web sites you visit.
That’s the focus of a patent granted to Google this week, which describes how such a ranking system might work and covers a number of related concepts: how a search session might be defined, how web pages and sites might be classified by the search engine based upon search contexts, how the previous click histories of others might help build that model, and how pages might be re-ranked under that contextual search.
In another example, imagine that someone searches for [mobile phone]. The results they see could include pages selling phones, definitions associated with mobile phones, and news articles about specific phones. Those pages cover a number of different aspects, or contexts, related to mobile phones. If the intent behind the search is to find a shopping page or a site that sells phones, and the searcher clicks upon a result related to that context, the search engine may temporarily modify the other search results shown to focus more upon a shopping context.
Contextual Click Models
When you search at Google, the results you see are usually ranked based upon both relevance and importance signals. The relevance signals are used in an information retrieval score that ranks a page based upon how relevant it might be to the query terms used by a searcher. These can include things like whether query terms (or synonyms or closely related terms) appear in the title of a page, in content on the page, and in anchor text pointing to the page from other pages on the same site and from external sites. The importance signals may include an algorithm like PageRank that tries to score how important a page is based upon the importance of the pages that link to it.
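As a rough illustration of how those two kinds of signals might be combined, here is a sketch that blends a query-dependent relevance (IR) score with a query-independent importance score. The weighted geometric blend and the alpha parameter are my own assumptions for illustration; the actual combination function Google uses is not public.

```python
def final_score(ir_score: float, importance: float, alpha: float = 0.7) -> float:
    # One common way to combine a query-dependent relevance score with a
    # query-independent importance score (such as PageRank) is a weighted
    # geometric blend; alpha controls how much relevance dominates.
    return (ir_score ** alpha) * (importance ** (1 - alpha))

# A page highly relevant to the query but with modest importance...
page_a = final_score(ir_score=0.9, importance=0.2)
# ...versus a very important page that is only loosely relevant.
page_b = final_score(ir_score=0.3, importance=0.9)
```

With alpha at 0.7, the blend favors the highly relevant page over the merely important one, which matches the intuition that relevance should usually lead.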
When a set of documents is returned to a searcher, it may include a diverse set of pages covering different search-related contexts. In my [jaguar] search example, the query term has a number of different meanings: a car, a cat, a supercomputer, an NFL football team. There are results based upon shopping, upon news, upon reviews, and upon definitions and facts. Those results may be too broad and diverse for a searcher who is only interested in one search context related to the query.
The patent tells us that a click model might be used to help focus future searches during that particular session, and re-rank search results to show us more related to the context that we might have selected based upon our clicks on search results. These click models would be based upon statistics associated with search results, and would be a “relevance” signal. The patent tells us also that “the most popular context is not necessarily the context in which the user is interested.”
For instance, if I search for [jaguar], the most popular search results appear to be related to pages involving the car and the cat, and the football team isn’t very prominent in those results. As a searcher, I could probably modify my search to [jacksonville jaguars] to get a lot more relevant pages returned to me in a search, but people don’t always modify their searches like that. For many searches, especially when searchers don’t know too much about the topic they are searching for, it isn’t always very easy to determine how to modify future queries to better focus upon the context you’re interested in.
Your query, your decision about what to click upon, and your follow-up searches may become part of the statistical “contextual click model” applied to future searches by others for the same query. Your results may be modified based upon such a model, but only for a limited query session. You may be searching for the football team today and decide that you want to learn more about the feline tomorrow, and having the click model influence the results you see tomorrow wouldn’t necessarily be a good idea.
The patent is:
Context sensitive ranking
Invented by Ashutosh Garg and Kedar Dhamdhere
Assigned to Google
US Patent 8,209,331
Granted June 26, 2012
Filed: March 9, 2009
Methods, systems, and apparatus, including computer program products, in which context can be used to rank search results. Context associated with a user session can be identified. A search query received during the user session can be used to identify a contextual click model based upon the context associated with the user session.
A somewhat related approach can be found in the paper Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search (pdf), by Taher H. Haveliwala, who wrote the paper while at Stanford University, and who was a founder of a company called Kaltix, which was acquired by Google in 2003. He worked at Google until 2008, focusing upon ranking web pages with an emphasis upon personalization.
Classifying Pages Based on Topics
One type of page described in the Topic-Sensitive PageRank paper and in some similar algorithms is the hub page. Hub pages tend to focus upon specific topics and include links to many other pages about those topics. They also tend to receive a high volume of traffic and/or have a lot of pages linking to them. Since hub pages tend to be focused upon particular topics, they are easy to classify as being about those topics. A click upon a hub page appearing in search results after a query might help indicate the context of a search.
But people click upon results other than just hub pages when searching. If the search engine looks through search logs and click logs for searches, it might identify a number of search sessions where people tended to select pages that might evidence that they were interested in the same context as a result of their search.
Someone searching for [jaguar] who was interested in the cat might select a Wikipedia page, a page from National Geographic, and other pages that are about the cat. They might modify their original query within that search session to very related terms and identify other pages that fall within the same context.
Someone else searching for [jaguar] interested in the car might visit the Jaguar’s homepage, and other pages about the car. Again, they might modify their search query to a very related term and click on some other pages in the same search context.
Some of the pages being selected might be hub pages, like the Wikipedia page for the cat. Some of them might not be hub pages, like a specific dealership page for the car. But the clicks from the two different searchers during their search sessions might be aggregated and clustered together, and classified as being within different contexts.
Some of the pages might not be included within those clusters if previous searches by others don’t meet a certain threshold. For example, people searching for [jaguar] might frequently choose the Jacksonville Jaguars home page or the ESPN page about the team during their search sessions, and someone might have also clicked upon a cooking webpage once during those sessions. The cooking page would be considered an aberration, and wouldn’t be included in the cluster.
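That kind of threshold filtering might be sketched as follows: pages clicked only rarely across past sessions for a query are treated as noise and left out of the context cluster. The minimum-fraction threshold and the page names are my own assumptions for illustration.

```python
from collections import Counter

def cluster_clicks(clicked_pages: list[str], min_fraction: float = 0.05) -> set[str]:
    # Keep only pages clicked often enough across past sessions to count
    # as evidence of a shared search context; one-off clicks (like a lone
    # visit to a cooking page during [jaguar] sessions) are dropped as
    # aberrations.
    counts = Counter(clicked_pages)
    total = len(clicked_pages)
    return {page for page, n in counts.items() if n / total >= min_fraction}

# Hypothetical click log for [jaguar] sessions leaning toward the NFL team.
clicks = ["jaguars_home"] * 40 + ["espn_jaguars"] * 30 + ["cooking_blog"]
cluster = cluster_clicks(clicks)
```

The two football pages survive the threshold and form the cluster; the single cooking-page click does not.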
The classification of those pages for that search context might be loosely defined by the cluster, or defined by an administrator of the search engine, or keywords might be extracted from those pages to provide a label to classify it with.
Some pages or sites might also be considered to have more than one search context. For example, a book page from Amazon.com might be considered to have a shopping search context, or a review search context. The contextual search model might treat those classifications as separate, and when re-ranking future searches during a search session, focus upon one context or the other. Or the model might merge together the classification so that subsequent search results might boost pages that are shopping related and pages that are review related.
Someone performs a search and selects a shopping hub page from the search results. They perform a related search in the same search session, and Google might determine that the searcher would prefer to see shopping results instead of results related to news or travel or focusing upon education.
Search results that were ranked based upon information retrieval (IR) scores and importance scores would be boosted if they fit into that shopping context. That doesn’t mean you won’t see news or travel or educational pages within the search results, but shopping results might be ranked higher than they would have been based upon just the IR and importance scores.
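That boosting behavior can be sketched as a simple multiplicative re-ranking: results matching the session’s inferred context get their combined base score multiplied by a boost factor, while out-of-context results stay in the list, just ranked lower. The boost value and result names are illustrative assumptions, not figures from the patent.

```python
def rerank(results: list[tuple[str, str, float]],
           session_context: str,
           boost: float = 1.5) -> list[str]:
    # results: (name, context label, combined IR + importance score).
    # Multiply the base score of results matching the session's inferred
    # context; everything else keeps its original score and remains listed.
    rescored = [
        (name, score * boost if context == session_context else score)
        for name, context, score in results
    ]
    rescored.sort(key=lambda r: r[1], reverse=True)
    return [name for name, _ in rescored]

results = [
    ("phone-news-article", "news",     0.90),
    ("phone-shop",         "shopping", 0.80),
    ("phone-review-site",  "reviews",  0.70),
]
order = rerank(results, session_context="shopping")
```

With a shopping context inferred from earlier clicks, the shop page jumps above the news article without the other results being removed.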
We don’t know if Google has implemented the algorithm described in this patent for context sensitive rankings of pages, or if it’s something that they might use in rankings in the future, or may never use. But it still has some lessons for us.
While you may strive to have your page ranked as highly as possible for specific queries in Google, the rankings of that page may change based upon the context of a searcher’s session. If you have a Jaguar dealership, a person searching for [jaguar] who decides to learn about the cat one afternoon probably isn’t someone you want visiting your page that day anyway.
Both Google and Bing have a number of ways they might re-rank search results other than this context sensitive ranking. For example, on a search using a mobile device, it’s not a long stretch to imagine that pages optimized for mobile devices might be boosted in search results.
The page title, URL, and snippet that Google shows for your page should ideally give searchers an indication of the context of your page. If you have a page about the island of Java, and the title of your page is “All About Java,” and the meta description for the page is “Everything you ever wanted to know about Java,” you aren’t helping people understand the context of your page. With that title and description, searchers can’t tell if your page is about the Java programming language, the coffee, or the island.
If search clicks may be used to help a search engine understand the context of a search, making it easier for searchers to understand what the context of your page is when it’s displayed in search results probably isn’t a bad idea. Page titles and meta descriptions shouldn’t just be unique and descriptive for the pages they appear upon. They should also be engaging and help persuade searchers to click upon them.
And those clicks may earn you even more clicks.
The first days of summer, and the mercury in my thermometer raced for triple digits yesterday afternoon. This is my first blog post on the Webimax blog, and I’m excited at the chance to share this space with some people who are very enthusiastic and excited about search and SEO, and the growing social and semantic Web.
Almost seven years ago today, I started my blog at SEO by the Sea, and while a lot of things have changed since then, SEO isn’t one of them. It’s still an ever-growing, ever-evolving discipline aimed at helping site owners and searchers find each other through the medium of the Web. Search engines are starting to look at some new signals to rank and display pages and to meet the situational and informational needs of searchers.
Some of those signals involve search engines looking at a more social Web, and how real people, real authors, and real content creators interact with other people on the Web on topics of interest and expertise. Our search results aren’t just being populated with results determined by algorithms, but also by pages written or shared by people we might connect with personally.
Some of those signals involve search engines understanding when specific people and places and things, referred to as entities, appear in search queries. Search engines might use references to those entities in knowledge bases like Wikipedia and Freebase, as well as in search engine query logs to determine what to show searchers on search results pages.
By looking at search query logs and encyclopedia-like resources, search results can be better suited to meet the intents of searchers and to anticipate what their next queries might be.
At SEO by the Sea, I’ve written a lot of posts about search patents and white papers, and I’m going to be doing that here as well. Some of those posts may be in-depth looks at specific patents or papers, while others might be broader overviews of some of the new technologies and approaches hinted at in those sources from the search engines.
In 2005, in the first week of summer, a search for new patent applications that included the phrase “search engine” returned 26 results. A search for all the pending patents published that week that included the word “google” brought back 6 results. Those numbers were pretty typical and representative of the number of patent filings on those topics being published at the time.
Search engines and Google have gotten a lot hotter since then. This week, a search for new patent applications including the term “search engine” returned 132 results, and 84 patents published this week used the word “google.”
So what kinds of patents did I see this week that I thought were interesting?
A Facebook patent application titled Comment Plug-In for Third Party System, gives us some hints at how Facebook’s off site commenting system works, and ties into how those comments are also shared on the social network itself.
Microsoft has published a patent application titled Social Marketing Manager, which describes how the search engine might create and monitor social networking campaigns, and “facilitate” social interactions online. If Microsoft pursues the opportunities described in this patent, will it be a service that competes with social media marketers or one that collaborates with them? Is there a potential conflict of interest in Microsoft helping to promote social media while also using it to rank social results at Bing?
Google takes a second bite at the apple with a new pending version of a granted patent on ranking news articles.
Like many others, I’ve been wondering if Google will start showing advertisements at Google Plus, and a new patent filing from Google titled Providing Advertisements on a Social Network provides some tantalizing hints at the possibility.
Another Google patent application gives us a look at a possible new approach by the search engine to allow people to search through reviews for a product.
One of the really fun parts of digging through patents like these is the chance to gain a perspective of the Web through the eyes of search engineers, and gain some insight into the assumptions they make about the Web, about searchers, and about search. Sometimes the inventions described in patents actually make it off paper and onto the Web, like the Facebook commenting system above.
Sometimes they don’t, but they tell us about some of the research done at Google or Microsoft, and possibilities that might become something more. Will Microsoft offer social marketing services? Will Google show ads at Google Plus?
I’ll be providing a look at pending and granted patents here, and we’ll be looking at those closely and working on how we can use what we learn to provide the best services to our clients that we can, and to help the industry grow.
Yesterday’s SEO is dead, but tomorrow’s is pretty exciting.