How Google May Decide Which Data Problems to Fix First
Bill Slawski, July 13, 2012
A recently granted Google patent provides a peek at how the search engine addresses some of the problems it might have as a business, a site owner, and an information provider.
When we work on a web site, we identify issues that might affect how well the site ranks in search results, how much traffic it attracts, and how much revenue and return on investment it earns. We then try to prioritize which of those issues to address first.
It shouldn't come as a surprise that Google goes through a similar process.
Google Maps contains an incredible amount of location-related data about places across the globe. That information is collected from sources such as telephone company information, region-based directories, pages on the Web, and even verification of business listings provided by the owners of businesses and other organizations.
With all those different places and different types of data sources, chances are that some of the information is wrong. Where would you even start to try to correct errors and problems?
A patent granted to Google last week describes a system the search engine might use to decide which problems and which questionable location information to send to a human evaluator for checking. The system uses a cost/benefit analysis to decide which places to manually review first.
For instance, correcting the location of an emergency room probably has much more benefit than correcting the location of a small restaurant, so the emergency room would be prioritized above the restaurant. Costs are calculated and factored into the decision as well. It takes less time, effort, and money to check on the location of a small restaurant than on a small park, where it's not known which government agency might be responsible for managing the park.
A severity level may also be assigned to a problem, so that an incorrect address in a business listing might be considered a more severe problem than "a rude or obscene product or service review."
The patent applies to data found on the Web as well, not just to data in Google Maps. For instance, we're told in the patent that checking a video for problems costs more than checking a picture, because it might take someone much more time to review the video.
Correcting the internet address of a major online retailer could result in a greater gain to an information system operator than correcting the name of a small local park on a map for a couple of reasons:
(i) correcting the problem associated with the major online retailer may result in higher advertising revenue to the information system operator when information system users click on the internet address to visit the online retailer's site; and
(ii) more information system users are likely to experience the problem associated with a major online retailer than the problem associated with small local park name, and thus, correcting the problem associated with the major online retailer will have a greater impact on goodwill and user loyalty than correcting the problem associated with the park name on the map.
One of the inventors listed on the patent is Ashutosh Kulshreshtha, who is listed as the Technical Lead for Localsearch Data Quality at Google. The patent is:
Systems and methods for assignment of human reviewers using probabilistic prioritization
Invented by Gokhan Bakir and Ashutosh Kulshreshtha
Assigned to Google
US Patent 8,214,373
Granted July 3, 2012
Filed: February 18, 2011
The present application discloses systems and methods for using probabilistic prioritization to assign human reviewers to review data stored in or indexed by an information system.
Some embodiments include accessing an index of data items, where individual data items have a corresponding probability f of having a problem, a cost to review the data item, a penalty if a problem associated with the data item is not remedied, and a gain if a problem associated with the data item is remedied; identifying a subset of data items having a corresponding f that is greater than or equal to a decision threshold based on the data item's corresponding cost, penalty, and gain; and ranking at least a portion of the subset of data items based at least in part on their corresponding cost, f, and gain.
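The two steps in that abstract, filtering by a decision threshold and then ranking what's left, can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the patent's actual formulas: the abstract doesn't spell out how the threshold or the ranking score is computed, so I've assumed a simple expected-value rule (review an item when f * gain - cost is at least as good as the expected loss of leaving it alone, -f * penalty) and ranked by expected net benefit. The DataItem class, the function names, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    name: str
    f: float        # estimated probability that the item has a problem
    cost: float     # cost of having a human review the item
    penalty: float  # loss if a real problem is left unfixed
    gain: float     # benefit if a real problem is fixed


def decision_threshold(item: DataItem) -> float:
    """Assumed rule (not from the patent): review pays off when
    f * gain - cost >= -f * penalty, i.e. f >= cost / (gain + penalty)."""
    total = item.gain + item.penalty
    if total <= 0:
        return float("inf")  # nothing to win or lose, so never review
    return item.cost / total


def review_queue(items):
    """Keep items whose f meets their threshold, then rank them by
    an assumed score: expected net benefit, f * gain - cost."""
    candidates = [i for i in items if i.f >= decision_threshold(i)]
    return sorted(candidates, key=lambda i: i.f * i.gain - i.cost, reverse=True)


# Made-up numbers, loosely following the examples in the patent.
items = [
    DataItem("emergency room location", f=0.10, cost=5.0, penalty=200.0, gain=100.0),
    DataItem("small restaurant address", f=0.30, cost=2.0, penalty=5.0, gain=10.0),
    DataItem("small park name", f=0.05, cost=8.0, penalty=2.0, gain=4.0),
]

queue = review_queue(items)
for item in queue:
    print(item.name)
```

With these invented values, the emergency room comes first even though it's the least likely of the three to be wrong, because the gain and penalty attached to it are so large. The park never makes it into the queue at all, since a review would cost more than the most it could possibly pay back.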
Google Maps also enables people who use the maps to make edits through Google Map Maker, and according to the Map Maker Getting Started Guide, the edits made there are reviewed before they become final. It's possible that the moderation and review system described in this patent is part of that process.
It can be easy to forget sometimes that Google has business issues and problems, too. As a portal to a great amount of information on the Web, they've had to come up with processes that not only help them fix problems like incorrect information on Google Maps, but also weigh and prioritize those problems.
I enjoy seeing patents like this one that give us some hints at how Google may address some of the business processes they have.