A search inside Google’s heart
The New York Times today runs one of those rare pieces on Google’s search ranking development center, Building 43, and as expected a zillion links are pointing at the article right now. Here are a couple of things that stood out for me.
Editorialisation of search results: For all our fear of Google’s servers slowly morphing into self-sustaining, hyper-intelligent systems, the system still requires a fair bit of human intervention. Jason Calacanis and Mahalo just got another slide to add to their PowerPoint pitch.
Any of Google’s 10,000 employees can use its “Buganizer” system to report a search problem, and about 100 times a day they do — listing Mr. Singhal as the person responsible to squash them.
Variance in expected results: Not all searches are created equal, and neither are the complaints Google receives about them. Sometimes you need a larger sample to validate an anomaly and to decide whether it is an expected exception (for the user, that is) or an unexpected one (which would be an errant result) before anyone makes changes to the core. That’s when you get the déjà vu moment: “hey, this is nice and relevant, but it never behaves this way normally.”
But Mr. Singhal often doesn’t rush to fix everything he hears about, because each change can affect the rankings of many sites. “You can’t just react on the first complaint,” he says. “You let things simmer.”
How fast should fast be: Not fast enough, apparently. Users hit up Google faster than Google can populate the index with fresh content. Two seconds is a nightmare of a metric to live with. Having known pretty well the news operations that break news for a living, I can say that putting up even a dummy link or a flash on television or the internet takes close to two minutes. Then again, if you see a huge query volume coming in for a relatively uncharacteristic subject in that short a time, it should be a good indicator that something major is happening out there.
As an example, he points out what happens when cities suffer power failures. “When there is a blackout in New York, the first articles appear in 15 minutes; we get queries in two seconds,” he says.
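To make the "huge query volume as an indicator" idea a bit more concrete, here is a minimal sketch of how one might flag a sudden burst of queries for a term. This is my own toy illustration, not anything the article describes about Google's systems: the `SpikeDetector` class, the per-interval counts, the window size and the threshold are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class SpikeDetector:
    """Toy detector: flags a count that sits far above the recent baseline.

    Hypothetical illustration only; every parameter here is an assumption.
    """

    def __init__(self, window=60, threshold=4.0, min_count=20):
        self.threshold = threshold   # how many standard deviations counts as a spike
        self.min_count = min_count   # ignore tiny absolute volumes
        self.history = deque(maxlen=window)  # recent per-interval query counts

    def observe(self, count):
        """Return True if `count` looks like a spike, then record it."""
        is_spike = False
        if len(self.history) >= 5:   # need a little history before judging
            mu = mean(self.history)
            sigma = pstdev(self.history) or 1.0  # avoid dividing by zero
            z_score = (count - mu) / sigma
            is_spike = count >= self.min_count and z_score > self.threshold
        self.history.append(count)
        return is_spike


# Queries per interval for something like "new york blackout"
detector = SpikeDetector()
quiet = [detector.observe(c) for c in [3, 4, 2, 5, 3, 4]]
burst = detector.observe(250)
```

With the quiet counts above nothing is flagged, while the jump to 250 is. A real system would presumably need to cope with seasonality, bots and far noisier baselines.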
Classify, rank and reclassify: There is no monolithic process that runs the show for Google. The final bland page you get to see is an aggregation of results from different classifiers, running independently of each other, whose outputs are probably ranked again. Think of it as the difference between wine and champagne. You get to drink a super champagne on the result page.
Classifiers can tell, for example, whether someone is searching for a product to buy, or for information about a place, a company or a person. Google recently developed a new classifier to identify names of people who aren’t famous. Another identifies brand names.
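As a rough way to picture independent classifiers feeding a final re-ranking step, here is a small sketch. It is purely hypothetical: the keyword-based classifiers, the `blend` function, the `boost` weight and the example URLs are my own assumptions, and the article does not say how Google actually combines classifier output.

```python
def product_intent(query):
    """Hypothetical classifier: does the query look like a shopping search?"""
    words = set(query.lower().split())
    return 1.0 if words & {"buy", "price", "cheap"} else 0.0


def place_intent(query):
    """Hypothetical classifier: does the query look like a search for a place?"""
    words = set(query.lower().split())
    return 1.0 if words & {"near", "map", "directions"} else 0.0


# Each classifier runs independently of the others.
CLASSIFIERS = {"product": product_intent, "place": place_intent}


def blend(query, candidates, boost=0.5):
    """Re-rank candidates given as (url, base_score, kind) tuples.

    A candidate's score is raised when the classifier matching its kind
    is confident about the query; kinds without a classifier get no boost.
    """
    confidences = {kind: clf(query) for kind, clf in CLASSIFIERS.items()}
    rescored = [
        (url, base + boost * confidences.get(kind, 0.0))
        for url, base, kind in candidates
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


candidates = [
    ("encyclopedia.example/camera", 0.6, "info"),
    ("shop.example/camera", 0.5, "product"),
]
shopping = blend("buy camera", candidates)
research = blend("camera history", candidates)
```

For "buy camera" the shop page moves to the top, while "camera history" leaves the encyclopedia page first. The point is only that the final order comes from several independent signals being combined, not from one process.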
Pretty interesting reading to end the Sunday night. Ideally I’d have liked to write more, but I have a morning flight to catch and the road to woo for most of next week. Adios.
Update: Greg elaborates on my first point in a detailed and much better manner.