August 2, 2004

All the News that is Fit to Google

It's not clear why Vin Crosbie is surprised about this:

But when I analyzed its choices of news sources, I was surprised by the results. Although Google spiders more than 7,000 news sources, only about a dozen sources account for the vast majority of stories displayed on Google News day to day, and two of those predominant sources are owned and operated by the U.S. and Chinese governments.
A commenter hits main point number 1: real estate. There is only so much space on a web page.

A second reason is that not all 7,000 sources are going to be interested in every story, and many that might be will not write about it because they have made other choices.

I suspect that these dozen or so sources are, in fact, the sources that rank highest in Google's ranking methodology. This may be a self-reinforcing result, since most of us probably read and, if applicable, link to only the first story or two, thus strengthening the rank of those sources. On a statistically meaningless note, I have many times gone deeper into the source material and quickly tired of the repetitive and derivative articles.

Vin can do this as well. By doing just a bit more work, he can click on Google News' always-available "and xxx related" link and find the dozens, hundreds or thousands of other sources that he was originally looking for.

Via E-Media Tidbits.

Posted by Steve on August 2, 2004
Comments

We might think that these dozen or so sources are those that rank highest in Google's ranking methodology, which involves counting the other sites that link to those sources. But that doesn't explain why, for example, Voice of America or Xinhua consistently rank in the top five.

Outside of the People's Republic of China, nowhere is Xinhua, the news service owned and operated by that government, a popular or heavily linked source of news. Ditto the U.S. government's Voice of America.

Moreover, that Google consistently uses the same sources indicates that the search engine's algorithms aren't geolocating the news. If a wildfire breaks out in Reno, it seems that Google will use the BBC or CNN or NY Times story about the fire, rather than the more detailed, updated, and authoritative story in the Reno Gazette Journal.

The problem isn't limited Web page real estate, but flaws in Google News' algorithms. Sure, I can always click the site's "and xxx related" link and find the dozens, hundreds or thousands of other sources that are probably more authoritative. But the question is why Google News is fronting the less authoritative versions from just a few news sources. Neither Xinhua nor VoA has any reporters in Reno.

Posted by Vin Crosbie at August 2, 2004 4:25 PM