Google and Wikipedia: Best Friends Forever

4 Mar, 2015

A few days ago, in an article in New Scientist, Google researchers said that Google will soon rank webpages based on the quality of the facts on a page. The idea is that Google will take the billions of facts it has collected from Wikipedia and from other people’s sites without their permission (facts aren’t copyrightable) and match those facts against the content of a site’s webpages to give a “trust ranking” for every page. This ranking will then influence the page’s position in Google’s search results. As many of the facts come from Wikipedia, this most likely means Google’s results will be fact-checked against Wikipedia.
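To make the idea concrete, here is a toy sketch of what matching a page's facts against a reference store could look like. It is not Google's method, which the article does not describe in detail. The `Claim` type, the `KNOWN_FACTS` store, the `trust_score` function and the scoring rule (share of checkable claims that agree with the store) are all assumptions invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical reference store: (subject, attribute) -> value.
# Stands in for the collected facts; the entries are illustrative only.
KNOWN_FACTS = {
    ("Ebola", "first identified"): "1976",
    ("Wikipedia", "launched"): "2001",
    ("IMDb", "owner"): "Amazon",
}


@dataclass(frozen=True)
class Claim:
    """One factual statement extracted from a page (extraction not shown)."""
    subject: str
    attribute: str
    value: str


def trust_score(claims, known_facts=KNOWN_FACTS):
    """Return the fraction of checkable claims that agree with the store.

    Claims about facts missing from the store are skipped. Returns None
    when nothing on the page can be checked.
    """
    checked = [c for c in claims if (c.subject, c.attribute) in known_facts]
    if not checked:
        return None
    agree = sum(
        1 for c in checked
        if known_facts[(c.subject, c.attribute)] == c.value
    )
    return agree / len(checked)


page_claims = [
    Claim("Wikipedia", "launched", "2001"),         # agrees with the store
    Claim("IMDb", "owner", "Google"),               # contradicts the store
    Claim("Mahalo", "founder", "Jason Calacanis"),  # not in the store, skipped
]

print(trust_score(page_claims))
```

Note what the sketch makes obvious: the score only measures agreement with the store. If the store is wrong or out of date, a correct page is marked down.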

There are many problems with this approach. Wikipedia pages are often unreliable. They are easy to vandalize. The toxic environment means editor numbers are rapidly declining, leaving many pages unstable and out of date. Pages are also easy to game, both by external forces (spammers) and internal forces (bias seekers). Building a trust network on top of such mutable data is like building a skyscraper on sand.

The bigger problem is that this initiative turns Wikipedia’s editors into Google’s unpaid fact checkers. It’s true that people edit Wikipedia for a variety of reasons. There’s some satisfaction to be had in working for months to get a particular nugget of information on a page, but for many contributors the site was supposed to be a free resource: contributed to for free with the promise that the content would be free to the reader and free of advertising.

Instead, information from Wikipedia, in the form of Google’s Knowledge Graph, is being used to build Google’s profit, and not in a small way. Knowledge Graph data, shown in the top right-hand corner of search results, takes up around 20% of the page and certainly accounts for many internal clicks. If new Wikipedia editors were told “Please contribute to our free encyclopaedia. We will use your content to enrich Google”, they would surely think twice about their contributions.

Co-Dependent Search

The Knowledge Graph is just the most visible part of the co-dependent relationship between Google and Wikipedia. The relationship most obviously benefits Wikipedia by giving it traffic. Jimmy Wales, Wikipedia’s co-founder, said in 2010 that the site received 60-70% of its traffic from Google. Wikipedia is almost always in Google’s top three results, and more often than not it’s the top result. The top result is clicked 36.4% of the time and one of the top three results is clicked 58.4% of the time. I pointed out in my last article that there is practically no need for the second page of results, as 94% of readers click on a link on the first page of results.
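For readers who want to see what those three figures leave over, here is a small arithmetic sketch. It uses only the percentages quoted above and assumes they are measured on the same basis (share of all clicks), which the sources may not guarantee. The function and key names are made up for illustration.

```python
# Click shares quoted in the article, in percent.
TOP_RESULT_SHARE = 36.4   # clicks on the first result
TOP_THREE_SHARE = 58.4    # clicks on any of the first three results
FIRST_PAGE_SHARE = 94.0   # clicks on any first-page result


def remaining_shares(top_one, top_three, first_page):
    """Split the quoted totals into the slices they imply."""
    return {
        # What the second and third results share between them.
        "positions_2_and_3": round(top_three - top_one, 1),
        # Everything on page one below the third result.
        "rest_of_first_page": round(first_page - top_three, 1),
        # Every result on page two and later.
        "beyond_first_page": round(100.0 - first_page, 1),
    }


shares = remaining_shares(TOP_RESULT_SHARE, TOP_THREE_SHARE, FIRST_PAGE_SHARE)
for name, value in shares.items():
    print(f"{name}: {value}%")
```

On these numbers the second and third results together get 22.0% of clicks, well below what the top result gets on its own.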

This means that Google is giving Wikipedia around one third of its traffic. But how is this good for Google? Surely Google would rather keep people in Google than let them go to Wikipedia? Well, firstly, the Knowledge Graph does keep people in Google longer. For example, instead of the searcher going to IMDb, which is owned by competitor Amazon, for movie data, the Wikipedia snippet is right there on the page along with the list of movie roles. The searcher stays in Google’s system.

A more important reason is that the Wikipedia link keeps Google’s competitors off the top result. For example, the fight between IMDb and Wikipedia for the top spot for movies benefits Google immensely. If Google can shift IMDb from first to second place then IMDb gets 66% fewer clickthroughs, an enormous number of potential customers lost. Google can then defend itself by saying that Wikipedia has a “better” ranking, but that’s self-serving.

But the most important benefit that Google gets is a high-quality link that its users trust at the top of its results. For many users the Google -> Wikipedia chain is their standard path to check the background on a topic. We’ve all heard someone’s name, gone to Google and then clicked the link to Wikipedia, to find out more. This relationship is gold to Google.

If Wikipedia were not at the top of Google’s results, Google’s utility would decline substantially. Given that almost 60% of clickthroughs are from the top three results, and Wikipedia is almost always one of those results, it’s fair to say that Google is the search engine for Wikipedia, and Wikipedia is Google’s (free) content provider.

Page rank is rank

To make this relationship work, I believe that over the past years, Google has changed its ranking algorithm to make Wikipedia its “gold standard” for ranking. This means:

  1. Sites that are not Wikipedia will be ranked lower, no matter the independent quality of their content. There are a lot of problems with Wikipedia content and format that I outline here, but let’s take as an example the 2014 Ebola outbreak. The Journal of the American Osteopathic Association says “Most Wikipedia articles representing the 10 most costly medical conditions in the United States contain many errors when checked against standard peer-reviewed sources. Caution should be used when using Wikipedia to answer questions regarding patient care”. So, when half of the results for the keyword “Ebola” are going through Google -> Wikipedia rather than the Centers for Disease Control, who benefits?
  2. Sites must conform to Wikipedia-style content to get high rankings. To compete with Wikipedia, you don’t make a site that’s better than Wikipedia, you have to make a site that Google thinks is better than Wikipedia. But if the algorithm is tuned to Wikipedia, to compete you must have Wikipedia-like content. Even if you did make a site like Wikipedia, it would just be like Wikipedia, so why would people go to it? Let’s say a new website comes out that uses video instead of text to let people know about topics. It would rank lower than Wikipedia because it doesn’t have the Wikipedia-style format Google likes.

That’s what happened to Jason Calacanis’ site Mahalo. Calacanis says he believes that Google deliberately changed its ranking to destroy his human-generated search engine.

Mahalo was an awesome effort by a killer team. We hit $10m a year in advertising (all networks), 15m uniques and we were in the top 150 sites in the USA. Matt Cutts killing the business really pissed me off as well. He just smiled and told me “you don’t have a penalty” with a shit-eating grin…. they targeted us for destruction and I had to lay off 80 americans working from home full-time.

The Google-Wikipedia nexus gives us inferior search results leading to inferior content. That’s not good for users, and isn’t good for the web.

Who Benefits?

Despite Google using Wikipedia as a buffer against their competitors, using Wikipedia’s data, and using the site to enhance its status, none of Google’s $70 BILLION a year revenues goes back to Wikipedia.

Except that, in 2010, Google donated $2 million to the Wikimedia Foundation and in November, 2011, Sergey Brin (net worth $30.4 billion) donated $500,000 to the Foundation, followed by another (not highly publicized) donation of $1 million in 2013. Brin’s Google co-founder and current CEO, Larry Page, (net worth $30.9 billion) donated nothing. These donations must surely be the deal of the century.

It’s important to note that none of Brin’s donation goes to the people who write and edit Wikipedia; it goes to the Wikimedia Foundation. Even though the Foundation has over $51 million in the bank ($27.9 million in cash and another $23.3 million in investments), it collected $50 million in donations last year from people who, misled by alarmist advertising, think the site is in imminent danger of collapse. Instead, the money was spent on first-class travel, highly-paid ineffective programmers, and expensive office furniture. Only 5% of the donations went on hosting. (Stop Giving Wikipedia Money).

Last week Jimmy Wales defended this excessive fundraising as necessary to give the site “reserves”, but that reason was invented after the donations were taken.

Wales is now employed by Google to help them in their “right-to-be-forgotten” negotiations. Wales appears to be Google’s spokesperson on the issue and is regularly quoted in the media defending Google. One has to wonder why Wales has taken on that role, when Brin, Page, or any other high-profile Google employee should be defending their own company.

Billions for them, nothing for you

None of Google’s or Brin’s billions or the Foundation’s millions go to the people who actually built Wikipedia. Writers and editors get nothing, other than an imaginary pat on the back. Ordinary people who gave their time and effort to build up a free resource, and have worked with the best of intentions, now find themselves being used as Google’s unpaid fact checkers, and see their work filling Google’s highly-profitable search results.

So, next time you think of adding or updating something on Wikipedia, think of Jimmy Wales clinking champagne glasses with multi-billionaires Sergey Brin and Larry Page, in one of their many private jets, flying high above you. Google and Wikipedia might be BFFs, but they’re not your friends.

Mark Devlin is the founder and CEO of Newslines, a new crowdsourced news search engine.