Google’s Black Hole

In December 2012 Google added extra information in a sidebar on its main search results pages. This information, known as the “Knowledge Graph,” is compiled from web sources such as Wikipedia, the CIA World Factbook and other databases. In August 2014 Google said it had expanded the concept into the Knowledge Vault, a database of 1.6 billion facts compiled from the pages stored on its system. For example, when users type in “Director of Citizen Kane” the system returns “Orson Welles” above the standard search results. There should probably be some discussion about where Google is getting this data, particularly if it is scraping copyrighted sites.

However, rather than improving users’ search experience, the Knowledge Graph actually exposes a black hole in how Google, and other algorithmic search engines, deal with searches. This is not an Interstellar-style friendly black hole, but one that undermines the entire basis for search, with the potential to suck out the most profitable part of Google’s business.

Faith-based search

Google’s search results for Ariana Grande are a good example. Google tells us that there are 144,000,000 results for the “Problem” singer. But how do we know these results exist? Has anyone ever looked through the millions of results for any topic? More importantly, does anyone care?

Research shows that 36.4% of search queries result in a click on the top result, with a further 12.5% and 9.5% going to the second and third results, respectively. Together, the top three results are clicked 58.5% of the time, and a stunning 94% of queries stay on the first page. So why are millions of pages necessary? For the huge majority of searches, only the top results are ever used. The only time most people look through multiple pages of search results is probably to find mentions of their own name. Combined with the fact that most searches on Google are for popular culture topics, it’s easy to see that most search engine users are searching for a few topics and clicking on just a few results.

Who is in control? Not you.

No-one knows how Google orders its results. The company says secrecy is required to stop spammers gaming the system, but that secrecy also prevents legitimate companies from being properly represented. A whole industry of dubious “Search Engine Optimization” practices has grown up around gaming Google’s results.

Recently the issue of user control over the display of data has heated up. Facebook changed its News Feed to favor advertisers, and Twitter is doing the same. Every reader implicitly knows that these changes are motivated by greed. But no-one seems to care that Google does exactly the same thing. Google tells you what it thinks is relevant, and that’s it. You have no choice. It can, and does, change the algorithm at any time, and as a reader you are expected to believe that it is doing so for your own good.

Examining Search Results

Ariana checks her Google results

Let’s look more closely at Ariana’s search results. Search for her name and check the left-hand side of the results page. Google’s algorithm returns a list of sites, drawn from the 144 million, in 0.38 seconds, and tells readers that these are the most relevant to their search term. For the vast majority of topics, especially those on popular culture, the top search results fall into the following groups: the home site of the topic; the Wikipedia entry; news about the topic; Twitter and other social media accounts; fan sites or blogs related to the topic; and YouTube videos. Apart from the top three, the rest appear to be in random order.

Home page – Ariana’s home page. It is made by her PR team, so any negative information will be censored.
Wikipedia – An extremely detailed 3,500-word essay about the singer in a back-to-the-90s textbook design, with no videos or audio and a references section that is bigger than the article. A separate discography racks up thousands more words. It’s too much information.
News – Unsorted, random news that doesn’t tell the reader much.
Twitter and social media accounts – unfiltered streams of consciousness.
YouTube links – Random YouTube links. Readers have no idea why those particular links were chosen.
Long-form articles – If readers wanted to read long articles, they would have searched for “Ariana Grande articles.”

In the end, readers have to assemble pieces of information from the various sites in the list, which is a lot of work. The search may have come back in 0.38 seconds, but it will take them much longer to sort and filter that data. Even Google knows that its search results are a problem; that’s why it introduced the Knowledge Graph. But does the Graph give readers enough?

Examining Graph Results

Now check the right-hand side of Ariana’s page, which holds the “graph” data. You’ll see:

Google Images – Images that have all been taken from other companies’ websites.
Ads for music services – But readers already know that Ariana’s songs are available for free on YouTube (a Google company). This is very close to bait and switch.
Knowledge Graph data – In this case readers see Grande’s height and sibling information. Not very relevant.
Wikipedia snippet – Not only is this a duplicate of the Wikipedia link in the search listing, but the snippet also carries over Wikipedia’s unreliable data. There are numerous cases where vandalized Wikipedia information has shown up in the Knowledge Graph. For example, Margaret Thatcher’s Knowledge Graph entry said she had received an award for “Bastard of the Year.”
Song list – Links lead to another Google page with the YouTube video at the top. To view all of her songs, readers must click on every link individually, which may be good for Google’s ad revenue but is not good for the reader.

Millions of semi-random sites on one side, a dab of dubious data on the other. There’s nothing in between. This missing information is the black hole in the middle of every Google search page.

The most important question

The first questions any media outlet should ask itself are: Who are the readers? And what questions do they need answered? Readers don’t want a list of semi-random results that they have to wade through to find information, and they don’t want a useless snippet that doesn’t tell them anything. In fact, readers want the answer to a simple question: Who is Ariana Grande?

They want to see a quick history, some of her songs and performances, and find out if she’s been involved in a scandal or some other newsworthy event. They want her story presented in a way that lets them control the results, based not on the popularity of external sites but on what she has actually done that interests readers. They want to be able to filter the results so they can see all her work, her life events, her interviews and her awards in one place.

Being presented with a list of websites about her is way, way down the list, if it is of any interest at all. In the days of paper libraries, directories and encyclopedias were kept at the back in a separate reference section; they weren’t the first thing you’d reach for. Google (and Wikipedia) have managed to convince us that the web is about references, when it’s really about stories.

Google fails in its answer because it doesn’t, and cannot, understand the question. To paraphrase Oscar Wilde, Google is a system that knows the meaning of everything and the value of nothing.

Do readers need a hundred million search results? No
Do readers use a hundred million search results? No
Do readers want a hidden search algorithm? No
Do readers want a list of web pages? Not really
Are readers in control of the results? No
Are readers getting the information they need? No

All hail Gwookipedia!

Makes more sense than Gwookiepedia

Because Google can’t answer the question, it puts Wikipedia at the top. Google treats Wikipedia as the answer, and its ranking system is built to promote high-“authority” sites like Wikipedia. But there are many problems with Wikipedia. It is not the best fit for many questions, and its size distorts the playing field. One example is Wikipedia’s extension into news: as soon as a Wikipedia page is made about some minor person, that page instantly goes to the top of the search rankings, whether or not the page is any good. So Google doesn’t answer the question, and offers as its main result a site that also doesn’t answer the question. This monster is called Gwookipedia (no relation to Han Solo’s pal).

I will discuss Sergey Brin’s $500,000 donation (or bribe) to Wikipedia in my next post.

The search engine is dead. Long live Story Search

The current dysfunctional search engine model gives us answers to questions we are not asking, links us to sites that are not useful, obscures reasoning with secret processes, and is paid for by ads that are not relevant, and that expose us to intrusive data mining.

For the vast majority of topics, including the vast majority of profitable ones, a single-page design that tells the story of a topic in the most effective and efficient way possible, using all elements of the modern web, would give readers much more satisfaction than current offerings. The Knowledge Graph doesn’t enhance the experience; instead, it exposes that the Emperor isn’t wearing any clothes.

Can Google improve the Graph? Maybe, but it will still not be what people want. Google, like Wikipedia, is stuck in a 10-year-old model of how the web works. There may come a time when computer algorithms can create stories, but in the meantime we are working to create human-powered pages that give people control over the results.

—

Mark Devlin is the founder and CEO of Newslines.