Google won’t comment on a potential major leak of its search algorithm documentation

Google’s search algorithm is perhaps the most consequential system on the internet, dictating which pages live and die and what content on the web looks like. But exactly how Google ranks websites has long been a mystery, pieced together by journalists, researchers and people working in search engine optimization.

Now, an explosive leak that reportedly shows thousands of pages of internal documents appears to offer a never-before-seen look under the hood of how Search works — and suggests that Google hasn’t been completely honest about it for years. So far, Google has not responded to multiple requests for comment on the legitimacy of the documents.

Rand Fishkin, who has worked in SEO for more than a decade, says a source shared 2,500 pages of documents with him in the hope that reporting on the leak would counter the “lies” that Google employees had shared about how the search algorithm works. The documents describe Google’s search API and break down what information is available to employees, according to Fishkin.

The details that Fishkin shares are dense and technical, probably more legible to developers and SEO experts than to lay people. The leaked content is also not necessarily proof that Google uses the specific data and signals it mentions for search ranking. Instead, the leak outlines what data Google collects from web pages, sites and browsers, and offers indirect hints to SEO experts about what Google seems to care about, as SEO expert Mike King wrote in his review of the documents.

The leaked documents touch on topics such as what kind of data Google collects and uses, which sites Google elevates for sensitive topics like elections, how Google handles small websites and more. Some information in the documents appears to contradict public statements made by Google representatives, Fishkin and King said.

“‘Lied’ is harsh, but it’s the only correct word we can use here,” King writes. “While I don’t necessarily fault Google’s publicists for protecting their proprietary information, I do object to their efforts to actively discredit the people in the marketing, technology and journalism worlds who have presented repeatable discoveries.”

Google did not respond to The Verge’s requests for comment regarding the documents, including a direct request to challenge their legitimacy. Fishkin told The Verge in an email that the company did not dispute the veracity of the leak, but that an employee had asked him to change some language in the post about how an event was characterized.

Google’s secretive search algorithm has spawned an entire industry of marketers who closely follow Google’s public guidelines and implement them for millions of businesses around the world. Pervasive, often unsavory tactics have led to a common narrative that Google search results are getting worse, cluttered with the garbage that website operators feel obligated to produce in order to keep their pages visible. In response to The Verge’s past reporting on SEO-driven tactics, Google representatives often fall back on a familiar defense: That’s not what Google’s guidelines say.

But some details in the leaked documents call into question the accuracy of Google’s public statements about how Search works.

One example cited by Fishkin and King is whether Google Chrome data is used for ranking at all. Google representatives have repeatedly indicated that they do not use Chrome data to rank pages, but Chrome is specifically mentioned in the sections on how websites appear in Search. In the screenshot below, which I took as an example, the links that appear below the main vogue.com URL may have been partially created using Chrome data, according to the docs.

Chrome is mentioned in the section on how additional links are created.
Image: Google

Another question that arises is what role, if any, EEAT plays in ranking. EEAT stands for experience, expertise, authoritativeness and trustworthiness, a Google metric used to evaluate the quality of results. Google representatives have previously said that EEAT is not a ranking factor. Fishkin notes that he hasn’t found much in the documents that mentions EEAT by name.

King, however, detailed how Google appears to collect author information from a page and has a field for whether an entity on the page is the author. Part of the documents shared by King states that the field was “mainly developed and tuned for newspaper articles … but has also been populated for other content (eg scientific articles).” While this doesn’t confirm that bylines are an explicit ranking metric, it does show that Google is at least tracking this attribute. Google representatives have previously insisted that author bylines are something website owners should add for readers, not for Google, because they don’t affect rankings.

While the documents aren’t exactly mind-blowing, they provide a deep, unfiltered look at a closely guarded black box system. The US government’s antitrust case against Google – revolving around Search – has also led to internal documents becoming public, offering further insights into the workings of the company’s flagship product.

Google’s general caginess about how Search works has led to websites looking alike as SEO marketers try to outsmart Google based on the hints the company offers. Fishkin also calls out publications that credulously repeat Google’s public claims as true without much further analysis.

“Historically, some of the search industry’s loudest voices and most prolific publishers have been happy to uncritically repeat Google’s public statements. They write headlines like ‘Google says XYZ is true’, instead of ‘Google claims XYZ; The evidence suggests otherwise,’” writes Fishkin. “Please do better. If these leaks and the DOJ trial can create just one change, I hope this is it.”
