Google appears to have accidentally posted a big stash of internal technical documents to GitHub, partially detailing how the search engine ranks webpages. For most of us, the question of search rankings is just "are my web results good or bad," but the SEO community is both thrilled to get a peek behind the curtain and up in arms, since the docs seem to contradict some of what Google has told them in the past. Most of the commentary on the leak comes from SEO experts Rand Fishkin and Mike King.
Google confirmed the authenticity of the documents to The Verge, saying, “We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information. We’ve shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation.”
The fun thing about accidentally publishing to the GoogleAPI GitHub is that, while these are sensitive internal documents, Google technically released them under an Apache 2.0 license. That means anyone who stumbled across the documents was granted a "perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license" to them, so they are now freely available online.
The leak contains a ton of API documentation for Google’s “ContentWarehouse,” which sounds a lot like the search index. As you'd expect, even this incomplete look at how Google ranks webpages is impossibly complex. King writes that there are "2,596 modules represented in the API documentation with 14,014 attributes (features)." These documents were all written by programmers for programmers, and they rely on a lot of background information that you'd probably only know if you worked on the search team. The SEO community is still poring over the documents and using them to build assumptions about how Google Search works.