Seeing Like a Search Engine

How Google's Quest for Legibility is Reshaping the Web

James Scott’s “Seeing Like a State” is a fascinating account of how centralization schemes fail. The book’s title follows from a simple observation: centralized administrators need legibility to function. A state can’t collect taxes unless it knows who its citizens are and how much income they’re generating. A forester needs to know what trees are in the forest in order to extract lumber from it. The concept seems trivial, but naming the need for legibility makes it much easier to think clearly about these systems.

Of course, the organic, natural state of things tends not to be legible. It’s hard to know what’s going on in a forest (not to mention a whole country). In practice, administrators handle limited legibility in three ways:

  1. They superimpose structures that make the world more legible.
  2. They reshape the world to make it more legible and assume that the complexity they’re removing is irrelevant.
  3. When they fail to achieve perfect legibility, they focus on maximizing imperfect metrics while simply ignoring factors that aren’t easily visible.

Of these three approaches, the first tends to be harmless or even beneficial. Creating an accurate map of a forest has no impact on the forest itself. Standardizing weights and measures also makes society more legible to the government – the better to collect and assess taxes – but is also useful to everyone else.

On the other hand, the second approach can be incredibly destructive. Early scientific foresters liked the idea of planting orderly, monocropped forests that could be easily managed, but the resulting forests were highly susceptible to disease and ecological collapse. Similarly, centrally planned cities (like Brasília) are easy to administer but feel sterile to residents.

The third approach may be the most destructive of all. To again use a forestry example, it’s straightforward to optimize a single legible metric like annual timber production. But doing so can inflict hard-to-measure damage along other dimensions, like the health or aesthetic appeal of the forest.

The governmental analogy is obvious. Soviet production quotas invariably failed to capture important qualities of the products that were being produced. As a result, they were endlessly gamed. Perhaps the best-known story is of a nail factory producing gigantic, useless nails to achieve a production target measured in tons. While that example appears to be apocryphal, others are well-documented, and as a more general rule Soviet products were hardly known for their quality.

The entire book is interesting, and I recommend that you read it (or at least read a more thorough summary of it, like Venkatesh Rao's or Scott Alexander's). As I discussed the book with the friends who recommended it, my one complaint was that the analysis draws heavily on examples from two domains, forestry and government central planning, and I would have liked to see examples in other contexts. And then it struck me that the most obvious example was right in front of me.

What does this have to do with Google?

Scott published his book in 1998, which is the year that Google was founded. Like a state or a forester, Google requires a high degree of legibility to deliver its tremendously useful service. Fortunately for Google, the internet is inherently more legible than the real world, and so they were able to successfully build the early version of their search engine without any need to reshape the internet into legible form. For the briefest of moments, Google was just another observer with no real power to shape the rest of the internet.

As Google and its ambition grew, though, it started exerting influence over the internet to make it more legible. Much of this was benign. Webmasters adding sitemaps to their sites made the internet more legible to Google (and by extension more useful to the rest of us) without inflicting any real harm.

As the field of search engine optimization (SEO) developed, though, the internet started shaping itself to Google’s needs in more destructive ways. There’s been a proliferation of content that seems intended to be read only by Google. A trivial example comes from a recipe for spinach enchiladas that I recently encountered. The page contained a 433-word preamble, 19 words of which were “enchilada.” Far from improving my experience with the (quite delicious) recipe, the preamble made the page much less useful.

A far more profound example is Google’s Accelerated Mobile Pages (AMP) project. AMP is a tightly limited set of web components that publishers can use to build pages. The resulting pages are usually (but not always) faster to load, which benefits users. Publishers who use AMP benefit as well, as Google gives AMP pages significantly better placement in search results.

The benefit to Google, though, is that AMP pages are highly legible. The reduced component set makes them easy to crawl, and the inclusion of structured data makes it very easy to extract information from the page. Furthermore, Google can cache the pages and serve them from their own servers, thereby learning about how users interact with the content. (This is why the URLs of articles you open on a mobile device sometimes point to a Google domain rather than the publisher’s.) The bottom line is that AMP makes the internet more legible to Google.

This has prompted some backlash, with one polemical piece pointing out that “the web is a messy, complicated place” and unfavorably contrasting “the Google-shaped web” with the organic parts of the internet.

The lens of legibility is also useful for understanding where Google Search delivers subpar performance. Yesterday, I was looking for a simple tool to plot out a running route. The top results were all useless: poor interfaces, a requirement to sign up for the service, etc. I finally found what I was looking for at the bottom of the first page of results. Of course, those flaws are completely illegible to Google. They can observe symptoms of those flaws – a quick bounce when I visit the page, for instance – but they don’t really have a way to understand what makes one run-mapping tool better than another.

These areas of illegibility tend to overlap with commerce. Anything that lives in the real world – whether apartments, tea kettles, or clothing – is inherently illegible to web crawlers. Google can employ some basic filtering, like responding to “apartments in Denver” with links to listings that are indeed in Denver, but the actual qualities that make an apartment appealing to the searcher are entirely opaque to Google. The first page of results usually has the best link somewhere, but it’s rarely the top result.

Seeing Like a State emphasizes the tragic consequences of states seeking after and failing to achieve legibility. Scott repeatedly uses the USSR as an example. That made me hesitant to write this piece, because I do not intend to compare Google – my former employer – to the Soviet Union. If nothing else, Google lacks the coercive power of a state, and its web crawlers are limited to a narrow domain.

But within that domain – the internet – Google is as powerful as any state, and it’s crucial to understand Google’s need for legibility. When left unchecked, the need for legibility favors the orderly over the organic and the simple over the complex. But organic complexity is precisely what makes the internet so remarkable. We’ve already lost some of the free-form vibrancy of the early internet, and I fear that the quest for legibility will perpetuate that trend.