Why We Don’t Have Good Local Business Content

The following is a guest post from Long Hill Consulting’s Marty Himmelstein:

Local search’s most significant failure is its inability to provide an accurate stratum of content about neighborhood businesses. The necessity for this base layer arises from the defining characteristic of local search, which is that it is model-based. Local search’s first job is to create an accurate depiction of places in the real world. Being found trumps being reviewed. Being found also trumps search engine optimization. When local search is running on all cylinders it will not make qualitative decisions; if there is a shop on Main Street people will find it. There will be no jousting for position, because the demarcation between fact and advertising will be clear.

To address the failure, a number of Internet companies have either been formed or have started initiatives to aggregate content about brick and mortar stores and services, either as their core service or to improve their core service (e.g., user reviews). These initiatives solicit content directly from businesses, and often, following a wiki-type model, from individuals who have no direct relationship with the businesses for which they create content. In the latter case, the contributors might receive a small financial incentive if the information they submit can be verified, usually by being ‘claimed’ by a submitted business, or when a claimed business buys additional services from the aggregator. To create an initial layer of content, most companies purchase some form of Internet Yellow Pages content from one of several compiled-list vendors. The main content sources for these lists still derive from phone directories, which the list vendors improve through varying degrees of quality control and enhancement.

Because these efforts proceed from one or more incorrect assumptions about the nature of local search, it is unlikely they will be successful. Most adhere to an erroneous ‘walled garden’ view that business content gathered on the Internet is a defensible asset. But information flows freely on the Internet, and since these services don’t control the information sources they require to assemble and maintain a data asset, no data they aggregate can be defended. (The information sources are businesses themselves, and ultimately they control their own information.) It will also be hard for any one of these initiatives to gather a critical mass of content. That local content is both valuable and not defensible is an apparent, not a real, contradiction. Local content is hard to create, but once created is a common data resource: it is best to think of the information about a business as nothing more than a structured web page. Another problem is that Google has already created the technical infrastructure to aggregate and distribute structured business content, and other initiatives have nothing to offer that improves on Google’s technology. Lastly, these initiatives assume the Internet can be used to short-circuit real-world notions of community, but it can’t. Unmediated user-contributed content, so successful for expressing creativity and points of view, is the low-hanging fruit of local search. It is not the organizing principle upon which local search will be built.

From the perspective of a service that requires better business information than what is available to it, the justification for a walled garden seems simple: “We know people are willing to contribute content. We’ll create tools to make it easy for businesses, or, following a wiki model, anybody, to supply us with business information. These tools have a development cost and it takes effort to solicit and verify content, but having done so, the content we gather will be much better than standard listings data. This content has value, and there is no reason for us to give it away to others. Further, our users are precisely the ones these businesses want to reach. We’ll try a freemium model, and charge businesses for enhanced representation.”

Too many of these services are vying for businesses’ and users’ attention for any of their individual efforts to succeed. Businesses won’t contribute and maintain the same content at multiple services and pay for redundant capabilities at each. Moreover, once a business creates its digital profile, the marginal effort to distribute it to multiple services is (or could be) small. The ‘business content is a defensible asset’ model erroneously conflates business content with the value-added services that rely on that content to be successful. Business content is indeed valuable, at least as much to the services that need it to build compelling sites and capture advertising revenues as to anybody else. The demand for this content will drive the price to the businesses that supply it to zero. It’s not even hard to imagine scenarios in which businesses derive revenue from syndicating their content to downstream services. One way to ensure that Google doesn’t become the sole depository of business content is to give businesses the incentive to distribute theirs widely.

From an Internet ecosystem and data modeling perspective, multiple walled gardens of duplicated and separately maintained business content make no sense at all. Popular services might get a continual stream of updates, but new or struggling services won’t, making it even harder for them to gain traction. This ‘each to his own’ approach will perpetuate a morass of inconsistent and obsolete content, much as we have now, to the continuing dismay of consumers.

The adherents to the flawed garden analysis are either unfamiliar with a basic data modeling tenet, or think it doesn’t apply on the Internet. Data can be distributed, copied, and duplicated, but each occurrence must be traceable to a known provenance that has a unique identity. For the purposes of data modeling, the Internet is nothing more than a very big disk drive. The storage medium has changed; the requirement for sound data engineering has not.
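
To make the tenet concrete, here is a minimal sketch of what "traceable to a known provenance" could look like for a business record. Everything in it is an assumption made for illustration: the `BusinessRecord` fields, the `copy_to` and `is_stale` helpers, and the sample values are hypothetical and do not describe any existing service.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class BusinessRecord:
    business_id: str            # identity assigned once; never changes across copies
    holder: str                 # who holds this particular copy
    copied_from: Optional[str]  # holder the copy was taken from; None for the original
    version: int                # incremented by the authoritative source on each change
    name: str
    address: str


def copy_to(record: BusinessRecord, holder: str) -> BusinessRecord:
    """Duplicate a record for another holder while keeping its identity and lineage."""
    return replace(record, holder=holder, copied_from=record.holder)


def is_stale(copy: BusinessRecord, original: BusinessRecord) -> bool:
    """A copy is stale when the authoritative record has a newer version."""
    if copy.business_id != original.business_id:
        raise ValueError("records describe different businesses")
    return copy.version < original.version


# Hypothetical sample data.
original = BusinessRecord("biz-0001", "the business itself", None, 1,
                          "Example Hardware", "1 Main Street")
mirror = copy_to(original, "directory-a")
updated = replace(original, version=2, address="3 Main Street")

print(mirror.copied_from)         # the business itself
print(is_stale(mirror, updated))  # True
```

Because the copy carries the same `business_id` and records where it came from, a downstream service can tell which business it describes and whether it has fallen behind the source, which is what separately maintained duplicates cannot do.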

Unique identity is not a new concept on the web. Web pages and blog posts and comments have at least an informal notion of identity, and second generation content syndication formats support stronger notions still. These formats also support structured content, a requirement for business information on the web. Google Base, a notable example, specifies Atom and RSS 2.0 formats to allow data providers to specify and upload structured content to, well, the Google Universe. Google also provides a query language API so developers can retrieve content from the Google database. Google’s walls are permeable: their interests are served by good content, not its ownership.
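
As a rough illustration of structured business content in a syndication format, the sketch below builds a single Atom entry fragment with Python's standard library. The Atom namespace is the real one; the `biz` extension namespace, its `street` and `phone` elements, the identifier, and the sample values are all hypothetical, and none of this is meant to reproduce the Google Base schema.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
BIZ_NS = "http://example.com/ns/business"  # hypothetical extension namespace

# Serialize Atom elements without a prefix and extension elements as biz:*.
ET.register_namespace("", ATOM_NS)
ET.register_namespace("biz", BIZ_NS)


def business_entry(entry_id, name, updated, attributes):
    """Build an Atom entry whose id gives the business a stable identity."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = name
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    # Structured facts go in the (assumed) extension namespace.
    for key, value in attributes.items():
        ET.SubElement(entry, f"{{{BIZ_NS}}}{key}").text = value
    return entry


# Hypothetical sample data.
entry = business_entry(
    "urn:example:business:0001",
    "Example Hardware",
    "2008-01-01T00:00:00Z",
    {"street": "1 Main Street", "phone": "555-0100"},
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

The `id` element plays the role of the unique identity, `updated` lets a consumer detect changes, and the extension elements carry the structured facts that a plain web page would leave unlabeled.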

Unfortunately, the quality of local business content lags well behind the Internet’s technical capabilities to create, aggregate and distribute it. An important reason for this quality deficiency is that we have relied almost exclusively on the technology that enables the next generation of local search, while underestimating the need to create online representations of the real neighborhoods and relationships within which businesses exist. As I noted in a previous post:

The fundamental role of a community in local search is to establish an environment of trust so that users can rely on the information they obtain from the system. Businesses exist in a network of customers, suppliers, municipal agencies, local media, hobbyists, and others with either a professional or avocational interest in establishing the trustworthiness of local information.

Businesses are responsible for their physical storefronts, and, ultimately, their digital storefronts. But businesses don’t exist in a vacuum, either physically or online. They require the services of the community to which they belong – when online, especially in the formative stages of local search. To create accurate digital storefronts, then, we need to enable the participation of the various constituencies that are part of a community. It is within this framework that a reliable stratum of local content will be created and maintained.

Individuals who contribute content because of a small financial incentive, who are most of the time trustworthy and altruistic but will occasionally be neither, and who have no intrinsic connection with the neighborhoods in which the businesses they describe reside, do not constitute a community. It’s not that their contributions aren’t valuable or even necessary, it’s that they are not sufficient for ensuring an accurate depiction of the local environment. Pick your own war story about how local search failed you in a time of need (everybody has one), assume your need was urgent, and then consider the assurances you would require from the system to trust the information you get from it.

The only way local search can meet these assurances is to build them into its basic fabric. The basic fabric of local search is the community, because the community provides the means to establish the network of trust that is essential to local search. The purely user-contributed content model that works so well for YouTube has shortcomings when applied to local search. The preeminent virtue for YouTube is creativity, for local search veracity. YouTube is whimsical, local search mundane. People use YouTube to pass time, local search to save time.

In an immeasurably weightier circumstance, Winston Churchill remarked “You can always count on Americans to do the right thing – after they’ve tried everything else.” And so it is with local search. Its eventual shape, though tortuously arrived at, seems to me easy to discern. Each business will have its own digital identity and a core of factual information, kept in a standardized format, which it or its designees will maintain. These designees will aggregate content at the community level, as defined above. Designees will include entities, some new, but certainly some that already exist, which are trusted by both consumers and merchants. This core content will feed downstream services. To provide subjective or more detailed information, the basic content will be augmented at various points with user-contributed and third-party sources of information. Revenue models built around helping businesses and their designees create, maintain, verify, augment and distribute their content make sense; those built around cordoning it off do not.


Marty Himmelstein is the principal of Long Hill Consulting, which he founded in 1989. Marty’s interests include databases, Internet search, and web-based information systems.

For the last eleven years, Marty has been active in location-based searching on the web, a field often called Local Search. Marty was an early member of the Vicinity engineering team. Vicinity was a premium provider of Internet Yellow Pages (Vicinity provided Yahoo!’s IYP service from 1996 to 1998), business locators, and mapping and geocoding services.



  1. Dennis Yu said

    Hi Marty,

    A detailed and erudite post on why we don’t have good local business content. From a layman’s view, I’d say that the reason why is simple. Local businesses don’t have an easy way to get on the web to write content (though it’s quite easy for folks like us) and they’re too busy doing the 12 other things necessary to run their business.

    So who cares that new services like Twitter come up or that you can write a blog, subscribe to RSS feeds, create groups on Facebook, join a social network, submit an article for review, write something for Yelp, and so forth? That just sounds like more and more gobbledygook.

    How about some simple solutions for local businesses to talk about themselves? They know their services best and have the economic incentive to do it.

  2. Dennis Yu said

    And getting local businesses to post content is perhaps the #1 issue that our local advertising company deals with. Would love to connect with you on ways to make that happen more easily!

