Why a Public Data Infrastructure Should Be Developed by Not-for-Profits

We’re at a transition moment in history. Many core human activities are changing profoundly: the way we seek information; the way we connect to people; the way we decide where we want to go, and who we want to be with. The way we make such choices is becoming more and more dominated by a few technology companies with powerful data infrastructure. It’s fantastic that technology can improve our lives. But I believe that we’d be better off if more people could influence these core decisions about how we live.

Excerpted from Michael Nielsen:

“In this essay, I’ve described two possible futures for Big Data. In one future, today’s trends continue. The best data infrastructure will be privately owned by a few large companies who see it as a competitive advantage to map out human knowledge. In the other future, the future I hope we will create, the best data infrastructure will be available for use by anyone in the world, a powerful platform for experimentation, discovery, and the creation of new and better ways of living.

Is it better for public data infrastructure to be built by for-profit companies, or by not-for-profits? Or is some other option even better—say, governments creating it, or perhaps loosely organized networks of contributors, without a traditional institutional structure? In this section I argue that the best option is not-for-profits.

Let’s focus first on the case of for-profits versus not-for-profits. In general, I am all for for-profit companies bringing technologies to market. However, in the case of a public data infrastructure, there are special circumstances which make not-for-profits preferable.

To understand those special circumstances, think back to the late 1980s and early 1990s. That was a time of stagnation in computer software, a time of incremental progress but few major leaps. The reason was Microsoft’s stranglehold over computer operating systems. Whenever a company discovered a new market for software, Microsoft would replicate the product and then use its control of the operating system to crush the original innovator. This happened to the spreadsheet Lotus 1-2-3 (crushed by Excel), the word processor WordPerfect (crushed by Word), and many other lesser-known programs. In effect, those other companies were acting as the research and development arms of Microsoft. As this pattern gradually became clear, the incentive to invest in new ideas for software shrank, and the result was a decade or so of stagnation.

That all changed when a new platform for computing emerged: the web browser. Microsoft couldn’t use its operating system dominance to destroy companies such as Google, Facebook, and Amazon, because those companies’ products didn’t run (directly) on Microsoft’s operating system; they ran over the web. Microsoft initially largely ignored the web, a situation that only changed in May 1995, when Bill Gates sent out a company-wide memo entitled “The Internet Tidal Wave” (Letters of Note 2011). But by the time Gates realized the importance of the web, it was too late to stop the tidal wave. Microsoft made many subsequent attempts to gain control of web standards, but those efforts were defeated by organizations such as the World Wide Web Consortium, Netscape, Mozilla, and Google. Effectively, the computer industry moved from a proprietary platform (Windows) to an open platform (the web) not owned by anyone in particular. The result was a resurgence of software innovation.

The lesson is that when dominant technology platforms are privately owned, the platform owner can co-opt markets discovered by companies using the platform. I gave the example of Microsoft, but there are many other examples—companies such as Apple, Facebook, and Twitter have all used their ownership of important technology platforms to co-opt new markets in this way. We’d all be better off if dominant technology platforms were operated in the public interest, not as a way of co-opting innovation. Fortunately, that is what’s happened with both the Internet and the web, and that’s why those platforms have been such a powerful spur to innovation.

Platforms such as the web and the Internet are a little bit special in that they’re primarily standards. That is, they’re broadly shared agreements on how technologies should operate. Those standards are often stewarded by not-for-profit organizations such as the World Wide Web Consortium and the Internet Engineering Task Force. But it doesn’t really make sense to say the standards are owned by those not-for-profits, since what matters is really the broad community commitment to the standards. Standards are about owning hearts and minds, not atoms.

By contrast, a public data infrastructure would be a different kind of technology platform. Any piece of such an infrastructure would involve considerable capital costs, associated with owning (or leasing) and operating a large cluster of computers. Because of this capital investment, an owner really is necessary. We’ve already seen that if a public data infrastructure were owned by for-profit companies, those companies would always be tempted to use their ownership to co-opt innovation. The natural alternative is for a public data infrastructure to be owned and operated by not-for-profits that are committed not to co-opting innovation, but rather to encouraging it and helping it to flourish.

What about government providing public data infrastructure? In fact, for data related directly to government this is beginning to happen, through initiatives such as data.gov, the U.S. Government’s portal for government data. But it’s difficult to believe that having the government provide a public data infrastructure more broadly would be a good idea. Technological innovation requires many groups of people to try out many different ideas, with most failing and the best ideas winning. This isn’t a model of development that governments have a long history of using effectively. With that said, initiatives such as data.gov will make a very important contribution to a public data infrastructure. But they will not be the core of a powerful, broad-ranging public data infrastructure.

The final possibility is that a public data infrastructure not be developed by an organization at all, but rather by a loosely organized network of contributors, without a traditional institutional structure. Examples such as OpenStreetMap are in this vein. OpenStreetMap does have a traditional not-for-profit at its core, but it’s tiny, with a 2012 budget of less than 100,000 British pounds (OMS 2013). Most of the work is done by a loose network of volunteers. That’s a great model for OpenStreetMap, but part of the reason it works is the relatively modest scale of the data involved. Big Data involves larger organizations (and larger budgets), due to the scale of the computing power involved, as well as the long-term commitment necessary to provide reliable service, effective documentation, and support. All these things mean building a lasting organization. So while a loosely distributed model may be a great way to start such projects, over time they will need to transition to a more traditional not-for-profit model.

* How could not-for-profits help develop such a public data infrastructure?

At first sight, an encouraging sign is the flourishing ecosystem of open-source software. Ohloh, a site indexing open-source projects, currently lists more than 600,000 projects. Open-source projects such as Linux, Hadoop, and others are often leaders in their areas.

Given this ecosystem of open-source software, it’s somewhat puzzling that there is comparatively little public data infrastructure. Why has so much important code been made usable by anyone in the world, and so little data infrastructure?

To answer this question, it helps to think about the origin of open-source software. Open-source projects usually start in one of two ways: (1) as hobby projects (albeit often created by professional programmers in their spare time), such as Linux; or (2) as by-products of the work of for-profit companies. By looking at each of these cases separately, we can understand why open-source software has flourished so much more than public data infrastructure.

Let’s first consider the motivations for open-source software created by for-profit companies. An example is the Hadoop project, which was created by Yahoo as a way of making it easier to run programs across large clusters of computers. When for-profit companies open-source projects in this way, it’s because they don’t view owning the code as part of their competitive business advantage. While running large cluster-based computations is obviously essential to Yahoo, they’re not trying to use that as their edge over other companies. And so it made sense for Yahoo to open-source Hadoop, so that other people and organizations could help them improve the code.

By contrast, for many Internet companies, owning their own data really is a core business advantage, and they are unlikely to open up their data infrastructure. A priori, nothing says this has to be the case. A for-profit could attempt to build a business offering a powerful public data infrastructure, and find some competitive advantage other than owning the data (most likely, an advantage in logistics and supply chain management). But I believe this hasn’t happened because holding data close is an easy and natural way for a company to maintain a competitive advantage. The investor Warren Buffett has described how successful companies need a moat: a competitive advantage that is truly difficult for other organizations to duplicate. For Google, Facebook, and many other Internet companies, their internal data infrastructure is their moat.

What about hobby projects? If projects such as Linux can start as a hobby, then why don’t we see more public data infrastructure started as a hobby project? The problem is that creating data infrastructure requires a much greater commitment than creating open-source code. A hobby open-source project requires a time commitment, but little direct expenditure of money. It can be done on weekends, or in the evenings. As I noted above, building effective data infrastructure requires time, money, and a long-term commitment to providing reliable service, effective documentation, and support. To do these things requires an organization that will be around for a long time. That’s a much bigger barrier to entry than in the case of open source.

What would be needed to create a healthy, vibrant ecology of not-for-profit organizations working on developing a public data infrastructure?

This question is too big to comprehensively answer in a short essay such as this. But I will briefly point out two significant obstacles to this happening through the traditional mechanisms for funding not-for-profits: foundations, grant agencies, and similar philanthropic sources.

To understand the first obstacle, consider the story of the for-profit company Ludicorp. In 2003 Ludicorp released an online game called Game Neverending. After releasing the game, Ludicorp added a feature for players to swap photos with one another. The programmers soon noticed that people were logging onto the game just to swap photos, and ignoring the actual gameplay. After observing this, they made a bold decision. They threw out the game, and relaunched a few weeks later as a photo-sharing service, which they named Flickr. Flickr went on to become the first major online photo-sharing application, and was eventually acquired by Yahoo. Although Flickr has faded since the acquisition, in its day it was one of the most beloved websites in the world.

Stories like this are so common in technology circles that there’s even a name for this phenomenon. Entrepreneurs talk about pivoting when they discover that some key assumption in their business model is wrong, and they need to try something else. Entrepreneur Steve Blank, one of the people who developed the concept of the pivot, has devised an influential definition of a startup as “an organization formed to search for a repeatable and scalable business model” (Blank 2010). When Ludicorp discovered that photo sharing was a scalable business in a way that Game Neverending wasn’t, they did the right thing: they pivoted hard.

This pattern of pivoting makes sense for entrepreneurs who are trying to create new technologies and new markets for those technologies. True innovators don’t start out knowing what will work; they discover what will work. And so their initial plans are almost certain to be wrong, and will need to change, perhaps radically.

The pivot has been understood and accepted by many technology investors. It’s expected and even encouraged that companies will change their mission, often radically, as they search for a scalable business model. But in the not-for-profit world this kind of change is verboten. Can you imagine a not-for-profit telling their funders (say, some big foundation) that they’ve decided to pivot? Perhaps they’ve decided that they’re no longer working with homeless youth, because they’ve discovered that their technology has a great application to the art scene. Such a change won’t look good on the end-of-year report! Yet, as the pivots behind Flickr and similar companies show, that kind of flexibility is an enormous aid (and arguably very nearly essential) in developing new technologies and new markets.

A second obstacle to funding not-for-profits working on a public data infrastructure is the risk-averse nature of much not-for-profit funding. In the for-profit world it’s understood that technology startups are extremely risky. Estimates vary, but they typically place the odds of failure for a startup at perhaps 70 to 80 percent (Gompers et al. 2008). Very few foundations or grant agencies would accept 70 to 80 percent odds of failure. It’s informative to consider entrepreneur Steve Blank’s startup biography. He bluntly states that his startups have made “two deep craters, several ‘base hits,’ [and] one massive ‘dot-com bubble’ home run” (Blank 2013). That is, he’s had two catastrophic failures, and one genuine success. In the for-profit startup world this can be bragged about; in the not-for-profit world this rate of success would be viewed as disastrous. The situation is compounded by the difficulty of defining success for a not-for-profit; this makes it tempting (and possible) for mediocre not-for-profits to scrape by, continuing to exist, when it would be healthier if they ceased to operate and made space for more effective organizations.

One solution I’ve seen tried is for foundations and grant agencies to exhort applicants to take more risks. The problem is that any applicant considering taking those risks knows failure means they will still have trouble getting grants in the future, exhortation or no exhortation. So it still makes more sense to do low-risk work.

One possible resolution to this problem would be for not-for-profit funders to run failure audits. Suppose programs at the big foundations were audited for failures, and had to achieve a failure rate above a certain number. If a foundation were serious about taking risks, then they could run a deliberately high-risk grant program, where the program had to meet a target goal of at least 70 percent of projects failing. Doing this well would require careful design to avoid pitfalls. But if implemented well, the outcome would be a not-for-profit culture willing to take risks. At the moment, so far as I am aware, no large funder uses failure audits or any similar idea to encourage genuine risk taking.

I’ve painted a bleak picture of not-for-profit funding for a public data infrastructure (and for much other technology). But it’s not entirely bleak. Projects such as Wikipedia and OpenStreetMap have found ways to be successful, despite not being started with traditional funding. And I am optimistic that examples such as these will help inspire funders to adopt a more experimental and high-risk approach to funding technological innovation, an approach that will speed up the development of a powerful public data infrastructure.”

1 Comment on “Why a Public Data Infrastructure Should Be Developed by Not-for-Profits”

  1. Mike Riddell (@mikeriddell62)

    Turn ‘empowering individuals’ into a SaaS business model that creates and monetises collective action, and redistributes it in proportion to contribution.

    The KPI that would account for the collective action and package it into a means of payment, so that its value could be commercialised, would function just like money: as a measure of success and a way of paying for things.

    The platform might be private, but the standards that govern issuance and regulations would be owned independently by the community so that they control the currency for the benefit of the community’s constituents. It just needs a commercial contract between the two entities to govern the amount that the platform service would cost to run – everything else would be repaid to the community by way of dividend.
