At http://esr.ibiblio.org/?p=1282 Eric S. Raymond writes:
The worst problem with almost all current hosting sites is that they’re data jails. You can put data (the source code revision history, mailing list address lists, bug reports) into them, but getting a complete snapshot of that data back out often ranges from painful to impossible.
Why is this an issue? Very practically, because hosting sites, even well-established ones, sometimes go off the air. Any prudent project lead should be thinking about how to recover if that happens, and how to take periodic backups of critical project data. But more generally, it’s your data. You should own it. If you can’t push a button and get a snapshot of your project state out of the site whenever you want, you don’t own it.
When berlios.de crashed on me, I was lucky; I had been preparing to migrate GPSD off the site due to deteriorating performance; I had a Subversion dump file that was less than two weeks old. I was able to bring that up to date by translating commits from an unofficial git mirror. I was doubly lucky in that the Mailman adminstrative (sic) pages remained accessible even when the project webspace and repositories had been 404 for two days.
But actually retrieving my mailing-list data was a hideous process that involved screen-scraping HTML by hand, and I had no hope at all of retrieving the bug tracker state.
This anecdote illustrates the most serious manifestations of the data-jail problem. Third-generation version-control (hg, git, bzr, etc.) systems pretty much solve it for code repositories; every checkout is a mirror. But most projects have two other critical data collections: their mailing-list state and their bug-tracker state. And, on all sites I know of in late 2009, those are seriously jailed.
This is a problem that goes straight to the design of the software subsystems used by these sites. Some are generic: of these, the most frequent single offender is 2.x versions of Mailman, the most widely used mailing-list manager (the Mailman maintainers claim to have fixed this in 3.0). Bug-trackers tend to be tightly tied to individual hosting engines, and are even harder to dig data out of.
Eric acknowledges that distributed revision control solves the problem of the code repository being a “data jail”. My opinion is that the other problems are solved by extremely low-cost hosting of your own mailing lists (many shared hosting providers offer GNU Mailman lists for 5-10 per month), plus hosting your own distributed bug tracking with tools like Bugs Everywhere: http://bugseverywhere.org/be/show/HomePage
It’s my opinion that the building blocks (and more than 90% of the solutions) already exist to route around the blockages caused by “forge lock-in”. Communication could be distributed via http://openmicroblogging.org/protocol/0.1/ (the OpenMicroBlogging protocol), which could allow people to post to development discussions from almost anywhere online and have their messages tracked and linked through microblogs. This would eliminate the need for email discussion of development altogether (a change I would fully welcome). It could also synchronize with discussions happening in IRC channels, where most developers actually discuss development these days anyway. Tools already exist to connect IRC with asynchronous online discussion.
The conclusions that I draw:
- “Forge” sites have made themselves obsolete for anything other than serving as a convenient mirror for project release files
- It’s more important to me to focus my time and energy on ways to route around the blockage than to decry it, especially now that it is 100% possible and affordable for people with even meager resources to route around it
- Most importantly: many of the problems that are a concern (not just with “forge lock-in”, but also with data lock-in from social media websites) can easily be solved now with distributed solutions. The first question you should ask is “how will my activities and data interoperate with others?”, and NOT “how will this work best for me?”. If you concentrate on how your approach will interoperate best with others, there is still room to address how it will work best for you. But if you only concentrate on how your approach will work best for you, you’ll miss opportunities by ignoring interoperability. **INTEROPERABILITY IS THE FIRST PRIORITY** if you want to avoid the later problems of “data jails”, “data lock-in”, etc. Invest now in infrastructure that allows for expansion, connects with others in a plurality of ways, and distributes infrastructure as widely as possible. This investment will return to you exponentially, because you will have chosen infrastructure that is permanently adaptable to new standards, new pressures, etc.
This holds for everyone, not just developers. It also holds for the people on https://mailman.thing.net/mailman/listinfo/idc who frequently discuss the problems with “crowdsourcing” and “locked social media”: the choices that *you* make are what lock you in to the systems you see constraining you. There are other choices you can make now, and they are worth the investment, but many people are not making them.
The same is true for how you structure collaboration, how you participate in collaborative processes, and how your “surplus labor” is used. In all cases, before jumping in and using services run by companies whose primary (and sometimes even legally required) focus is to seek monetary profit, spend the time to look at the plausible alternatives, and design for interoperability and adaptability. Give your money and resources to the people who already offer solutions that won’t lock you in and that will let you adapt over time. Don’t always opt for the instant-gratification choice. There are people out there who can and will capitalize on your need for instant gratification, leaving you to complain after the fact that the choice you made made it harder to adapt and sustain your activities over time. If you change your focus from “what is best for me now” to “how can I make this activity as interoperable and flexibly adaptable as possible”, you will not face the dilemma of dealing with “lock-in” or the leaching of your “surplus labor”.