Building Torrent Family Trees (Beta)

(This research was presented at eComm 2009 in Amsterdam and also posted on my blog.)

We are used to seeing family trees in biology.  For example, this is the human family tree:

Biology family tree

But how about for software?  I think it is possible to look at the family tree of digital media too.  Here is what the family tree for BitTorrent software looks like:

Bittorrent family tree

So what does this image mean?  The first key point to note is that this is not a family tree of the idea of torrents (we’ll call that a ‘meme’) but of the actual source-code relationships between BitTorrent and the subsequent software clients based on it.

In biology, relationships between species can be determined using phylogenetic analysis, where the evolutionary relatedness among various groups of organisms is determined via molecular sequencing data and morphological data matrices.  These linkages can then be plotted onto a phylogenetic tree.  The branching structure is used because evolution is a branching process, whereby alteration over time can result in speciation and thus branching of populations.  As species hybridize or terminate (extinction), the results can be visualized in a phylogenetic tree.  The methodological approach used here is strongly similar: it plots the generations of a species of p2p software along with the point where each off-shoot branched from its common ancestor (the code-basis).

The key reason this is akin to the phylogenetic method is that the linkages are based on relationships of the source code (read: DNA) and not upon the meme-layer (read: idea).  A similar exercise around the meme-layer would produce very different results.

So how was the image generated?  I am interested in how software systems change over time.  To get more of an overview of the world of p2p software, I thought it would be interesting to see those changes as a whole.  With this in mind I looked at the version releases of each and every p2p torrent client, using the following methodology:

  • Create an entry for each type of torrent client software.  For this experiment, only separate software systems designed to be installed on an operating system were used.  This research did not include non-installed software such as browser clients like BitLet.org or mobile phone versions of software.
  • For that client, search for the changelog.  If that is not possible, look for the date of the source code releases, and if this cannot be found, then the executable releases or news and/or email list announcements.  If multiple dates were given for the same release version, the date of the source code (.tar) files was used as the primary source.
  • On a per-month basis, record the most current version released in that month.  The record is in the form of the version number given by the developers, abbreviated to one decimal place (always rounded down).  I am aware that this is self-reported data, and that the numbering is not consistent enough, either within a project or across projects, to be treated as an empirical measure; however, it does provide a numerical record of generational change.  See detailed notes at the end of this post*.
  • Where it is noted in the documents read for research, record what other source code was used in the construction of the project – this gives us a sense of the linkages between projects (sub-species) and allows us to construct a family tree.  This is generally recorded by the developer in their notes.  The beta data-set, including reference links to where the data came from, is available as a spreadsheet here.
  • Enter this data into a graph plotting version number over time.  This gives us a broad view of p2p software releases over time.
  • Then plot the main linkages between the different releases by their basis source code, e.g. Tomato Torrent is based on v4.2 of BitTorrent – this is recorded as a family linkage.  Over the graph, draw lines to connect the ‘off-shoot’ software to their ‘code-basis’.
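The per-month recording step above can be sketched in a few lines of Python.  This is only a minimal illustration, not the process actually used to build the spreadsheet: the function names and the sample release dates are hypothetical, and it assumes that "abbreviated to one decimal place" means keeping the major number plus the first digit of the minor component.

```python
import re
from datetime import date


def truncate_version(version: str) -> float:
    """Abbreviate a developer-reported version to one decimal place, rounding down.

    Assumption: "4.20.1" is read as 4.20... and cut to 4.2, so only the first
    digit of the minor component is kept.
    """
    match = re.search(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"no version number found in {version!r}")
    major = match.group(1)
    minor = match.group(2) or "0"
    return float(f"{major}.{minor[0]}")


def monthly_latest(releases):
    """Map (year, month) to the truncated version of the last release that month.

    `releases` is an iterable of (release_date, version_string) pairs.
    """
    latest = {}
    # Sorting by date means a later release in the same month overwrites an earlier one.
    for released, version in sorted(releases, key=lambda pair: pair[0]):
        latest[(released.year, released.month)] = truncate_version(version)
    return latest


# Hypothetical release history for an imaginary client (not taken from the data-set).
example_releases = [
    (date(2005, 3, 2), "1.0"),
    (date(2005, 3, 20), "1.19"),
    (date(2005, 4, 1), "v1.2.3"),
]

for month, version in sorted(monthly_latest(example_releases).items()):
    print(month, version)
```

Months with no release simply have no entry here; whether to carry the previous version forward when plotting is a separate choice that this sketch does not make.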

This method produces the following results (the graph is pretty big, so if you want to see it in more detail, look at the PDF):

Graph of p2p client versions over time

So when the family linkages of the main progenitor software (which the data showed to be Azureus, BitTorrent and LibTorrent) are plotted onto this graph we can see:

Graph of p2p client versions with linkages over time

These linkages can then be separated out from the graph to show each family tree in isolation, taking us back to the BitTorrent family tree we started with.
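Separating one family out of the full set of linkages amounts to walking the "based on" links outward from a progenitor.  The sketch below is one possible way to do that, assuming the linkages are stored as (off-shoot, code-basis) pairs.  Only the Tomato Torrent link comes from the text above; "Client A", "Client B" and "Client C" are placeholder names, and `family_of` is a hypothetical helper.

```python
from collections import defaultdict, deque


def family_of(progenitor, links):
    """Return the (off-shoot, code-basis) links reachable from one progenitor.

    `links` is an iterable of (off_shoot, code_basis) pairs.
    """
    children = defaultdict(list)
    for off_shoot, basis in links:
        children[basis].append(off_shoot)

    family = []
    seen = {progenitor}
    queue = deque([progenitor])
    # Breadth-first walk: direct off-shoots first, then off-shoots of off-shoots.
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                family.append((child, current))
                queue.append(child)
    return family


# Placeholder linkages; only the first pair is taken from the post itself.
example_links = [
    ("Tomato Torrent", "BitTorrent"),
    ("Client A", "BitTorrent"),
    ("Client B", "Client A"),
    ("Client C", "Azureus"),
]

for off_shoot, basis in family_of("BitTorrent", example_links):
    print(f"{off_shoot} is based on {basis}")
```

Running the same function with "Azureus" or "LibTorrent" as the progenitor would give the other family trees.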

More info can be found on my blog.
