P2P Foundation

Researching, documenting and promoting peer to peer practices


    Admin

    P2P Foundation Sites/Publications

    Worth Reading

    Introductory Essay
    Extensive Essay

    Sponsors

    Interviews

    Video

    - New P2P Video at Pixelace, Helsinki, March 2009

    Podcasts

    - Interview at Open Views by Sundar Raman, 9th March 2007
    - Interview with Richard Poynder

    Resources

    Delicious P2P tags
    P2P Blog Aggregator
    P2P Encyclopedia
    P2P Foundation Wiki
    P2P Meme Map
    P2P Movements
    P2P Podcasts
    P2P Tools
    P2P Topical Index
    P2P Webcasts
    givegetnation

    Visit our archive

  • Books


    Free Software, Free Society

    Community

    Join the P2P Community on Frappr frappr link to our community

    Want to advertise? Click here.

  • Subscribe



  • Donate

    If you value the insight and content of this site, gift us with a contribution.

  • Communities and Networks Connection
  • Recent Comments:

    • rachel: well its true that kens work is great, and some of these critics are downright...
    • Michel Bauwens: This contribution from Jeff Vail, sent by email, is not directly...
    • Michel Bauwens: Commentary by Eric Hunting, via email: Apple has, ironically, always...
    • Michel Bauwens: Feel free to post it as a comment, and we can upgrade it to an article,...
    • laws and liberty: Lol, I wouldn’t have said that personally, but this is a very...

  • Authors

  • Measuring p2p Networks (Hint; It’s not easy!)

    photo of tomas

    tomas
    20th November 2009


    There have been recent reports about the supposed decline in p2p traffic and also talked about the difficulties in measuring p2p networks.  There is an interesting paper by Stutzbach et al entitled ‘On Unbiased Sampling for Unstructured Peer-to-Peer Networks‘ - now the paper has some fairly technical bits in it, but you can still get a good guide as to how difficult this whole area is.  For starters the authors dip into sociology to look at the issues with getting to grips with ‘hidden’ populations.  Hidden populations are ones that we don’t really know the boundaries or size of the population and those within it prefer to remain anonymous.  A classic example is the population of drug users.  The authors pick one method to draw from - respondent-driven sampling - which is interesting because the method for this is p2p-like within its own structure; you start with a small seed of respondents then you get the respondents to identify more respondents and so on.

    Gathering unbiased sample data of a p2p network is hard.  The current methods have bias within them.  The authors look to sampling tools that are based around a static system and adapt it to work with a dynamic network.  As p2p networks are very dynamic (they tend to have a few peers have long sessions while the majority have very short sessions), there is a problem sampling.  The authors illustrate this with an example;

    “Suppose we wish to observe the number of files shared by peers. In this example system, half the peers are up all the time and have many files, while the other peers remain for around 1 minute and are immediately replaced by new short-lived peers who have few files. The technique used by most studies would observe the system for a long time and incorrectly conclude that most of the peers in the system have very few files.”

    What they are saying is that if you discount the long-standing peers from being re-sampled, as they have already been covered once, then in each snap-shot of the system you sample, it will contain more and more short-lived peers, and so mess the results up.  (Though it seems to me all this assumes a large population relative to the sample size, with most p2p networks are.) Thus the problem also presents the solution; the authors used a method that allows the same peer to be sampled at different points in time - they decoupled the sample from the session lengths:

    “[O]ur approach will correctly select long-lived peers half the time and short-lived peers half the time. When the samples are examined, they will show that half of the peers in the system at any given moment have many files while half of the peers have few files, which is exactly correct.”

    Which addressed one of the issues with sampling - but how did they adapt the static methods to a dynamic environment?  A clever adaptation whereby they introduce backtracking into an the existing methodology of Metropolized Random Walk - and not surprisingly call the result, Metropolized Random Walk with Backtracking.  Here’s how it works:

    “We make an adaptation by maintaining a stack of visited peers. When the walk chooses a new peer to query, we push the peer’s address on the stack. If the query times out, we pop the address off the stack, and choose a new neighbour of the peer that is now on top of the stack. If all of a peer’s neighbours time out, we re-query that peer to get a fresh list of its neighbours. If the re-query also times out, we pop that peer from the stack as well, and so on. If the stack underflows, we consider the walk a failure. We do not count timed-out peers as a hop for the purposes of measuring the length of the walk.”

    The authors then go on to present a pretty robust analysis of their method in action.  In all the paper is both an interesting account of the difficulties of getting good data on p2p networks coupled with some inventive solutions to the problems therein.

    Disclosure note: the authors of the paper cite research as being supported by National Science Foundation and Cisco Systems.    This article was first published on the Catblog.

    Leave a Reply

    XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>