I’ve had a read of ‘Robust vote sampling in a P2P media distribution system’. It is a very interesting research paper. The idea that grabs me most is the one about using gossip-based network systems to decentralise metadata. Let me explain: if you think of a book on Amazon, it comes with lots of additional information as well as the basic item itself – user reviews and ratings, items that might be similar and so on. This is all great information that can often help guide you as a user in your choice. But this data is centralised – it is on Amazon’s servers and it is up to them what they do with it. If you are trying to decentralise a system – so it is not under the control of one central person or server – how can you collate comments and additional data?
The authors have come up with a very simple and elegant method. First off, they use a gossip protocol (so named because it mimics how gossip spreads in a social situation) as a means to an end:
In order to propagate and store metadata we selected a gossip (or epidemic) based replication approach. Each peer stores metadata in its own local database. By storing metadata locally we ensure that it has high availability. Periodically peers are paired randomly and exchange metadata updating their own local databases … We selected a gossip based design because it requires no central components and is robust to high churn rates. We could have stored metadata in a Distributed Hash Table but these require explicit leave and join operations which are costly in systems with high churn, such as file sharing networks. Additionally, search performance is considerably enhanced if metadata is stored locally because it is not necessary to perform multi-hop look-ups.
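To make the quoted idea concrete, here is a minimal sketch of gossip-style replication in Python. It is my own illustration, not the authors' implementation: the `Peer` class, the dictionary used as a "local database" and the merge rule (a simple union of entries) are all assumptions made for the example.

```python
import random


class Peer:
    """A hypothetical peer holding its own local metadata store."""

    def __init__(self, name):
        self.name = name
        self.db = {}  # local database: metadata key -> value

    def exchange(self, other):
        # Assumed merge rule: both peers end up with the union of entries.
        merged = {**self.db, **other.db}
        self.db = dict(merged)
        other.db = dict(merged)


def gossip_round(peers, rng):
    """Pair peers at random and let each pair exchange metadata."""
    shuffled = peers[:]
    rng.shuffle(shuffled)
    # With an odd number of peers, one peer simply sits this round out.
    for a, b in zip(shuffled[::2], shuffled[1::2]):
        a.exchange(b)


def run_until_converged(peers, rng, max_rounds=1000):
    """Gossip until every peer holds every entry; return rounds taken."""
    all_keys = set().union(*(p.db.keys() for p in peers))
    for round_no in range(1, max_rounds + 1):
        gossip_round(peers, rng)
        if all(set(p.db) == all_keys for p in peers):
            return round_no
    return None  # did not converge within max_rounds
```

No peer ever talks to a central server here, and a peer that drops out simply stops being picked for pairing, which is roughly why this style of design copes with churn.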
Then they add a second layer of interactive functionality to this: users vote on whether or not they like the comments or, more precisely, the moderators who supply them. This means that a user, in order to get the benefit from the cloud of comments, must act as a kind of screen for the data sources:
Moderations are disseminated in a gossip-like fashion to other peers by using the PSS [peer sampling service; the means by which nodes discover others and potentially exchange messages with them]. However, nodes only pass on metadata from those moderators they have approved. Approval involves the user explicitly selecting a thumbs-up icon displayed next to the metadata from the given moderator indicating a positive (+) vote for the moderator. Users may also disapprove of a moderator by selecting a thumbs-down indicating a negative (-) vote. Essentially then, the idea is that, “good” moderators, as judged by the approval of others, will spread their metadata quickly but “bad” moderators, obtaining low numbers of approvals and / or disapprovals, will only be able to spread their metadata slowly.
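The forwarding rule in that passage can be sketched in a few lines of Python. Again this is only an illustration under my own assumptions: the `Moderation` and `Node` names are hypothetical, and I assume a node keeps whatever it receives but only passes on moderations from moderators it has given a thumbs-up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Moderation:
    """A piece of metadata, tagged with the moderator who wrote it."""
    moderator: str
    item: str
    text: str


@dataclass
class Node:
    name: str
    votes: dict = field(default_factory=dict)  # moderator -> +1 or -1
    store: set = field(default_factory=set)    # moderations held locally

    def vote(self, moderator, value):
        if value not in (1, -1):
            raise ValueError("vote must be +1 (approve) or -1 (disapprove)")
        self.votes[moderator] = value

    def forwardable(self):
        # Only moderations from explicitly approved moderators are passed on;
        # disapproved and not-yet-rated moderators are both held back.
        return {m for m in self.store if self.votes.get(m.moderator) == 1}

    def receive(self, moderations):
        # Assumption: the receiver stores everything it is sent.
        self.store |= moderations


def exchange(a, b):
    """One gossip meeting: each side sends only what it has approved."""
    to_b = a.forwardable()
    to_a = b.forwardable()
    a.receive(to_a)
    b.receive(to_b)
```

The effect is that a moderator's metadata can only travel onwards through nodes that have approved that moderator, so a moderator with few approvals has few routes through the network.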
Voting approval on comments is nothing new. You can see it in action on a huge range of sites from The Guardian to Slashdot.org – it is a good way of regulating user comments. (Anyone who’s ever run a busy public site knows how messy comment regulation can get!) What is interesting about this system is that the comments the user sees are filtered according to their individual choice and taste, unlike a public system that just shows the popular ones. This means that over time, the comments you get will grow towards your taste. The problem with the current centralised public system (which this new idea avoids) is that when it comes to contentious issues, comments can often be rated along polarised lines, where users back commentators based not on the accuracy of their comments but on their bias on a larger issue.
What is also clever about this design is that it still allows for the ‘wisdom of the crowds’, even when the cache of data you have is unique to each user. It does this using what the authors term the ‘local ballot box’:
Essentially then, each peer individually conducts its own poll by asking other randomly selected peers directly to supply their local vote list. Hence pairs of peers meet randomly and exchange votes, building, over time, a sample of the votes of the population in their local ballot boxes. Nodes do not forward or share the accumulated information in their local ballot box with other peers. This precludes certain kinds of malicious vote manipulation where a node could lie about the votes received from others. But this means that each peer can only accumulate a sample of the population votes, based on its direct experience, not a globally accurate total count.
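A rough Python sketch of the local ballot box, as I read the passage above, might look like this. The `Voter` class and its method names are hypothetical, and keying the ballot box by peer name (so that meeting the same peer twice does not count their votes twice) is my own assumption rather than something the quoted text specifies.

```python
from collections import Counter


class Voter:
    """A hypothetical peer with its own votes and a local ballot box."""

    def __init__(self, name, votes):
        self.name = name
        self.votes = dict(votes)  # own votes: moderator -> +1 or -1
        self.ballot_box = {}      # peer name -> that peer's own vote list

    def local_vote_list(self):
        # Only first-hand votes are ever handed out; the accumulated
        # ballot box is never forwarded to anyone.
        return dict(self.votes)

    def poll(self, other):
        # Assumption: a repeat meeting overwrites the earlier sample.
        self.ballot_box[other.name] = other.local_vote_list()

    def tally(self):
        """Estimate each moderator's standing from the local sample only."""
        totals = Counter()
        for vote_list in self.ballot_box.values():
            for moderator, value in vote_list.items():
                totals[moderator] += value
        return dict(totals)


def meet(a, b):
    """Two peers meet and each asks the other for its own vote list."""
    a.poll(b)
    b.poll(a)
```

Because a peer only ever reports its own votes, it cannot misreport what others supposedly voted; the price is that each tally is a sample built from direct meetings rather than a global count.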
There are lots of interesting ideas and designs in the paper and it is worth a read.
(First published on the Catblog.)