Modeling and performance analysis of BitTorrent-like peer-to-peer networks
Paper Summary: “Modeling and performance analysis of BitTorrent-like peer-to-peer networks”, Dongyu Qiu and R. Srikant, Proceedings of ACM Sigcomm, 2004.
The authors develop and present simple fluid models to study the performance of Bittorrent, fit for an era where P2P systems seemed like they would take over the internet. BitTorrent has been a popular P2P application since its conception, and this paper is an attempt to study its performance using criteria like peer evolution, scalability, file sharing efficiency; and also evaluate BitTorrent’s incentive mechanisms to address free-riding.
The work in this paper is inspired by the development of simple Markovian models to study steady-state properties, whereas here the authors develop simple deterministic and stochastic fluid models to gain insight into the performance of BitTorrent. A brief discussion of BitTorrent is provided, followed by the fluid models and behaviours of users under the incentive mechanism of BitTorrent (optimistic unchoking) modeled using game theory. The authors use quantities like number of downloaders, number of seeds, arrival rate of new requests, uploading and downloading bandwidths of peers, rates of departure of seeds and downloaders to capture the mechanism of BitTorrent. They develop equations to analyse the steady state behaviour of BitTorrent, gaining insights into the system.
The equations lead the authors to conclude results such as the average time not being related to the request arrival rate, implying that BitTorrent scales very well. They also study the stability of their fluid model around the assumed equilibrium, concluding that the number of seeds and downloaders at steady state follows a Gaussian distribution. Analysing the behaviour of peers under the optimistic unchoking mechanism of BitTorrent also leads the authors to conclude that while there is no Nash equilibrium for a general network setting, if the network consists has peer groups with similar upload and download bandwidths, a Nash equilibrium is possible. They also prove that a free rider under this mechanism will be limited to a small fraction of the normal downloading rate they can achieve if they cooperate, demonstrating the success of the optimistic unchoking mechanism.
Experimental results show that the fluid models developed by the authors follow the observed data quite closely, even near the beginning of the system evolution. The results indicate again that the average download time remains constant, as the number of downloaders increases linearly with peer arrival rate, showing that the system does scale very well. Another experiment introduces a file into the BitTorrent network and analyses the behaviour based on the log files collected from the tracker. These set of results show that the peer arrival and departure rates are time-varying, as opposed to the constant assumptions taken by the authors.
As indicated before, some of the limitations of the study include assumptions like simple upload/download capacities. The authors also sacrifice some protocol details like the rarest-first approach in BitTorrent, limiting realism. In addition, the equilibrium analysis presumes complete information, whereas actual peers operate on partial or outdated knowledge. While improvements can be made to address these concerns, including incorporating heterogeneity and applying reinforcement learning to model peer strategies, this work provides a solid foundation for further work in this area. Being able to model a system that was designed but with unknown predictability provides system designers with insight into how the mechanism of P2P systems like BitTorrent actually operate, allowing us to model possible outcomes, predict evolutions and find previously undiscovered flaws. Paper Summary: “Making Gnutella-like P2P Systems Scalable”, Y. Chawathe, S. Ratnaswamy, L. Breslau, N. Lanham, S. Shenker, ACM Sigcomm 2003
The authors propose Gia, a new system formed by improvements to Gnutella like P2P systems that addresses their scalability problem. After Napster, which followed a centralized search facility for P2P file sharing processes; was eventually shut down for malpractice, systems utilizing decentralized search just as Gnutella took precedence. Here, instead of sending file queries to a central servers, the queries are more or less flooded across the system. Peers matching the query sends the related content to the originator node. This means that the load on each node grows linearly with system size, making the approach unscalable.
The authors attempt to resolve this problem being inspired by the idea of supernodes, ie., nodes that have better bandwidth. On a similar note, the ideas in this paper utilise the heterogeneous nature of the system of nodes, which preserves the simplicity of Gnutella while improving the scalability. Gia introduces a few novel contributions which work together to improve system performance. The authors steer away from the then popular approach of using Distributed Hash Tables (DHTs) as DHTs do not sustain their logarithmic lookup time when nodes can leave the system or join randomly. In addition, DHTs also perform poorly when there are no exact match queries, which is the case in P2P file system queries. Most download queries are also for well-replicated files, which can be used for a performance advantage in P2P systems but is not possible with DHTs. The approach taken by Gia is as follows:
(1)The authors exploit heterogeneity for Gia by using a topology adaptation protocol to ensure that high capacity nodes are very well connected and low capacity nodes are in close proximity to the high capacity nodes, They do this by computing a quantity called level of satisfaction (S) for each node, which starts at S=0 for a poorly connected “dissatisfied” node and works its way to S=1, ensuring that the system reaches a stable state. (2) Flow control is achieved by using a token based mechanism which allows senders to direct queries to a node only if the node allows it to. A client sends out tokens to its neighbours indicating its willingness to accept queries. These tokens are assigned more to higher capacity nodes, which encourage neighbour nodes to advertise their true capacities (incentive mechanism). (3) Gia also uses one-hop replication, which ensures that each node contains pointers to its neighbours’ content in addition to its own. This one-hop replication improves the efficiency of the search, as high capacity nodes (which are well connected because of the topology adaptation process) have more information with them and can answer a higher number of queries. (4) Gia uses a search protocol based on biased random walks instead of random walks or flooding like Gnutells, which ensures that queries are directed towards higher capacity nodes which can answer queries better due to topology adaptation and one-hop replication.
As is evident from the above, the improvements in Gia are all interconnected and together improve its performance over Gnutella, as indicated by the results. The authors use a collapse point (CP) as the query rate beyond which success rate of queries drops, reflecting total system capacity. They find that the CP for Gia is 3 to 5 orders of magnitude higher than systems like Gnutella which use flooding or even random walk based approaches. Factor analysis performed also proves that it is the combination of the various improvements that contribute to Gia’s effectiveness. Gia is also more robust.
Gia has limitations in that its biased search mechanism still depends on partial or local knowledge, which may be outdated due to high rates of nodes leaving/joining the system. While the system is robust to node failures and dynamic membership, the overhead of its improvements such as topology adaptation, neighbor evaluations, and token exchanges isn’t negligible in real-world bandwidth-limited environments. Also, while one-hop replication improves search efficiency, it could strain memory or storage on lightweight nodes. With scope for improvements, Gia still remains a promising deployment in real world applications. It reinforces the concepts of simplicity and purpose-driven intentional architecture, evidenced by its superior performance compared to Gnutella-like systems. The authors provide a good defense of the DHT approach not taken, and also advocate for better investigation into the centralised search strategy of Napster, hoping to support file sharing in the same vein as the web.