
Horus - A Flexible Group Communication System

Paper Summary: R. van Renesse, K. P. Birman, and S. Maffeis, “Horus: A Flexible Group Communication System”, Communications of the ACM, 1996

The authors present Horus, a modular and configurable group communication system designed for fault-tolerant distributed applications. Horus addresses the inflexibility of monolithic communication systems such as ISIS. It does so by introducing a stackable protocol architecture in which users select and compose communication “blocks”, each providing a specific guarantee such as ordering, reliability, or failure detection. The blocks are modular and operate almost independently of one another. Much like the layers of a network protocol stack, each block passes messages down or up, wrapping or unwrapping them with extra information according to the operation it performs. If necessary, a layer can also drop a message or buffer it for delayed delivery. These lightweight blocks let an application compose a stack that matches its specific requirements. For instance, an application can keep reliable message delivery while leaving out total ordering guarantees. This customizability gives greater flexibility and lets the application avoid paying for guarantees it does not need. Horus also includes tools to assist in the development and debugging of new layers.

Horus stacks are carefully shielded from each other and have their own threads and memory, both of which are provided by thread and memory schedulers. For the blocks in a stack to fit together, every block supports the Horus Common Protocol Interface (HCPI). The blocks still run different protocols and have different properties, but they “fit” together because they expose the same upstream and downstream interfaces.
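The layering idea above can be illustrated with a small sketch. This is not Horus code and does not reproduce the HCPI: the `Layer`, `SequenceLayer`, `ChecksumLayer`, and `Stack` names, the `down`/`up` method pair, and the header format are all assumptions made for illustration. It only shows how blocks that share one interface can be stacked in any combination, each adding a header on the way down, removing it on the way up, and optionally dropping a message.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    payload: bytes
    headers: List[dict] = field(default_factory=list)  # last entry = outermost header


class Layer:
    """Hypothetical uniform interface: every block offers the same down/up calls."""
    name = "layer"

    def down(self, msg: Message) -> Optional[Message]:
        raise NotImplementedError

    def up(self, msg: Message) -> Optional[Message]:
        raise NotImplementedError


class SequenceLayer(Layer):
    """Stamps outgoing messages with a sequence number; drops duplicates on receipt."""
    name = "seq"

    def __init__(self) -> None:
        self.next_seq = 0
        self.seen = set()

    def down(self, msg: Message) -> Optional[Message]:
        msg.headers.append({"layer": self.name, "seq": self.next_seq})
        self.next_seq += 1
        return msg

    def up(self, msg: Message) -> Optional[Message]:
        header = msg.headers.pop()
        if header["seq"] in self.seen:
            return None  # duplicate: this layer drops the message
        self.seen.add(header["seq"])
        return msg


class ChecksumLayer(Layer):
    """Adds a toy checksum on the way down; drops corrupted messages on the way up."""
    name = "sum"

    def down(self, msg: Message) -> Optional[Message]:
        msg.headers.append({"layer": self.name, "sum": sum(msg.payload) % 256})
        return msg

    def up(self, msg: Message) -> Optional[Message]:
        header = msg.headers.pop()
        if header["sum"] != sum(msg.payload) % 256:
            return None
        return msg


class Stack:
    """Composes layers; the first layer in the list sits closest to the application."""

    def __init__(self, layers: List[Layer]) -> None:
        self.layers = layers

    def send(self, payload: bytes) -> Optional[Message]:
        msg: Optional[Message] = Message(payload)
        for layer in self.layers:  # top to bottom, each layer wraps
            msg = layer.down(msg)
            if msg is None:
                return None
        return msg

    def receive(self, msg: Optional[Message]) -> Optional[bytes]:
        for layer in reversed(self.layers):  # bottom to top, each layer unwraps
            msg = layer.up(msg)
            if msg is None:
                return None
        return msg.payload


sender = Stack([SequenceLayer(), ChecksumLayer()])
receiver = Stack([SequenceLayer(), ChecksumLayer()])
wire = sender.send(b"hello")
duplicate = Message(wire.payload, list(wire.headers))  # simulate a retransmitted copy
first_delivery = receiver.receive(wire)
second_delivery = receiver.receive(duplicate)
```

An application that does not need duplicate suppression could build `Stack([ChecksumLayer()])` instead, which is the kind of trade-off the summary describes: guarantees are paid for only when the corresponding block is in the stack.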

The authors give the example of a video service that uses Horus to gain fault tolerance and multicast capabilities with little extra code. To introduce Horus functionality more transparently, they present Electra, a system that integrates Horus with CORBA (Common Object Request Broker Architecture). The Electra framework uses Horus to add fault tolerance and group communication capabilities to CORBA applications, demonstrating Horus’ adaptability. Group communication is supported by updating group views as members join, leave, or fail, so that applications can adapt to membership changes. Messages within a group remain isolated from unauthorized entities, providing privacy and resilience. The authors also present experimental results demonstrating that Horus achieves lower message latency and better throughput than traditional group communication systems.
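To make the notion of a group view concrete, here is a minimal single-process sketch. It is only an illustration of the bookkeeping: Horus runs a distributed membership protocol, which this does not attempt to model, and the `View` and `Group` names and their methods are hypothetical. Each membership change installs a new, numbered view that the application can react to.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class View:
    view_id: int
    members: FrozenSet[str]


class Group:
    """Hypothetical local bookkeeping for successive group views."""

    def __init__(self, founder: str) -> None:
        self.view = View(0, frozenset({founder}))
        self.history: List[View] = [self.view]

    def _install(self, members: FrozenSet[str]) -> View:
        # Install a new view only when membership actually changes.
        if members != self.view.members:
            self.view = View(self.view.view_id + 1, members)
            self.history.append(self.view)
        return self.view

    def join(self, member: str) -> View:
        return self._install(self.view.members | {member})

    def leave(self, member: str) -> View:
        return self._install(self.view.members - {member})

    def fail(self, member: str) -> View:
        # Assumption: a detected failure is treated like an involuntary leave.
        return self.leave(member)


group = Group("a")
group.join("b")
group.join("c")
group.fail("b")
current = group.view
```

After these calls, `current` is the fourth view in `group.history` and contains only `a` and `c`; an application would typically use such a view change as the signal to redistribute work among the surviving members.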

The performance evaluation shows that Horus is efficient despite concerns about layering overhead. Tested on Sun Sparc10 workstations connected by Ethernet, Horus demonstrated low message latency and high throughput across different configurations. Using unordered virtual synchrony, Horus achieved low latency, which improved further over ATM. Hardware multicast proved especially beneficial for larger messages, while totally ordered communication introduced some inefficiencies under high concurrency. As the number of group members increases, overall throughput improves, which the summary attributes to better concurrency, more efficient message batching, and built-in flow control mechanisms.

Horus is a highly flexible and efficient architecture. It is modular, customizable, secure, and fault tolerant. However, this flexibility comes at the cost of increased configuration effort: users must carefully design their protocol stack, which may require deep system knowledge. Passing messages through multiple protocol layers also adds overhead, as in a layered network stack. It is also unclear how Horus performs against systems other than ISIS. Horus’ strengths and complications together suggest that a fine balance must be struck between overhead on one side and customization, reliability, and other requirements on the other.