H

Impossibility of Distributed Consensus with One Faulty Process

Paper Summary: M. J. Fischer, N. A. Lynch, and M. S. Paterson, “Impossibility of Distributed Consensus with One Faulty Process“, Journal of ACM, 1985

The authors introduce one of the most fundamental results in distributed computing, establishing what is now known as the FLP impossibility theorem. They prove that in an asynchronous distributed system, no deterministic algorithm can guarantee consensus if there is even a single process that may fail by crashing.

A consensus problem is one where a set of distributed processes must agree on a single value despite potential failures. A correct consensus protocol must satisfy three key properties: agreement, meaning all non-faulty processes must reach the same decision; validity, ensuring the chosen value is one of the original proposals; and termination, requiring that every non-faulty process eventually makes a decision. The authors assume a fully asynchronous model where there are no bounds on message delivery times, and processes execute at arbitrary speeds while communicating exclusively through message passing.

The impossibility proof is constructed by showing that an adversary can always delay messages in a way that prevents the system from reaching a conclusive decision. The authors analyze an initial configuration where multiple decisions are possible and construct an execution path where no process ever gathers enough information to make an irreversible decision. Their proof relies on the concept of indistinguishable executions, where delays in message delivery create uncertainty about the state of other processes. Since no process has complete information about failures or the decisions of others, the adversary can always extend an undecided execution indefinitely, thereby preventing termination. This result has several critical implications. It establishes that in purely asynchronous environments, reliable consensus is fundamentally unattainable, which would force system designers to explore alternative approaches. One major impact of the FLP theorem has been the shift toward algorithms that either introduce randomization, such as Ben-Or’s and Rabin’s randomized consensus algorithms, or assume partial synchrony, as seen in Paxos and Raft. These approaches sidestep the FLP impossibility by introducing probabilistic guarantees or leveraging periods of network stability to make progress.

Despite its significance, the FLP result has some limitations. The impossibility applies only to deterministic algorithms. Randomized consensus protocols where processes make decisions based on coin flips can still achieve probabilistic termination. Additionally, the theorem assumes a purely asynchronous model, whereas many real-world distributed systems exhibit partial synchrony, where message delays may be bounded for extended periods, allowing consensus protocols to function effectively. Furthermore, the result focuses exclusively on crash failures, whereas Byzantine failures, where processes may act maliciously, require even more robust protocols like Practical Byzantine Fault Tolerance (PBFT). However, the FLP impossibility theorem remains an important part of distributed systems theory, shaping research on consensus and fault tolerance. While its impossibility result imposes theoretical constraints, it has inspired practical solutions that balance performance, reliability, and efficiency. The paper’s insights continue to be relevant in modern computing environments, including cloud infrastructure, fault-tolerant databases, and blockchain consensus mechanisms.