Questions tagged [consensus]

Consensus is the problem of reaching agreement among members of a group. Talking in terms of computing it is the agreement on a certain value that is needed for during computation by nodes that participate in a cluster. Reaching consensus in a distributed environment is a challenging task and under certain conditions not even possible. Consensus algorithms are often used for replicating a state machine as a general approach for enhancing fault-tolerance

In the context of distributed computing consensus is a fundamental problem that has received intense attention in research as problem like atomic compare-and-swap (CAS) registers or atomic transaction commit can be reduced to it. Furthermore consensus builds a foundation to realise applications like leader election, clock synchronisation and state machine replication. Although the problem can be explained in simple words, solving it is far more subtle and under certain circumstances not even possible.

In general consensus is the problem of reaching agreement among members of a group. Talking in terms of computing, it is the agreement on a certain value that is needed for during computation by nodes that participate in a cluster. This may not sound complicated but indeed gets as in a distributed system different types of unpredictable partial failures like:

  • loss of packets
  • duplication/reordering of packets
  • arbitrary delayed packet delivery
  • pause or even crash of a node

may happen and will happen, which impede solving the problem. Therefore a consensus algorithm needs to satisfy certain properties:

  • Agreement: All decisions by non faulty processors must be the same.
  • Integrity: No node decides twice.
  • Validity: If all nodes decide on value v, then v was proposed.
  • Termination: Every non-faulty node needs to eventually decide on a value.

Termination is a liveness property and requires that the algorithm is fault-tolerant. Without satisfying this property an algorithm could simply use a predefined leader that decides on values (like a 2PC). But if the leader fails the algorithm would not make progress anymore - hence the termination requirement. Agreement and integrity are safety properties and build the core of the consensus algorithm. Validity covers the trivial solution where nodes would always decide on the same pre given value and thus also reach consensus by definition.

Whether a solution, that is an actual algorithm, exist for the consensus problem, depends on the system model. Therefore one has to differentiate between a synchronous or asynchronous system model1. Since mostly, in order to send messages, we have to rely on a shared communication channel, at least the communication is asynchronous and thus our real world applications fit that model better. Unfortunately in such kind of environment there is an impossibility proof by Fischer et al. (1), known as the FLP Impossibility, that shows in a fully asynchronous model even if just one node is faulty (not even considering byzantine failures), no algorithm exist that always reaches consensus. By proving that in every possible algorithm an execution exist that would never terminate, they have shown that no algorithm would always reach the point of agreement.

Nevertheless real solutions to this problem exist. Note that the impossibility result applies on fully asynchronous environments. Although our real world applications fit that model better, it does not mean all conditions are accurate (e.g processors do maintain an internal clock). Making less stringent assumptions about the model allows circumventing the result of the FLP Impossibility. Setting for example an upper bound for message delivery as an unreliable failure detection can turn the communication channel into a partial synchronous one. A system model in which the consensus problem is solvable. (2)

Algorithms that solve the consensus problem are: Paxos, Raft, Viewstamped Replication, Zab


1 In synchronous models there are known bounds for message delivery and process execution. In an asynchronous model processors not even maintain an internal clock and hence no bounds exist.

303 questions
31
votes
8 answers

Contradiction in Lamport's Paxos made simple paper

Phase 2. (a) If the proposer receives a response to its prepare requests (numbered n) from a majority of acceptors, then it sends an accept request to each of those acceptors for a proposal numbered n with a value v, where v is the value of the…
lambda
  • 585
  • 6
  • 10
28
votes
5 answers

When to use Paxos (real practical use cases)?

Could someone give me a list of real use cases of Paxos. That is real problems that require consensus as part of a bigger problem. Is the following a use case of Paxos? Suppose there are two clients playing poker against each other on a poker…
user782220
  • 10,677
  • 21
  • 72
  • 135
28
votes
3 answers

paxos vs raft for leader election

After reading paxos and raft paper, I have following confusion: paxos paper only describe consensus on single log entry, which is equivalent the leader election part of the raft algorithm. What's the advantage of paxos's approach over the simple…
Oliver Young
  • 578
  • 1
  • 4
  • 12
20
votes
1 answer

In Substrate, What is the difference between Babe, Aura, and Grandpa

Substrate supports "pluggable consensus" so a developer can choose from among several consensus algorithms. It comes standard with four algorithms: Aura Babe Proof of Work Grandpa Some of these (eg babe and grandpa) can even be used together in a…
JoshOrndorff
  • 1,591
  • 9
  • 19
20
votes
2 answers

How does kafka handle network partitions?

Kafka has the concept of a in-sync replica set, which is the set of nodes that aren't too far behind the leader. What happens if the network cleanly partitions so that a minority containing the leader is on one side, and a majority containing the…
Filip Haglund
  • 13,919
  • 13
  • 64
  • 113
19
votes
10 answers

What are the faster Paxos-related algorithms for consensus in distributed systems?

I've read Lamport's paper on Paxos. I've also heard that it isn't used much in practice, for reasons of performance. What algorithms are commonly used for consensus in distributed systems?
Rob Lachlan
  • 14,289
  • 5
  • 49
  • 99
18
votes
4 answers

How does the Raft algorithm guarantee consensus if there are multiple leaders?

As the paper says: Election Safety: at most one leader can be elected in a given term. §5.2 However, there may be more than one leader in the system. Raft only can promise that there is only one leader in a given term. So If I have more than one…
baotiao
  • 775
  • 6
  • 20
16
votes
1 answer

Raft Vs MongoDB Primary Election

How is the raft consensus algorithm different from MongoDB's primary election process other than the fact that MongoDB takes other factors (priority, for example) into consideration while electing the primary?
Chandra Sekar
  • 10,683
  • 3
  • 39
  • 54
15
votes
1 answer

How does Parity's Aura consensus protocol work?

Here it's a very high level description with only formulas. I want to understand actually how it works. I don't actually understand what a step is and what's it's use? Does a node always keep updating the step? And when time to create to create and…
Narayan Prusty
  • 2,501
  • 3
  • 22
  • 41
14
votes
3 answers

Real world example of Paxos

Can someone give me a real-world example of how Paxos algorithm is used in a distributed database? I have read many papers on Paxos that explain the algorithm but none of them really explain with an actual example. A simple example could be a…
sjg
  • 141
  • 1
  • 3
13
votes
2 answers

Why is Paxos leader election not done using Paxos?

The questions below are intended to be serious rather than frivolous. I lack experience in distributed systems, but I do understand how Basic Paxos works and why leader selection is useful. Unfortunately, my understanding is not deep enough to…
merlin2011
  • 71,677
  • 44
  • 195
  • 329
12
votes
3 answers

Existence of a 0- and 1-valent configurations in the proof of FLP impossibility result

In the known paper Impossibility of Distributed Consensus with one Faulty Process (JACM85), FLP (Fisher, Lynch and Paterson) proved the surprising result that no completely asynchronous consensus protocol can tolerate even a single unannounced…
hengxin
  • 1,867
  • 2
  • 21
  • 42
9
votes
3 answers

Is operation in raft log entry supposed to be idempotent?

In raft, when a node restart, it try to redo all the log entries to catch up the state. But if node goes down again in recovery phase, node would do some op twice. These twice redo op will violate state machine if ops are not idempotent. According…
smxxqjl
  • 91
  • 4
9
votes
5 answers

How does raft handle committing entries from previous one?

In raft paper section 5.4.2 If a leader crashes before committing an entry, future leaders will attempt to finish replicating the entry. However, a leader cannot immediately conclude that an entry from a previous term is committed once it…
Jal
  • 2,174
  • 1
  • 18
  • 37
8
votes
4 answers

PBFT: Why cant the replicas perform the request after 2/3 have prepared? why do we need commit phase?

I know there are some questions on this website that asks the same questions. However the answer is never clear: In PBFT, why cant the replicas execute the requests after 2/3s have prepared? why is commit phase needed? if 2/3 + 1 replica have agreed…
user2584960
  • 645
  • 1
  • 6
  • 20
1
2 3
20 21