I am currently implementing the Raft consensus algorithm myself, and I have run into the following problem.
Suppose there are 4 nodes (A, B, C and D), so a log entry needs more than 2 votes (at least 3 of the 4 nodes) to be committed. We start the cluster and Leader A is elected with `term = 0`. Then the following things happen:
- Followers B and D disconnect.
- Leader A creates `LogEntry` X.
- Leader A tries to replicate X to all nodes and eventually fails, because only 2 nodes (A and C) are reachable.
- Node B reconnects and times out, so it starts an election with a new `term = 1`.
- Node A loses its leadership because it receives Node B's `RequestVote` RPC.
- Node B can't win the election because it does not have `LogEntry` X. So there is no Leader in the cluster.
- Node A times out and is elected Leader again.
- Leader A successfully replicates `LogEntry` X to B. Now nodes A, B and C have exactly the same `LogEntry` X, which is `(index = 0, term = 0)`. However, according to the Raft paper, Leader A can't commit X, even though A created it and a majority has stored it: "Raft never commits log entries from previous terms by counting replicas. Only log entries from the leader’s current term are committed by counting replicas;"
- Suppose there are no more `LogEntry`s from clients to replicate, so `LogEntry` X remains uncommitted.
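To make the last two steps concrete, here is a minimal Python sketch of the commit rule as I understand it. The names `advance_commit_index` and `match_index` are my own and not from the paper, `-1` stands for "nothing stored/committed yet" because my indices start at 0, and I assume A is re-elected with `term = 2` (any term greater than 1 gives the same result).

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    index: int
    term: int
    command: str


def advance_commit_index(log, match_index, current_term, commit_index, cluster_size):
    """Return the leader's new commit index (-1 means nothing is committed)."""
    majority = cluster_size // 2 + 1  # 3 when cluster_size is 4
    # Scan from the newest entry down to the first uncommitted one.
    for entry in reversed(log):
        if entry.index <= commit_index:
            break
        replicas = sum(1 for stored in match_index.values() if stored >= entry.index)
        # Counting replicas only commits an entry from the leader's current term.
        if replicas >= majority and entry.term == current_term:
            return entry.index
    return commit_index


# State after A (assumed term = 2) has replicated X to B; D is still disconnected.
log = [LogEntry(index=0, term=0, command="X")]
match_index = {"A": 0, "B": 0, "C": 0, "D": -1}

new_commit_index = advance_commit_index(
    log, match_index, current_term=2, commit_index=-1, cluster_size=4
)
print(new_commit_index)  # -1: X is on 3 of 4 nodes but is from term 0, so it stays uncommitted
```

With `current_term=0` the same call would return `0`, which is exactly the difference the rule makes in this scenario.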
My questions are:
- Is this a real problem?
- Is there a solution to this? There are already some posts on Stack Overflow that emphasize the importance of this rule. And one post seems to say that we can create a copy Y of X and replicate Y. Does that work, or is there a more common solution?
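For reference, this is how I picture the "replicate a new entry" idea. The Raft paper also mentions that a leader commits a blank no-op entry at the start of its term, so the sketch below appends a no-op instead of a copy of X. It is only a rough sketch under my own assumptions: `advance_commit_index`, `log_terms` and `match_index` are hypothetical names, the log is reduced to a list of terms, and A's new term is assumed to be 2.

```python
def advance_commit_index(log_terms, match_index, current_term, commit_index, cluster_size):
    """Return the new commit index; only current-term entries count by replicas."""
    majority = cluster_size // 2 + 1
    # Walk from the newest index down to the first uncommitted one.
    for index in range(len(log_terms) - 1, commit_index, -1):
        replicas = sum(1 for stored in match_index.values() if stored >= index)
        if replicas >= majority and log_terms[index] == current_term:
            return index
    return commit_index


current_term = 2            # assumed term of A's second leadership
log_terms = [0]             # index 0 holds X, created in term 0
match_index = {"A": 0, "B": 0, "C": 0, "D": -1}

# X alone cannot be committed, even though 3 of 4 nodes store it.
before = advance_commit_index(log_terms, match_index, current_term, -1, 4)

# The leader appends a no-op in its current term and replicates it to B and C.
log_terms.append(current_term)
match_index = {"A": 1, "B": 1, "C": 1, "D": -1}
after = advance_commit_index(log_terms, match_index, current_term, -1, 4)

print(before, after)  # -1 1
```

Once the commit index reaches 1, every earlier entry, including X at index 0, is committed indirectly, so no client request is needed to unblock X.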