2

If we are running multi-paxos then a node may see:

Propose(N)
Accept!(N,Vn)
Accept!(N+1,Vm)
Accept!(N+4,Vo) // huh? where is +2, +3?
Accept!(N+5,Vp)

This may be because either:

  • There was a stable leader but the network local to this node dropped else delayed +2 and +3.
  • There was an outage such that there were two attempts to propose such that +2 and +3 were failed rounds proposals

In general operations on the distributed finite state machine wont commute such that a node should apply all operations in order. This implies that a node needs to be able to distinguish between the two cases. If it is failed rounds of proposals the node has no problems. If it is lost messages it suggests that the node should wait till they turn up else try to recover the lost data (e.g. request a snapshot to reinitialise and catchup).

What are the options or strategies to handle this and what overhead do they create?

This question is inspired by In Paxos, can an Acceptor accept a different value after it has already accepted one?

Community
  • 1
  • 1
simbo1905
  • 6,321
  • 5
  • 58
  • 86

1 Answers1

1

I can think of two methods to deal with this.

The simplest approach would be to have the node that is missing +2 and +3 to go back and try to propose no-ops in those slots. If there were decisions there, the node will learn the data in the prepare round. Otherwise, no-ops will be decided.

Another approach would be to have an out-of-band re-learning process. This may be necessary anyway: how does a node catch up if it joins the system after the others?

Or you can use a combination of both. The leader can propose no-ops for any holes in its history, the others can use the re-learning process. This is how my paxos system works.

Michael Deardeuff
  • 10,386
  • 5
  • 51
  • 74
  • first option is prohibitively expensive as proposing a noop forces a leader failover. that would disrupt a stable leader during multipaxos and force disk flushes with a noop. if it was just a case of packet loss at one node then you probably really don't want that node to propose itself as the leader as it may be about to get extended packet loss at that host causing complete instability of the cluster. if you had just had a leader election such that there was no lost values you would force a second leader election. it would be better to put some error checking into the stable flow. – simbo1905 Oct 23 '14 at 04:24
  • second option you do want an out-of-bound catchup method but if you had an inbound way of detecting gaps then you would know quickly to catch up. you want to catch up from someone who is up-to-date not someone who is missing values. that suggest both an in-band way to learn if you had gaps and a reliable way of finding who is up to date so that you can contact them out-of-band to catch up. catching up implies knowing the ordered list of operations you have missed. as per the question that is not clear who knows the list of missing values immediately after, say, multiple close failovers. – simbo1905 Oct 23 '14 at 04:36
  • @simbo1905 the first shouldn't cause a leader failover because there is clearly a leader elected for after the hole. – Michael Deardeuff Oct 24 '14 at 07:47
  • By the way, these are exactly what I use in my implementation, which has been used by many, many production systems. – Michael Deardeuff Oct 24 '14 at 08:13
  • Oh yes I see its a safe poll as the proposal is less than an accepted proposal so no leader election and the nack will send the value. Nodes only need to keep the highest uncommitted values so this suggests you hold a buffer of committed values just for this query purpose? Thanks – simbo1905 Oct 24 '14 at 11:42
  • re-reading the Paxos made simple paper your answer matches if you have nacks for failed rounds. If you don't have nacks then you need to propose a high values to the gaps to learn if there is a value. If no value the paper says them the fix them to noops as they were fails rounds. The out of band-learning can be in-band by followers asking for retransmission to the leader or any node. I have accepted you answer but as it is textbook although I am also keeping mine as an alternative approach. Thanks. – simbo1905 Nov 02 '14 at 17:04