
I'm having difficulty understanding when a leader actually ACKs the client. Here is part of the DistributedLog documentation:


Each batched entry appended to a log segment is assigned a monotonically increasing entry id by the log segment writer. All entries are written asynchronously in a pipeline. The log segment writer therefore maintains an in-memory pointer, called LAP (LastAddPushed), which is the entry id of the last batched entry pushed to the log segment store by the writer. Entries may be written out of order but are acknowledged only in entry id order. As writes are successfully acknowledged, the log segment writer also updates an in-memory pointer called LAC (LastAddConfirmed). LAC is the entry id of the last entry already acknowledged by the writer. All entries written between LAC and LAP are unacknowledged data and are not visible to readers.
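To make the LAP/LAC bookkeeping concrete, here is a minimal Python sketch of my reading of that paragraph. The class and method names (`SegmentWriterSketch`, `push`, `on_store_ack`) are made up for illustration and are not DistributedLog or BookKeeper APIs; the only behaviour it models is "writes may complete out of order, LAC only advances over a contiguous prefix".

```python
class SegmentWriterSketch:
    """Toy model of the LAP/LAC pointers (hypothetical, not DistributedLog code)."""

    def __init__(self):
        self.lap = -1         # LastAddPushed: last entry id handed to the store
        self.lac = -1         # LastAddConfirmed: last entry id acknowledged in order
        self._stored = set()  # ids the store has confirmed but LAC has not reached yet

    def push(self):
        """Assign the next entry id and (conceptually) send it asynchronously."""
        self.lap += 1
        return self.lap

    def on_store_ack(self, entry_id):
        """Record a completed write; advance LAC only while the prefix is contiguous."""
        self._stored.add(entry_id)
        while self.lac + 1 in self._stored:
            self.lac += 1
            self._stored.discard(self.lac)
        return self.lac


writer = SegmentWriterSketch()
ids = [writer.push() for _ in range(3)]  # entry ids 0, 1, 2 are in flight
lac_after_2 = writer.on_store_ack(2)     # entry 2 finished first: LAC cannot move
lac_after_0 = writer.on_store_ack(0)     # entry 0 finished: LAC moves to 0
lac_after_1 = writer.on_store_ack(1)     # gap closed: LAC jumps to 2
```

Under these assumptions, everything between `writer.lac` and `writer.lap` is the unacknowledged window the documentation describes.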

The readers can read entries up to LAC, as those entries are known to be durably replicated and can therefore be read safely without the risk of violating read ordering. The writer includes the current LAC in each entry that it sends to BookKeeper, so each subsequent entry makes the records in the previous entry visible to the readers. LAC updates can be piggybacked on the next entry written by the writer. Since readers are strictly followers, they can use LAC to read durable data from any of the replicas without any communication or coordination with the writer.
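A small sketch of how I understand the piggybacking, again in Python and with invented names (`Entry`, `piggybacked_lac`, `readable_entries`): each stored entry carries the writer's LAC at send time, and a reader treats the highest LAC it has seen on a replica as the visibility limit.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    entry_id: int
    piggybacked_lac: int  # writer's LAC when this entry was sent (assumed field name)
    payload: str


def readable_entries(replica_log):
    """Return the entries a reader may consume from one replica."""
    # The newest LAC the replica knows about is the largest piggybacked value.
    known_lac = max((e.piggybacked_lac for e in replica_log), default=-1)
    return [e for e in replica_log if e.entry_id <= known_lac]


replica = [
    Entry(0, -1, "a"),  # sent before anything was acknowledged
    Entry(1, 0, "b"),   # tells the replica that entry 0 is confirmed
    Entry(2, 1, "c"),   # tells the replica that entry 1 is confirmed
]
visible_ids = [e.entry_id for e in readable_entries(replica)]
```

In this toy log, entry 2 is stored on the replica but stays invisible until some later entry carries a LAC of 2 or more, which is exactly the window my question is about.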

DL introduces one type of system record, called a control record, which acts as the commit request in a two-phase commit algorithm. If no application records arrive within the specified SLA, the writer generates a control record. Writing the control record advances the LAC of the log stream. The control record is added either immediately after receiving acknowledgements for a user record or periodically if no application records are added; this is configured as part of the writer's flushing policy. Although control records are present in the physical log stream, log readers do not deliver them to the application.
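As I read it, the control record is just an entry with no application payload whose job is to carry the new LAC. A rough Python sketch, where the dictionary layout and the `append`/`deliver` helpers are assumptions of mine rather than anything from the DL code base:

```python
def append(log, payload, writer_lac, is_control=False):
    """Append an entry that carries the writer's current LAC."""
    log.append({"id": len(log), "lac": writer_lac,
                "control": is_control, "payload": payload})


def deliver(log):
    """What an application reader receives: committed user records only."""
    known_lac = max((e["lac"] for e in log), default=-1)
    return [e["payload"] for e in log
            if e["id"] <= known_lac and not e["control"]]


log = []
append(log, "user-0", writer_lac=-1)  # entry 0 is pushed, not yet acknowledged
before = deliver(log)                 # no entry carries LAC >= 0 yet
# Entry 0 is acknowledged and no further user record arrives,
# so the writer emits a control record carrying LAC = 0.
append(log, None, writer_lac=0, is_control=True)
after = deliver(log)                  # user-0 is visible; the control record is filtered out
```

So in this model the control record is what closes the visibility gap when no further user record arrives to piggyback the LAC.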

Now consider the following scenario:

  1. The leader publishes a message to BookKeeper.
  2. The followers receive the message, append it to their logs and send an ACK to the leader.
  3. The leader receives the confirmations from the followers, increments LAC and replies to the client that the message is committed.
  4. NOW: the leader fails before it can piggyback the incremented LAC to the followers.
  5. The question is: since the potential leader is not aware that LAC has been incremented, it becomes the new leader and truncates the log to the old LAC, which means we have lost an entry in the log that was confirmed by the previous leader.

As a result, the client has been told that the message was written successfully, but the message has been lost.

Eugen Constantin Dinca
Majid Azimi

1 Answer


Since the potential leader is not aware that LAC has been incremented, it becomes the new leader and truncates the log to the old LAC, which means we have lost an entry in the log that was confirmed by the previous leader.

There are several cases:

1) If the leader closes the log gracefully, it seals the log segment it is writing to. The LAC is advanced and is also recorded as part of the log segment metadata (which is stored in the metadata store).

2) If the leader crashes without closing the log gracefully, a potential leader comes up and goes through a recovery process. The new leader does the following:

  • a) It attempts to seal the last log segment written by the previous leader. The seal is performed by the BookKeeper client and has two parts: (i) it fences the log segment, which ensures that no more writes can happen in that segment; (ii) it then does a forward recovery from the last known LAC, recovering entries that were written but not yet committed.

  • b) After recovering the last log segment, the new leader opens a new log segment to write entries to.
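The crash case can be sketched roughly as follows. This is a simplified Python model and not the BookKeeper client API: `Bookie`, `add` and `recover_and_seal` are hypothetical names, and the sketch recovers any entry found on at least one replica, whereas the real client works with quorums. The point it illustrates is that recovery reads forward past the last known LAC instead of truncating to it.

```python
class Bookie:
    """Toy storage node (stand-in only, not a BookKeeper class)."""

    def __init__(self):
        self.entries = {}    # entry_id -> (piggybacked_lac, payload)
        self.fenced = False

    def add(self, entry_id, lac, payload, recovery=False):
        # Once fenced, only recovery writes are accepted.
        if self.fenced and not recovery:
            raise RuntimeError("segment is fenced")
        self.entries[entry_id] = (lac, payload)


def recover_and_seal(bookies):
    # Fence: the previous writer can no longer add entries.
    for b in bookies:
        b.fenced = True
    # Last LAC known to any replica, taken from the piggybacked values.
    lac = max((l for b in bookies for (l, _) in b.entries.values()), default=-1)
    # Forward recovery: walk past the LAC and re-replicate whatever is found.
    last = lac
    while True:
        holders = [b for b in bookies if last + 1 in b.entries]
        if not holders:
            break
        entry_lac, payload = holders[0].entries[last + 1]
        for b in bookies:
            b.add(last + 1, entry_lac, payload, recovery=True)
        last += 1
    return last  # would be recorded as the sealed segment's last entry id


# The scenario from the question: entry 1 was acknowledged to the client,
# but the leader crashed before any later entry could carry LAC = 1.
bookies = [Bookie() for _ in range(3)]
for b in bookies:
    b.add(0, -1, "m0")
    b.add(1, 0, "m1")
sealed_at = recover_and_seal(bookies)
```

Here the last LAC known to the replicas is 0, yet the segment is sealed at entry 1, so the acknowledged message is kept rather than truncated.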

Hope this answers your question.

DistributedLog also has a paper published in ICDE 2017. You can get it from here.

Sijie Guo