We are using 3PC (three-phase commit) for a distributed transaction. There are 4 nodes, A, B, C, and D, where A is the coordinator.

  1. A received OK from all others and sent the prepare-to-commit message to them.
  2. While C and D received this message and moved to the prepared state, B crashed and did not receive it (thus remaining in the wait state).
  3. A times out on B and sends abort to all others, but only D receives the abort message; C crashes before receiving it.

Now the question is: what will C do after recovery? According to http://courses.cs.vt.edu/~cs5204/fall00/distributedDBMS/sreenu/3pc.html, C will commit upon recovery, following the failure transition, instead of aborting as D does. Won't that result in an inconsistent state? Or does C have some mechanism to detect that the transaction has been aborted?

cntswj

1 Answer

I think there's a wrong assumption in your question about the behaviour of node B. If B crashes before it moves to the prepared state, then it is still in the wait state after the restart and will abort.

I expect that node C will abort, because the coordinator will command it to do so. I think this is similar to 2PC: it is up to the coordinator to periodically check whether the lost nodes are available again. When C restarts, the coordinator can see it and resend the abort message, so that C rolls back.
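To make the idea concrete, here is a minimal sketch of a coordinator that remembers its decision and resends it to nodes that have not received it yet. It is only an illustration under my own assumptions: the `State`, `Participant`, and `Coordinator` names, the `up` flag, and the `resend` method are all hypothetical and not taken from any 3PC specification or library, and message delivery is simulated with a plain method call.

```python
from enum import Enum


class State(Enum):
    WAIT = "wait"
    PRECOMMIT = "precommit"
    ABORTED = "aborted"
    COMMITTED = "committed"


class Participant:
    """Hypothetical participant; `up` simulates whether the node is reachable."""

    def __init__(self, name, state):
        self.name = name
        self.state = state
        self.up = True

    def deliver(self, decision):
        # A crashed node receives nothing and keeps its logged state.
        if not self.up:
            return False
        self.state = decision
        return True


class Coordinator:
    """Hypothetical coordinator that keeps its decision and retries delivery."""

    def __init__(self, participants):
        self.participants = participants
        self.decision = None
        self.unacked = set()

    def decide(self, decision):
        self.decision = decision
        self.unacked = {p.name for p in self.participants}
        self.resend()

    def resend(self):
        # Assumption: this is called periodically, or when a node rejoins.
        for p in self.participants:
            if p.name in self.unacked and p.deliver(self.decision):
                self.unacked.discard(p.name)


def run_scenario():
    b = Participant("B", State.WAIT)
    c = Participant("C", State.PRECOMMIT)
    d = Participant("D", State.PRECOMMIT)
    b.up = False                      # B crashed in the wait state
    c.up = False                      # C crashes before the abort arrives
    coord = Coordinator([b, c, d])
    coord.decide(State.ABORTED)       # only D receives the abort
    b.up = True                       # both nodes recover
    c.up = True
    coord.resend()                    # coordinator pushes the abort again
    return {p.name: p.state.value for p in (b, c, d)}
```

With the coordinator still alive, `run_scenario()` ends with B, C, and D all aborted. The sketch says nothing about what happens if the coordinator itself fails, which is a separate case.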

chalda
  • Sorry, there was a typo. I've changed B into C in the last paragraph. I'm not sure whether the coordinator will keep checking C and resend the message after it recovers. – cntswj Jul 20 '17 at 18:42
  • I see your point, and I was not sure how to put it differently. I have now found an article: https://www.researchgate.net/publication/275154978_Three-Phase_Commit. To quote from it: "Notice that a recovering participant cannot commit a transaction even if the participant is in a pre-commit state with respect to the transaction. This is because the operational sites might have decided to abort the transaction after the participant had failed if none of them was in a pre-commit state. In this case, the participant must ask the other sites about the final status of the transaction." – chalda Aug 03 '17 at 09:20
  • The point is that C is recovering. C can learn the state by waiting for a command from the coordinator, or it can ask the other participants for the final state. This is up to the implementation. – chalda Aug 03 '17 at 09:21
  • It seems that 3PC does not support all kinds of multipoint failures, as discussed in https://stackoverflow.com/questions/21424962/how-does-three-phase-commit-avoid-blocking. – cntswj Sep 07 '17 at 19:52
  • From my point of view, you are right. 3PC itself (as a protocol) is not capable of resolving that situation. But an implementation of 3PC should account for this state and be able to finish the transaction correctly. – chalda Sep 08 '17 at 10:10
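
The rule quoted in the comments above (a recovering participant must not commit on its own, even from the pre-commit state, and has to ask the other sites) could be sketched roughly as follows. This is a hedged illustration only: the function name `resolve_on_recovery`, the string state names, and the `peer_reports` dictionary are assumptions made for the example, and returning `None` simply stands for "still undecided, keep asking".

```python
FINAL_STATES = {"committed", "aborted"}


def resolve_on_recovery(logged_state, peer_reports):
    """Return the final state a recovering participant may adopt, or None.

    logged_state: the state found in the participant's own log after restart.
    peer_reports: hypothetical mapping of peer name -> state that peer reports.
    """
    # A final state in the local log is simply kept.
    if logged_state in FINAL_STATES:
        return logged_state
    # Otherwise adopt a final state reported by any reachable peer.
    for state in peer_reports.values():
        if state in FINAL_STATES:
            return state
    # Nobody knows the outcome yet: do not commit unilaterally.
    return None


# C recovers in pre-commit and D reports that the transaction was aborted.
print(resolve_on_recovery("precommit", {"D": "aborted"}))
```

In the scenario from the question, C would adopt D's aborted state instead of committing, which avoids the inconsistency. If no peer knows the outcome, the sketch leaves C undecided, and how long to wait is up to the implementation.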