
I believe most people know what 2PC (the two-phase commit protocol) is and how to use it in Java or most modern languages. Basically, it is used to keep transactions in sync when you have two or more DBs.

Assume I have two DBs (A and B) in two different locations, coordinated with 2PC. Once A and B are ready to commit a transaction, both DBs report back to the transaction manager saying they are ready to commit. Once the transaction manager has received both acknowledgements, it sends a signal back to A and B telling them to go ahead.

Here is my question: let's say A received the signal and committed the transaction. B is about to do the same, but someone unplugs the power cable, causing the whole server to shut down. When B is back online, what will B do? And how does B do it?

Remember, A is committed but B is not, and we are using 2PC (so the design of 2PC stops working, doesn't it?)

dongshengcn

3 Answers


On Two-Phase Commit

Two-phase commit does not guarantee that a distributed transaction can't fail, but it does guarantee that it can't fail silently without the TM being aware of it.

In order for B to report the transaction as being ready to commit, B must have the transaction in persistent storage (i.e. B must be able to guarantee that the transaction can commit in all circumstances). In this situation, B has persisted the transaction but the transaction manager has not yet received a message from B confirming that B has completed the commit.
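As a minimal sketch of that rule, here is what the resource-manager side of phase 1 might look like. The class, the `Vote`/`TxState` names, and the in-memory map standing in for a durable log are all hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Resource-manager side of phase 1. All names here are illustrative;
// the in-memory map stands in for a durable write-ahead log that a
// real RM would fsync to disk before answering.
public class ResourceManager {

    public enum Vote { YES, NO }
    public enum TxState { PREPARED, COMMITTED }

    private final Map<Long, TxState> log = new HashMap<>();

    public Vote prepare(long txId) {
        try {
            log.put(txId, TxState.PREPARED); // persist the intent first
            // forceToDisk();                // a real RM flushes here
            return Vote.YES; // only now is "ready to commit" a safe promise
        } catch (RuntimeException e) {
            return Vote.NO;  // cannot guarantee the commit, so vote no
        }
    }
}
```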

The transaction manager will poll B again when B comes back online and ask it to commit the transaction. If B has already committed the transaction it will report the transaction as committed. If B has not yet committed the transaction it will then commit as it has already persisted it and is thus still in a position to commit the transaction.
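Continuing the hypothetical `ResourceManager` sketch above, the commit handler that the TM re-polls would be idempotent, along these lines:

```java
// Continuing the hypothetical ResourceManager above: the commit handler
// must be idempotent, because the TM may re-poll after B restarts.
public TxState commit(long txId) {
    TxState state = log.get(txId);
    if (state == TxState.COMMITTED) {
        return state;                     // already done; just re-report it
    }
    if (state == TxState.PREPARED) {
        log.put(txId, TxState.COMMITTED); // safe: the intent survived the crash
        return TxState.COMMITTED;
    }
    // A TxID the RM voted on but no longer knows about implies data loss.
    throw new IllegalStateException("Unknown TxID " + txId);
}
```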

In order for B to fail in this situation, it would have to undergo a catastrophic failure that lost data or log entries. The transaction manager would still be aware that B had not reported a successful commit.[1]

In practice, if B can no longer commit the transaction, it would imply that the disaster that took B out had caused data loss, and B would report an error when the TM asked it to commit a TxID that it wasn't aware of or didn't think was in a committable state.

Thus, two-phase commit does not prevent a catastrophic failure from occurring, but it does prevent the failure from going unnoticed. In this scenario the transaction manager will report an error back to the application if B cannot commit.

The application still has to be able to recover from the error, but the transaction cannot fail silently without the application being made aware of the inconsistent state.

Semantics

  • If a resource manager or network goes down in phase 1, the transaction manager will detect a fatal error (can't connect to resource manager) and mark the sub-transaction as failed. When the network comes back up it will abort the transaction on all of the participating resource managers.

  • If a resource manager or network goes down in phase 2, the transaction manager will continue to poll the resource manager until it comes back up. When it re-connects to the resource manager it will tell the RM to commit the transaction. If the RM returns an error along the lines of 'Unknown TxID', the TM will be aware that there is a data loss issue in the RM (see the coordinator sketch after this list).

  • If the TM goes down in phase 1 then the client will block until the TM comes back up, unless it times out or receives an error due to the broken network connection. In this case the client is made aware of the error and can either re-try or initiate the abort itself.

  • If the TM goes down in phase 2, the client blocks until the TM comes back up. The transaction has already been reported as committable, so no fatal error should be presented to the client, although the client may block for the duration of the outage. The TM will still have the transaction in an uncommitted state and will poll the RMs to commit when it comes back up.
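
Putting these rules together, a coordinator loop consistent with the semantics above might look like this. The `Participant` interface and every name in it are hypothetical, and the TM's own durable decision log is reduced to a comment:

```java
import java.util.List;

// Coordinator-side sketch of the semantics above. The Participant
// interface and all names are hypothetical; error classification
// and the TM's own durable log are reduced to comments.
public class TransactionManager {

    public interface Participant {
        boolean prepare(long txId) throws Exception; // phase-1 vote
        void commit(long txId) throws Exception;     // phase-2 confirm
        void abort(long txId) throws Exception;
    }

    public void run(long txId, List<Participant> rms) throws Exception {
        // Phase 1: any NO vote or connection failure is fatal.
        for (Participant rm : rms) {
            try {
                if (!rm.prepare(txId)) {
                    abortAll(txId, rms);
                    return;
                }
            } catch (Exception cannotConnect) {
                abortAll(txId, rms); // re-issued to RMs as they come back up
                return;
            }
        }
        // (A real TM forces the COMMIT decision to its own log here, so a
        // restarted TM knows to keep polling rather than abort.)
        // Phase 2: the decision is final; poll each RM until it confirms.
        for (Participant rm : rms) {
            boolean confirmed = false;
            while (!confirmed) {
                try {
                    rm.commit(txId);
                    confirmed = true;
                } catch (Exception e) {
                    // A real TM would distinguish "RM unreachable" (keep
                    // polling) from "Unknown TxID" (data loss: report it).
                    backoff();
                }
            }
        }
    }

    private void abortAll(long txId, List<Participant> rms) { /* elided */ }

    private void backoff() { /* retry delay elided */ }
}
```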

Post-commit data loss events in the resource managers are not handled by the transaction manager and are a function of the resilience of the RMs.

Two-phase commit does not guarantee fault tolerance - see Paxos for an example of a protocol that does address fault tolerance - but it does guarantee that partial failure of a distributed transaction cannot go unnoticed.

  1. Note that this sort of failure could also lose data from previously committed transactions. Two-phase commit does not guarantee that the resource managers can't lose or corrupt data or that DR procedures don't screw up.
ConcernedOfTunbridgeWells
  • This answer makes a dangerous assumption that [The Network is Reliable](https://queue.acm.org/detail.cfm?id=2655736). – Filip Haglund Apr 02 '16 at 20:48
  • I'm not sure what you're getting at. In the event of a network outage in the first phase the TM would fail to connect to one of the RMs and abort the transaction due to the fatal error, ultimately telling all the RMs to abort as they came back online. In the second phase it would just continue to poll the RM until the network came back up, at which point it would issue the confirm-commit. Again, 2PC doesn't guarantee commit, it just guarantees that the transaction can't fail without the application being made aware of it. – ConcernedOfTunbridgeWells Apr 03 '16 at 10:13
  • What if the TM crashes after receiving all votes to commit, but before sending the final commit/abort? Then everyone's holding locks, waiting for a vote. And you cannot timeout, some nodes might commit while others timeout and abort. If you have a synchronous reliable network, and no node ever crashes, and you have real-time operating systems so you can guarantee progress within a bounded time, then 2PC works. Usually, we have none of those properties. – Filip Haglund Apr 03 '16 at 11:48
  • Don't confuse 2PC semantics with fault tolerance. The TM will still have the transaction marked as uncommitted until it has received confirmation of all commits in the second phase. When it comes back up it will poll all the resource managers and ask them to commit and they will report the transaction as committed. Remember, the RMs must be able to guarantee commit. 2PC doesn't do anything about a catastrophic failure of a RM after it's reported a successful commit on the second phase - or at any time after that. – ConcernedOfTunbridgeWells Apr 03 '16 at 12:22
  • Here's a paper on a 3-phase commit protocol called Paxos, which does address fault tolerance. http://research.microsoft.com/pubs/64636/tr-2003-96.pdf – ConcernedOfTunbridgeWells Apr 03 '16 at 12:35
  • There's also Raft. Paxos is apparently one of the hardest things to implement correctly :) – Filip Haglund Apr 03 '16 at 12:38
  • Suppose this scenario: 1. first we have a=0, b=0; 2. a transaction wants to set a=1 and b=2; 3. with 2PC, both a and b are in the prepared state, then the commit point is marked committed, then a is moved to the committed state, while b is still in the prepared state; 4. now we read a and find a=1; after that, we read b and find b=0. The question is, doesn't this behavior violate atomicity? I would expect b=2 after confirming a=1. Otherwise, **Atomicity** is actually **Eventual Atomicity**. – ideawu Mar 14 '21 at 03:14
  • What if we read A **then** B, when A is committed while B is uncommitted? What's the result? Are we going to see A with the new value and B with the old value? – ideawu Mar 14 '21 at 03:19
  • If you updated a after committing it, the update would take place in another transaction. You would get two updates to a. The locks held by the original transaction would prevent the update to a until it committed, so the second transaction would block until the first one had completed. If you read b before the first transaction had committed you could see the old view of b, new view of b, or the read would block until first transaction had committed, depending on the isolation level of the transactions and whether the database supported MVCC. – ConcernedOfTunbridgeWells Mar 14 '21 at 14:20
  • 2PC doesn't help when there's data loss because 2PC assumes that nodes never fail permanently and never lose data. I.e., it's not designed to help in those cases. – Blueriver Feb 24 '23 at 22:39
  • @Blueriver - 2PC is not fault tolerance. You can still lose data after the transaction commits, and you can still have a catastrophic failure in the middle of the transaction. What it does guarantee is that the transaction can't fail silently without the application being made aware of it. – ConcernedOfTunbridgeWells Feb 25 '23 at 22:03

I believe three-phase commit is a much better approach. Unfortunately, I haven't found anyone implementing such a technology.

http://the-paper-trail.org/blog/consensus-protocols-three-phase-commit/

Here are the essential parts of the above article:

The fundamental difficulty with 2PC is that, once the decision to commit has been made by the co-ordinator and communicated to some replicas, the replicas go right ahead and act upon the commit statement without checking to see if every other replica got the message. Then, if a replica that committed crashes along with the co-ordinator, the system has no way of telling what the result of the transaction was (since only the co-ordinator and the replica that got the message know for sure). Since the transaction might already have been committed at the crashed replica, the protocol cannot pessimistically abort – as the transaction might have had side-effects that are impossible to undo. Similarly, the protocol cannot optimistically force the transaction to commit, as the original vote might have been to abort.

This problem is – mostly – circumvented by the addition of an extra phase to 2PC, unsurprisingly giving us a three-phase commit protocol. The idea is very simple. We break the second phase of 2PC – ‘commit’ – into two sub-phases. The first is the ‘prepare to commit’ phase. The co-ordinator sends this message to all replicas when it has received unanimous ‘yes’ votes in the first phase. On receipt of this message, replicas get into a state where they are able to commit the transaction – by taking necessary locks and so forth – but crucially do not do any work that they cannot later undo. They then reply to the co-ordinator telling it that the ‘prepare to commit’ message was received.

The purpose of this phase is to communicate the result of the vote to every replica so that the state of the protocol can be recovered no matter which replica dies.

The last phase of the protocol does almost exactly the same thing as the original ‘commit or abort’ phase in 2PC. If the co-ordinator receives confirmation of the delivery of the ‘prepare to commit’ message from all replicas, it is then safe to go ahead with committing the transaction. However, if delivery is not confirmed, the co-ordinator cannot guarantee that the protocol state will be recovered should it crash (if you are tolerating a fixed number f of failures, the co-ordinator can go ahead once it has received f+1 confirmations). In this case, the co-ordinator will abort the transaction.

If the co-ordinator should crash at any point, a recovery node can take over the transaction and query the state from any remaining replicas. If a replica that has committed the transaction has crashed, we know that every other replica has received a ‘prepare to commit’ message (otherwise the co-ordinator wouldn’t have moved to the commit phase), and therefore the recovery node will be able to determine that the transaction was able to be committed, and safely shepherd the protocol to its conclusion. If any replica reports to the recovery node that it has not received ‘prepare to commit’, the recovery node will know that the transaction has not been committed at any replica, and will therefore be able either to pessimistically abort or re-run the protocol from the beginning.

So does 3PC fix all our problems? Not quite, but it comes close. In the case of a network partition, the wheels rather come off – imagine that all the replicas that received ‘prepare to commit’ are on one side of the partition, and those that did not are on the other. Then both partitions will continue with recovery nodes that respectively commit or abort the transaction, and when the network merges the system will have an inconsistent state. So 3PC has potentially unsafe runs, as does 2PC, but will always make progress and therefore satisfies its liveness properties. The fact that 3PC will not block on single node failures makes it much more appealing for services where high availability is more important than low latencies.
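
To make the quoted phases concrete, here is a rough coordinator sketch following the article's description. The `Replica` interface, the `f + 1` acknowledgement rule, and all names are taken on the article's terms and are otherwise hypothetical:

```java
import java.util.List;

// Rough coordinator for the three phases described in the article.
// The Replica interface and all names are hypothetical; failure
// detection and recovery nodes are left out.
public class ThreePhaseCoordinator {

    public interface Replica {
        boolean vote(long txId);            // phase 1: yes/no on the transaction
        boolean prepareToCommit(long txId); // phase 2: ack, take locks, nothing irreversible
        void commit(long txId);             // phase 3: the actual work
        void abort(long txId);
    }

    public void run(long txId, List<Replica> replicas, int f) {
        // Phase 1: collect votes; any 'no' aborts the transaction.
        for (Replica r : replicas) {
            if (!r.vote(txId)) {
                abortAll(txId, replicas);
                return;
            }
        }
        // Phase 2: communicate the decision before anyone acts on it.
        int acks = 0;
        for (Replica r : replicas) {
            if (r.prepareToCommit(txId)) {
                acks++;
            }
        }
        // Per the article: tolerating f failures, f + 1 confirmations are
        // enough to guarantee the decision can be recovered after a crash.
        if (acks < f + 1) {
            abortAll(txId, replicas);
            return;
        }
        // Phase 3: equivalent to 2PC's commit phase.
        for (Replica r : replicas) {
            r.commit(txId);
        }
    }

    private void abortAll(long txId, List<Replica> replicas) {
        for (Replica r : replicas) {
            r.abort(txId);
        }
    }
}
```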


Your scenario is not the only one where things can ultimately go wrong despite all effort. Suppose A and B have both reported "ready to commit" to the TM, and then someone unplugs the line between the TM and, say, B. B is waiting for the go-ahead (or no-go) from the TM, but it certainly won't keep waiting forever until the TM reconnects (its own resources involved in the transaction must stay locked/inaccessible throughout the entire wait time, for obvious reasons). So when B has been kept waiting too long for its own taste, it will take what is called a "heuristic decision". That is, it will decide to commit or roll back independently of the TM, based on, well, I don't really know what, but that doesn't really matter. It should be obvious that any such heuristic decision can deviate from the actual commit decision taken by the TM.
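
As an illustration of that timeout-then-decide behavior, here is a sketch of a prepared participant that eventually gives up on the coordinator. Every name, the timeout value, and the choice to roll back are hypothetical:

```java
import java.util.concurrent.TimeUnit;

// A prepared participant that eventually gives up waiting for the
// coordinator. The timeout, the polling helper, and the decision to
// roll back are all hypothetical.
public class PreparedParticipant {

    public enum Outcome { COMMITTED, ROLLED_BACK }

    private static final long HEURISTIC_TIMEOUT_MS = 60_000;

    public Outcome awaitDecision(long txId) throws InterruptedException {
        long deadline = System.currentTimeMillis() + HEURISTIC_TIMEOUT_MS;
        while (System.currentTimeMillis() < deadline) {
            Boolean verdict = pollCoordinator(txId); // null until the TM answers
            if (verdict != null) {
                return verdict ? doCommit(txId) : doRollback(txId);
            }
            TimeUnit.MILLISECONDS.sleep(500); // locks stay held while we wait
        }
        // Waited too long with resources locked: decide unilaterally.
        // This may contradict the TM's eventual decision, which is exactly
        // the divergence described above.
        return doRollback(txId); // a heuristic rollback, for example
    }

    private Boolean pollCoordinator(long txId) { return null; /* network call elided */ }

    private Outcome doCommit(long txId)   { return Outcome.COMMITTED; }

    private Outcome doRollback(long txId) { return Outcome.ROLLED_BACK; }
}
```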

Erwin Smout
  • But the wiki says "After a cohort has sent an agreement message to the coordinator, it will block until a commit or rollback is received". I believe that means B will block after it sends the agreement message, even if the message never reaches the TM or the commit/rollback message gets lost on the way back, which B simply doesn't know or care about. – du369 Aug 01 '18 at 03:47
  • But B won't block forever. Under the assumptions that "no node crashes forever" and "any two nodes can communicate with each other", B will eventually get the commit/rollback message and thus complete the whole process. https://en.wikipedia.org/wiki/Two-phase_commit_protocol – du369 Aug 01 '18 at 03:49
  • Another response got the comment that said response "makes the ***dangerous assumption*** that the network is reliable". – Erwin Smout Aug 01 '18 at 06:34
  • That is a dangerous assumption, yes. But I think the two assumptions on the wiki page do not assume the network to be reliable. It just says that any two nodes "can" communicate with each other, which means, to my understanding, that not all messages are lost on the network. Then with some timeout and retry mechanism, the agreement message and the commit/rollback message will eventually reach their destinations and the algorithm may continue. – du369 Aug 01 '18 at 08:33
  • If B gets bored and decides to abort the transaction, then the TM will ask it to commit next time it comes back online and B will report an error for the TxID. The transaction will still fail, but it won't fail silently; the TM will still be made aware of the aborted transaction on B. 2PC is not about fault tolerance, it's about making sure that distributed transactions commit atomically and errors in a distributed transaction are picked up when they do occur. – ConcernedOfTunbridgeWells Oct 22 '20 at 17:25