Suppose all the databases involved in a distributed transaction implemented with two-phase commit signal that they are ready to commit and hold the necessary locks. The coordinator signals commit and all databases execute their portion of the transaction, but one SQL database hits a divide-by-zero error because of a programming oversight that failed to consider that possibility. Since the coordinator has already signaled commit to everyone, what happens as a result of that divide-by-zero?

-
How and when exactly do you expect this error to happen? I assume that such an error would occur during phase one, causing a rollback. – Oded Jun 23 '12 at 20:09
-
Do you mean that the definition of the precommit phase is that everyone actually fully executes their portion of the transaction, and that the commit phase is defined by simply writing "" to a log, the critical point being that no actual execution of the transaction occurs during the commit phase? All the articles on two-phase commit I've encountered never clearly state when each database executes its portion of the transaction. – user782220 Jun 23 '12 at 20:15
-
Well, what _actually_ happens is implementation specific. But yes, that would pretty much be what happens (changes are made and the only thing that the distributed databases are waiting for is the ack from the coordinator in order to "close the deal" by committing). – Oded Jun 23 '12 at 20:17
-
Is there some article that explicitly says that execution occurs during precommit phase? – user782220 Jun 23 '12 at 20:28
-
When the participant signals the coordinator that it is ready to commit, it is promising that it will wait for the coordinator to tell it the decision, and that it will be able to complete the transaction if the decision is commit (and roll back if the decision is rollback). Now, if the coordinator sends COMMIT and the participant crashes while committing because of a bug, you get an incomplete commit. I've forgotten whether the participants acknowledge when they've committed; a crashed participant that restarts can re-interrogate the coordinator to find the status of the transaction. – Jonathan Leffler Jun 23 '12 at 21:12
1 Answer
The second commit phase normally does not contain user code that can fail. The participating resource managers need to guarantee that no failure can occur. If this guarantee is violated no guarantees can be provided by the protocol.
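To make the split between the two phases concrete, here is a minimal in-memory sketch. It assumes a participant that does all of its fallible work, including the division, during the prepare phase and only publishes a staged result during commit. The `Participant` class, its fields, and `two_phase_commit` are hypothetical names invented for illustration; real resource managers differ in the details.

```python
class Participant:
    """Hypothetical resource manager: all fallible work happens in prepare()."""

    def __init__(self, name, balance, divisor):
        self.name = name
        self.balance = balance   # committed, visible state
        self.divisor = divisor   # input that may trigger the bug
        self.staged = None       # result computed during phase one
        self.state = "INIT"

    def prepare(self):
        """Phase one: run the transaction's logic, then vote."""
        try:
            # The divide-by-zero can only surface here, before any vote is cast.
            self.staged = self.balance // self.divisor
        except ZeroDivisionError:
            self.state = "ABORTED"
            return False         # vote NO
        self.state = "PREPARED"  # promise: commit must not fail from here on
        return True              # vote YES

    def commit(self):
        """Phase two: publish the staged result; no user logic runs here."""
        assert self.state == "PREPARED"
        self.balance = self.staged
        self.state = "COMMITTED"

    def rollback(self):
        """Discard staged work; safe to call on an already aborted participant."""
        self.staged = None
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant voted YES."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMIT"
    for p in participants:
        p.rollback()
    return "ROLLBACK"
```

With one participant constructed with `divisor=0`, `two_phase_commit` returns `"ROLLBACK"` and no balance changes, because the error is caught before the coordinator ever decides.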
Two-phase commit tries to solve the Two Generals Problem. There is no full solution to this problem. TPC is an approximation.
Another way TPC can fail is in case of a network partition. Some resource managers might perform the final commit but some might not receive that message. Again, this problem is unsolvable. Even retries cannot solve it.
You can even trigger this problem under real-world conditions: run all participating nodes in a stress test and pull the network cable at an arbitrary point. With high probability your distributed databases are now inconsistent, because some commit messages got lost at a very inconvenient time.
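The cable-pull scenario can be mimicked with a small simulation. This is only a sketch under simplifying assumptions: every participant has already voted YES, the coordinator sends COMMIT to one node at a time in index order, and the link dies after a fixed number of sends. The function name `run_with_partition` and its parameters are made up for this example.

```python
def run_with_partition(n_participants, cut_after):
    """Deliver COMMIT to nodes in order; messages after `cut_after` sends are lost."""
    # Assumption: phase one succeeded everywhere, so all nodes start PREPARED.
    states = ["PREPARED"] * n_participants
    for sent in range(n_participants):
        if sent >= cut_after:
            break  # cable pulled: the remaining COMMIT messages never arrive
        states[sent] = "COMMITTED"
    return states


if __name__ == "__main__":
    print(run_with_partition(5, 2))
```

Any node left in `PREPARED` is in doubt: it has promised to commit but has not heard the decision, so it cannot safely commit or roll back on its own until it can reach the coordinator again.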
