
Assume I have a table called "t1" in database "db1" and another table called "t2" in database "db2", and I need a record inserted into both tables, or into neither.

Connected to db1, I guess I should type this:

BEGIN;
PREPARE TRANSACTION 'pepe'; -- the manual says this stores the transaction on disk, so what is the purpose if I can't use it from another database?

insert into t1 (field) values ('a_value');

COMMIT PREPARED 'pepe'

Connected to db2, I guess I should type this:

BEGIN;
PREPARE TRANSACTION 'pepe'; -- this fails (what is the meaning of the transaction name, what is it used for?)
-- It complains: ERROR:  transaction identifier "pepe" is already in use

insert into t2 (field) values ('another_value');

COMMIT PREPARED 'pepe'

As you can see, I don't understand how to use two-phase commits in Postgres.

Summary

I'm not getting how to perform synchronized commits across different databases within the same RDBMS.

I have read in the Postgres documentation that, for synchronizing work across two or more unrelated Postgres databases, an implementation of the so-called "two-phase commit" protocol is at our disposal.

So I started looking into how people actually use it in Postgres. I did not find any real example; at most I got to a post by someone who was experimenting with several Postgres clients, each connected to a different database, to emulate multiple processes running in parallel doing work against several DBs that should end either in success (all commit) or in failure (all rollback).

Other sources where I had seen examples were:

Resolution (After Laurenz's Answer)

Connected to db1, these are the SQL lines to execute:

BEGIN;
-- DO THINGS TO BE DONE IN AN ALL-OR-NOTHING FASHION
-- Stop point --
PREPARE TRANSACTION 't1';
COMMIT PREPARED 't1'; -- or ROLLBACK PREPARED 't1' (the decision requires awareness and coordination)

Meanwhile, connected to db2, this is the script to execute:

BEGIN;
-- DO THINGS TO BE DONE IN AN ALL-OR-NOTHING FASHION
-- Stop point --
PREPARE TRANSACTION 't2';
COMMIT PREPARED 't2'; -- or ROLLBACK PREPARED 't2'

  • The -- Stop point -- is where a coordinator process (for example an application executing the statements, or a human behind a psql console or pgAdmin) shall stop the execution of both scripts (that is, not execute any further instruction; that is what I mean by "stop").

  • Then, first on db1 (and then on db2, or vice versa), the coordinator process (whether human or not) must run PREPARE TRANSACTION on each connection.

    • If one of them fails, the coordinator must run ROLLBACK PREPARED on those databases where the transaction was already prepared, and ROLLBACK on the others.
    • If none fails, the coordinator must run COMMIT PREPARED on all involved databases, an operation that should never fail (like stepping out of the house when you are already one foot outside the door, with everything properly set to exit safely).
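To make the coordination concrete, here is a minimal sketch in Python of the statement sequence the coordinator sends over the two connections in the happy path. A stub class stands in for a real database driver (such as psycopg2), so the connection objects, their `execute` method, and the recorded SQL strings are purely illustrative:

```python
# Stub connection that just records the SQL it is asked to run.
# A real coordinator would use an actual driver connection instead.
class StubConn:
    def __init__(self, name):
        self.name = name
        self.log = []

    def execute(self, sql):
        self.log.append(sql)

def run_happy_path(db1, db2):
    # Open a transaction on both databases and do the work.
    db1.execute("BEGIN")
    db2.execute("BEGIN")
    db1.execute("INSERT INTO t1 (field) VALUES ('a_value')")
    db2.execute("INSERT INTO t2 (field) VALUES ('another_value')")
    # -- Stop point -- : no further work is done; phase 1 begins.
    db1.execute("PREPARE TRANSACTION 't1'")
    db2.execute("PREPARE TRANSACTION 't2'")
    # Phase 2: both prepares succeeded, so committing cannot fail.
    db1.execute("COMMIT PREPARED 't1'")
    db2.execute("COMMIT PREPARED 't2'")

db1, db2 = StubConn("db1"), StubConn("db2")
run_happy_path(db1, db2)
print(db1.log[-2])  # PREPARE TRANSACTION 't1'
```

The point of the stub is only to show the ordering: all work happens before PREPARE TRANSACTION, and the commit/rollback decision is taken once for both connections.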
Victor
  • Guess this can help me... https://stackoverflow.com/questions/8953423/postgresql-prepare-transaction?rq=1. I spent a lot of hours searching for "two-phase commits" instead of "prepare transaction"... :S – Victor Nov 26 '19 at 18:57
  • The above link has an example that is odd, I guess; based on Laurenz's answer it's not preparing two transactions, nor is it using two databases... – Victor Nov 26 '19 at 20:30
  • 1
    Note that one of the articles you mention has moved here: https://www.endpointdev.com/blog/2010/07/distributed-transactions-and-two-phase/ – Jon Jensen Dec 05 '21 at 01:40

1 Answer


I think you misunderstood PREPARE TRANSACTION.

That statement ends work on the transaction, that is, it should be issued after all the work is done. The idea is that PREPARE TRANSACTION does everything that could potentially fail during a commit except for the commit itself. That is to guarantee that a subsequent COMMIT PREPARED cannot fail.

The idea is that processing is as follows:

  • Run START TRANSACTION on all databases involved in the distributed transaction.

  • Do all the work. If there are errors, ROLLBACK all transactions.

  • Run PREPARE TRANSACTION on all databases. If that fails anywhere, run ROLLBACK PREPARED on those databases where the transaction was already prepared and ROLLBACK on the others.

  • Once PREPARE TRANSACTION has succeeded everywhere, run COMMIT PREPARED on all involved databases.

That way, you can guarantee “all or nothing” across several databases.
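The steps above can be sketched in Python with fake connections whose prepare step can be forced to fail; the class, its methods, and the recorded outcomes are invented for the illustration, not any real driver API:

```python
# Fake connection for demonstrating the two-phase commit decision logic.
class FakeConn:
    def __init__(self, name, prepare_ok=True):
        self.name = name
        self.prepare_ok = prepare_ok
        self.outcome = None  # "committed" or "rolled back" at the end

    def begin(self):
        pass  # START TRANSACTION

    def work(self):
        pass  # all the actual INSERTs/UPDATEs happen here

    def prepare(self, gid):
        # PREPARE TRANSACTION 'gid' -- the only step that may fail.
        if not self.prepare_ok:
            raise RuntimeError(f"PREPARE TRANSACTION failed on {self.name}")

    def commit_prepared(self, gid):
        self.outcome = "committed"     # COMMIT PREPARED 'gid'

    def rollback_prepared(self, gid):
        self.outcome = "rolled back"   # ROLLBACK PREPARED 'gid'

    def rollback(self):
        self.outcome = "rolled back"   # plain ROLLBACK

def two_phase_commit(conns, gid):
    for c in conns:
        c.begin()
        c.work()
    prepared = []
    for c in conns:
        try:
            c.prepare(gid)
            prepared.append(c)
        except RuntimeError:
            # Phase 1 failed somewhere: ROLLBACK PREPARED on the already
            # prepared databases, plain ROLLBACK on the rest.
            for p in prepared:
                p.rollback_prepared(gid)
            for other in conns:
                if other not in prepared:
                    other.rollback()
            return False
    # Phase 2: PREPARE succeeded everywhere, so COMMIT PREPARED cannot fail.
    for c in prepared:
        c.commit_prepared(gid)
    return True
```

Running `two_phase_commit([FakeConn("db1"), FakeConn("db2", prepare_ok=False)], "pepe")` ends with both connections rolled back, which is the "all or nothing" guarantee.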

One important component here that I haven't mentioned is the distributed transaction manager. It is a piece of software that persistently memorizes where in the above algorithm processing currently is so that it can clean up or continue committing after a crash.

Without a distributed transaction manager, two-phase commit is not worth a lot, and it is actually dangerous: if transactions get stuck in the “prepared” phase but are not committed yet, they will continue to hold locks and (in the case of PostgreSQL) block autovacuum work even through server restarts, since such transactions must be persistent.

This is difficult to get right.
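A minimal sketch of the persistence the transaction manager needs: before issuing COMMIT PREPARED, it durably records that the global transaction reached the "prepared" phase, so after a crash it knows it must finish the commit rather than leave prepared transactions stuck. The file format and function names here are invented for the example; stuck transactions can be inspected on the server side with `SELECT * FROM pg_prepared_xacts`:

```python
import json
import os
import tempfile

def persist_state(path, gid, phase):
    # Write atomically so a crash never leaves a half-written record.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"gid": gid, "phase": phase}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)

def recover(path):
    """Decide what to do with a global transaction found after a crash."""
    if not os.path.exists(path):
        return "nothing to do"
    with open(path) as f:
        state = json.load(f)
    if state["phase"] == "prepared":
        # All prepares had succeeded: finish with COMMIT PREPARED everywhere.
        return f"COMMIT PREPARED '{state['gid']}' on all databases"
    # Crash happened before every database was prepared: roll back.
    return f"ROLLBACK PREPARED '{state['gid']}' where prepared"

log = os.path.join(tempfile.mkdtemp(), "txn.log")
persist_state(log, "pepe", "prepared")
print(recover(log))  # COMMIT PREPARED 'pepe' on all databases
```

This only illustrates the recovery decision; a real transaction manager also has to reconnect, enumerate prepared transactions on each database, and retry until they are resolved.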

Laurenz Albe
  • "I think you misunderstood PREPARE TRANSACTION" <-- totally true. Thanks for the answer, Laurenz; I guess it helps me a lot. Talking about a transaction manager would only add layers of information that make the whole thing harder to comprehend, so for a start let's stay focused on the simple scenario where no external failures have to be controlled by an overseer process. I will modify my example following your explanation, and if you don't mind, could you validate it please? Thanks again. – Victor Nov 26 '19 at 19:57
  • If you have no transaction manager, better not use distributed transactions. – Laurenz Albe Nov 26 '19 at 19:59
  • OK, I got the idea, and I guess that the very application I use to connect to the databases could be the overseer process, the so-called transaction manager, no? But again, let's go in baby steps here! We have an app programmed in Golang, and we do our work against the databases through that app. – Victor Nov 26 '19 at 20:00
  • Sure it can be. But then it had better make sure to persist the information so that it can resume working after a crash. Otherwise you end up with "stuck" prepared transactions. – Laurenz Albe Nov 26 '19 at 20:01
  • Great, Laurenz, and thanks again for all those tips and information; you really are helping me a lot! I will get to the "transaction manager" thing, but first I need to write the very SQL lines properly. I am modifying my example above in order to display the results! – Victor Nov 26 '19 at 20:03
  • Laurenz, I have updated my question adding an example of two-phase commit usage. Can you validate it, please? If it's correct, then we can talk about where to get one of those "transaction managers", or how to implement one at the client-application level (client of the RDBMS, I mean). – Victor Nov 26 '19 at 20:22
  • I think that would exceed the scope of a Stackoverflow answer. – Laurenz Albe Nov 27 '19 at 07:00
  • I DO think that "transaction manager" affairs can be addressed in a separate (but related) question (e.g. "how to implement a transaction manager for two-phase commits in Postgres")... but for now, can't you tell me if I understood? (I have updated my question, adding important information.) – Victor Nov 27 '19 at 11:48
  • 1
    Yes, looks like you got it right. Be warned that "How to write a distributed transaction manager?" is a question that is too broad. Try to get literature on the dubject. – Laurenz Albe Nov 27 '19 at 12:06
  • Thanks again!! I'm taking note of that: "Two-phase commits are a great tool that comes full of perils and dangers", with a big label screaming "in production environments, do not use without a transaction manager". – Victor Nov 27 '19 at 12:22