
I'm developing a shell script that loops through a series of Postgres database table names and dumps the table data. For example:

# dump data

psql -h $IP_ADDRESS -p 5432 -U postgres -c "BEGIN;" MYDB

for i in $TABLES    # space-separated list of table names
do
    pg_dump -U postgres -h $IP_ADDRESS -p 5432 -t $i -a --inserts MYDB >> \
    out.sql
done

psql -h $IP_ADDRESS -p 5432 -U postgres -c "COMMIT;" MYDB

I'm worried about concurrent access to the database, however. Since there is no database lock for Postgres, I tried to wrap a BEGIN and COMMIT around the loop (using psql, as shown above). This resulted in an error message from the psql command, saying that:

WARNING:  there is no transaction in progress

Is there any way to achieve this? If not, what are the alternatives?

Thanks!

littleK
  • I wrote about this topic a little while ago. The short version is "drive `psql` as a co-process with the `coproc` command, or use a scripting language with support for connecting to Pg". See http://stackoverflow.com/a/8305578/398670 . However, that won't help you if you're using pg_dump; it's only useful for plain psql. – Craig Ringer Aug 25 '12 at 02:00
  • BTW, there is a way to lock all other users (except superusers) out of the DB. You can `REVOKE` the `CONNECT` right on the database for everybody except yourself (though superusers can always connect) then boot everybody else off using `pg_terminate_backend(pid)` in a query against `pg_stat_activity`. Not trivial, I'll admit. In this case I think you're barking up entirely the wrong tree with this approach anyway, see ruakh's answer. – Craig Ringer Aug 25 '12 at 02:06
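The co-process approach Craig mentions can be sketched as follows. This is a minimal, runnable sketch that uses `cat` as a stand-in for `psql` so it works without a database server; in practice you would start `psql` the same way and write your SQL statements to its stdin, so that every statement runs in one session (and therefore can share one transaction):

```shell
#!/usr/bin/env bash
# Start a long-lived co-process. With a real database you would use:
#   coproc SESSION { psql -h "$IP_ADDRESS" -p 5432 -U postgres MYDB; }
# Here cat simply echoes each line back, standing in for psql's output.
coproc SESSION { cat; }

replies=()
for stmt in "BEGIN;" "SELECT 1;" "COMMIT;"; do
    # Write one statement to the co-process's stdin...
    printf '%s\n' "$stmt" >&"${SESSION[1]}"
    # ...and read one line of its output (psql would print results here).
    IFS= read -r reply <&"${SESSION[0]}"
    replies+=("$reply")
done

# Close the co-process's stdin so it sees EOF and exits cleanly.
exec {SESSION[1]}>&-
wait "$SESSION_PID"

printf '%s\n' "${replies[@]}"
```

Because all statements go to the same session, a `BEGIN;` written early really does open a transaction that later statements participate in, which is exactly what separate `psql -c` invocations cannot do.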

1 Answer


Your script has two main problems.

The first problem is practical: a transaction belongs to a specific session, so your first psql command, which just starts a transaction and then exits, has no real effect: the transaction ends when that psql process exits, and later commands do not share it. (Hence the warning you saw: by the time the second psql command runs "COMMIT;", there is indeed no transaction in progress.)

The second problem is conceptual: changes made in transaction X aren't seen by transaction Y until transaction X is committed, but as soon as transaction X is committed, they're immediately seen by transaction Y, even if transaction Y is still in progress. This means that even if your script did successfully wrap the entire dump in a single transaction, it wouldn't make any difference: your dump could still see inconsistent results from one query to the next. (That is: it's meaningless to wrap a series of SELECTs in a transaction. A transaction is only meaningful if it contains one or more DML statements: UPDATEs, INSERTs, or DELETEs.)

However, you don't really need your shell script to loop over your list of tables at all; you can just give pg_dump all the table names at once, by passing multiple -t flags:

pg_dump -U postgres -h $IP_ADDRESS -p 5432 \
    -t table1 -t table2 -t table3 -a --inserts MYDB >> out.sql

and according to the documentation, pg_dump "makes consistent backups even if the database is being used concurrently", so you wouldn't need a transaction even if wrapping the dump in one did work.

(By the way, the -t flag also supports glob patterns; for example, -t 'table*' (quoted so the shell doesn't expand it) would match all tables whose names begin with table.)
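If your table list already lives in a shell variable, you can keep the loop but use it to build the flag list rather than to run pg_dump repeatedly. A sketch, assuming a hypothetical TABLES array (only the flag-building runs here; the actual pg_dump call is shown commented out):

```shell
#!/usr/bin/env bash
# Hypothetical list of tables to dump, in the desired order.
TABLES=(table1 table2 table3)

# Build one "-t name" pair per table instead of one pg_dump run per table.
T_FLAGS=()
for t in "${TABLES[@]}"; do
    T_FLAGS+=(-t "$t")
done

# The real invocation would then be a single dump, and so a single snapshot:
# pg_dump -U postgres -h "$IP_ADDRESS" -p 5432 "${T_FLAGS[@]}" -a --inserts MYDB > out.sql
printf '%s\n' "${T_FLAGS[@]}"
```

Using an array for the flags (rather than a flat string) keeps table names with unusual characters intact when they are passed to pg_dump.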

ruakh
  • Hi ruakh, thanks for your advice. I am looping through the tables because I need the dump output to be in a specific order for when I restore it. While the pg_dump is only doing SELECTs, I also have a loop executing queries that insert SELECTed data into temp tables. This is also a concern for me. Finally, when I use psql to restore the dump file, is that performed as a safe transaction? – littleK Aug 24 '12 at 20:40
  • @littleK: Firstly -- unrelated to your comment -- I've fixed some mistakes in my original answer. Secondly -- when you say that you "need the dump output to be in a specific order for when I restore it", what kind of order do you have in mind? Because I just played a bit, and although I don't see anything in the documentation about this, it *looks* like `pg_dump` makes a point of dumping tables in an order that's consistent with foreign-key constraints, and I can't imagine what other ordering you could need. Thirdly -- when you restore the data, you can certainly *[continued]* – ruakh Aug 24 '12 at 20:52
  • *[continued]* wrap that in a transaction, but I can't imagine why you would need to. I mean usually, if you're restoring from backup, the entire system is offline until everything is restored. – ruakh Aug 24 '12 at 20:53
  • By the way, according to the documentation, `--inserts` "will make restoration very slow; it is mainly useful for making dumps that can be loaded into non-PostgreSQL databases." If you're planning to restore the data using straight-up `psql`, `COPY` statements might be better. – ruakh Aug 24 '12 at 20:54
  • Thanks for your help. I will switch to using the COPY statements. When I had tried the dump without explicitly looping through each table, there were key constraint errors upon restore, and the order in the dump seemed to be alphabetical. I will revisit that, though. When the documentation says that pg_dump makes consistent backups, even when performed concurrently, does that mean that it will dump any table data as it was before, say, a concurrent insert? I appreciate you sharing your knowledge with me, it's helping a lot. – littleK Aug 24 '12 at 21:12
  • @littleK: Re: "When the documentation says that pg_dump makes consistent backups, even when performed concurrently, does that mean that it will dump any table data as it was before, say, a concurrent insert?": That's how I interpret that, yes. – ruakh Aug 24 '12 at 23:10
  • @littleK `pg_dump` takes a snapshot at the start of the dump, and doesn't see any changes (like concurrent inserts) made after that. As for ordering, it should be ordering the tables correctly itself. If you have circular foreign key references in a `--data-only` dump you might need to dump with `--disable-triggers` or use a `-Fc` dump and run `pg_restore --disable-triggers`. For full schema+data dumps, and for dumps without circular FKs, this should "just work". Can you provide details and an example of the problem? – Craig Ringer Aug 25 '12 at 02:05
  • Craig, after reading more documentation, I believe that the dump may not be ordered correctly because I am not specifying ALL of the tables in the database, only some (as per my requirement to only do a daily transfer of certain data from one production database to another). This puts me in a tight spot, because I would rather only do one dump (as opposed to looping through many) in order to guarantee working within the single snapshot. Here's what the documentation says: continued... – littleK Aug 25 '12 at 14:46
  • ...continued: "When -t is specified, pg_dump makes no attempt to dump any other database objects that the selected table(s) might depend upon. Therefore, there is no guarantee that the results of a specific-table dump can be successfully restored by themselves into a clean database." – littleK Aug 25 '12 at 14:46
  • Thinking about this more, however, I shouldn't be concerned about the dump. You see, the purpose of my script is to transfer any NEW data from one database to another, on a daily basis. So, what I do is loop through desired tables (in a specific order) and execute queries to get only the newest data, based on a last modified date in the parent table. Using "INSERT INTO temp_table SELECT * FROM table...", the results of the queries go into temp tables. Then, I dump the data from the temp tables, which are safe from any concurrent access. What I need to worry about is executing the queries... – littleK Aug 25 '12 at 15:01
  • ...one by one on the actual database tables to generate the relevant data, where concurrent access is possible (although not likely since the script will be run off-hours - but still, better be safe than sorry). – littleK Aug 25 '12 at 15:03
  • Sorry for using comments as a bit of a sounding board. However, it dawned upon me that upon executing the queries, I could inject multiple queries within the psql command, wrapped in a transaction. For example: psql -h localhost -p 5432 -U postgres -c "BEGIN; $SQL1; $SQL2; $SQL3; COMMIT;" ...This seems to work in some tests that I performed. Can anyone see a problem with doing this? Thanks! – littleK Aug 26 '12 at 00:11
  • @littleK: Wrapping a series of queries in a transaction does *not* guarantee that they'll all see a single consistent view of the database. If any other transactions are committed while you're in the middle of running your queries, then some of your queries will reflect the state of the database *before* the changes in those transactions, and some will reflect the state *after* the changes in those transactions. – ruakh Aug 26 '12 at 02:28
  • @ruakh, I had gotten the impression from the 9.1 transaction documentation stating that "The updates made so far by an open transaction are invisible to other transactions until the transaction completes, whereupon all the updates become visible simultaneously." As I interpret it, my transaction will see a single snapshot, as will a concurrent transaction, but they could potentially both affect each other upon a commit? I guess I need to start looking more into table locking? But I think that I could run into a problem there, too, as I'm dealing with multiple tables in my single transaction. – littleK Aug 26 '12 at 13:45
  • @littleK: I'm not sure I understand your interpretation. The "my transaction will see a single snapshot" part is false, but the "they could potentially both affect each other upon a commit" part is correct. Wrapping queries in a transaction has *no* effect on what they see (unless there are also DML statements in the same transaction). – ruakh Aug 26 '12 at 15:09
  • Thanks ruakh, I'm hoping that the following will ensure no other concurrent access affecting my queries: psql -h localhost -p 5432 -U postgres -1 -c "BEGIN TRANSACTION; SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; $SQL1; $SQL2; $SQL3; END TRANSACTION" mydb...What do you think? – littleK Aug 26 '12 at 20:52
  • @littleK: I think "woah, crazy-bones, I wasn't aware of that feature". But yes, by my reading of the documentation, that will do exactly what you need. Awesome, I learned something new today! Thank you! :-D – ruakh Aug 26 '12 at 21:34
  • I'm glad to hear it! Thanks again! – littleK Aug 27 '12 at 01:12
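For the record, the approach littleK settled on in the comments can be sketched as follows. All statements are assembled into one string and handed to a single psql session, so every query runs inside one REPEATABLE READ transaction and therefore sees one consistent snapshot. The queries SQL1 and SQL2 and the table names are hypothetical placeholders, and the command is only assembled here, not executed:

```shell
#!/usr/bin/env bash
# Hypothetical queries that copy the newest rows into temp tables.
SQL1="INSERT INTO temp_t1 SELECT * FROM t1 WHERE modified >= current_date"
SQL2="INSERT INTO temp_t2 SELECT * FROM t2 WHERE modified >= current_date"

# SET TRANSACTION must come before any query in the transaction;
# every statement here runs in the same session, hence the same snapshot.
TXN="BEGIN; SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; $SQL1; $SQL2; COMMIT;"

# Assembled only; the real call (against a live server) would be:
# psql -h localhost -p 5432 -U postgres -c "$TXN" mydb
echo "$TXN"
```

Note that `SET TRANSACTION ISOLATION LEVEL REPEATABLE READ` only takes effect because it is the first statement after `BEGIN;` in the same session; issued from a separate psql invocation it would hit the same no-transaction problem as the original script.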