Let's say I made a quick backup dump of my Postgres 9.3 DB through pg_dump before doing a large destructive migration, and I've now decided I want to undo the migration. No writes were performed against the DB in the meantime.

Say I run `pg_restore -c -d mydb < foo.dump` to load the dump back into the DB. Assuming I have WAL-E set up to archive every 16 MB of WAL, do I need to turn off archive_mode before performing the restore? It would not be super useful for me to archive the xlog as I'm writing the dump back into the DB, since I already have perfectly valid base backups and WAL segments archived from before the dump. Also, there are serious consequences to performing xlog shipping as I'm restoring the dump, and they get worse with the size of the dump.
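
For reference, the restore command and the WAL-E archiving setup I'm describing look roughly like this (database name, dump file and paths are placeholders):

```bash
# Load the custom-format dump back into the existing database,
# dropping (-c) objects before recreating them.
pg_restore -c -d mydb < foo.dump

# Relevant postgresql.conf (9.3) settings, shown here as comments:
#   wal_level       = archive
#   archive_mode    = on
#   archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'
```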

Do you end up disabling archiving before a restore? Do you do anything else to speed things up? There's a discussion of restore performance in this post, but it doesn't cover archiving at all, unless I missed something.

Alexandr Kurilin

1 Answer


You can't really turn WAL archiving on and off like that. WAL replay requires continuity.

If you turned WAL archiving off, then made changes, then turned it back on, the new WAL generated after turning it off and on again would be useless. It could not be applied to the DB, and you'd have to make a new base backup before you could resume WAL replay / PITR.

If you turn xlog shipping off during a restore, you'll want to purge your old base backup and WAL archives, then create a new base backup before resuming WAL shipping.
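
Roughly, the sequence could look like the sketch below. This is only a sketch: it assumes a Debian-style 9.3 install (hence `pg_ctlcluster`) with WAL-E configured under /etc/wal-e.d/env, so adapt the paths and service commands to your environment.

```bash
sudo pg_ctlcluster 9.3 main stop
# Edit postgresql.conf by hand: archive_mode = off   (needs a full restart on 9.3)
sudo pg_ctlcluster 9.3 main start

# Load the dump with archiving disabled.
pg_restore -c -d mydb < foo.dump

sudo pg_ctlcluster 9.3 main stop
# Set archive_mode = on again.
sudo pg_ctlcluster 9.3 main start

# The old base backup + WAL chain can no longer be replayed past this point,
# so purge it and push a fresh base backup before relying on PITR again.
# WARNING: 'delete everything' removes all archived base backups and WAL.
envdir /etc/wal-e.d/env wal-e delete --confirm everything
envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.3/main
```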

Craig Ringer
  • Yes, I should have clarified that I'd have had to make a new base backup right afterwards and lost continuity with the previous chunk of history. My question still stands though: what's the best practice for dealing with restores in this kind of situation? – Alexandr Kurilin Jun 03 '14 at 05:18
  • If there's only one DB you care about in the cluster, you can speed up restores massively by stopping Pg, setting `fsync=off`, starting Pg, doing the restore, stopping Pg, setting `fsync=on` and starting Pg back up (see the sketch after these comments). However, if Pg crashes with `fsync=off` you may have severe or total data loss. You can also turn off archiving and set `wal_level = minimal` for the period. There are existing questions on dba.stackexchange.com about speeding up Pg restores... – Craig Ringer Jun 03 '14 at 05:23
  • Is there really any way I can have severe data loss if I'm actually restoring from a dump through pg_restore? I still have the dump on the disk, right? I can just try again, no? – Alexandr Kurilin Jun 03 '14 at 05:26
  • If that's the only DB in your PostgreSQL install, sure. But if you have other DBs in the same install that have other data of value, that might be a problem, as the write-ahead log, transaction log, etc. is global across all databases. – Craig Ringer Jun 03 '14 at 07:21
  • @CraigRinger What, really? As in, every database on the cluster ('instance' in SQL Server world, i.e. all the databases served by a single PostgreSQL program) shares the same WAL and transaction logs? – Kenny Evitt Oct 18 '19 at 02:52
  • @KennyEvitt Correct. There's no way to separate them. It's one of the numerous reasons why I tend to recommend that people stick to one PostgreSQL database per instance for scalable deployments. It's a significant hassle. If the WAL were split out, we'd have issues with flush ordering, persistence of transaction commit logs vs. data logs, and more. But it really still needs to be done... – Craig Ringer Oct 18 '19 at 03:43
  • @CraigRinger "I tend to recommend that people stick to one PostgreSQL database per instance for scalable deployments." – that's really good to know! Instead of splitting the WAL out by database, they could double down on the per-cluster focus and allow cross-database queries and referential integrity *in the same cluster*. – Kenny Evitt Oct 18 '19 at 16:32
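
For reference, here is a rough sketch of the `fsync=off` restore sequence described in the comments above. It assumes a Debian-style 9.3 cluster named `main`, and that losing every database in the cluster is acceptable if PostgreSQL crashes during the window:

```bash
# WARNING: a crash while fsync=off can corrupt every database in the cluster.
sudo pg_ctlcluster 9.3 main stop
# Edit postgresql.conf by hand at this point:
#   fsync        = off
#   archive_mode = off
#   wal_level    = minimal
sudo pg_ctlcluster 9.3 main start

# Run the restore while durability is relaxed.
pg_restore -c -d mydb < foo.dump

sudo pg_ctlcluster 9.3 main stop
# Revert fsync, archive_mode and wal_level to their previous values here.
sudo pg_ctlcluster 9.3 main start

# The archive chain is broken, so push a fresh base backup for WAL-E.
envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.3/main
```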