
When I began, I used pg_dump with the default plain format. I was unenlightened.

Research revealed time and file-size improvements with pg_dump -Fc | gzip -9 -c > dumpfile.gz. I was enlightened.
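
Spelled out with the database name used below, that pipeline is roughly:

% pg_dump -Fc dbname | gzip -9 -c > dumpfile.gz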

When it came time to create the database anew,

# create tablespace dbname location '/SAN/dbname';
# create database dbname tablespace dbname;
# alter database dbname set temp_tablespaces = dbname;

% gunzip dumpfile.gz              # to evaluate restore time without piping through decompression
% pg_restore -d dbname dumpfile   # into a new, empty database defined above

I felt unenlightened: the restore took 12 hours to create a database that is only a fraction of what it will become:

# select pg_size_pretty(pg_database_size('dbname'));
47 GB

Because this database is predicted to grow to a few terabytes, I need to look at improving performance now.

Please, enlighten me.

Joe Creighton

7 Answers


First check that you are getting reasonable I/O performance from your disk setup. Then check that your PostgreSQL installation is appropriately tuned. In particular:

  • shared_buffers should be set correctly
  • maintenance_work_mem should be increased during the restore
  • full_page_writes should be off during the restore
  • wal_buffers should be increased to 16MB during the restore
  • checkpoint_segments should be increased to something like 16 during the restore
  • you shouldn't have any unreasonable logging on (like logging every statement executed)
  • autovacuum should be disabled during the restore
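
A minimal postgresql.conf sketch along those lines (the values are illustrative assumptions for a dedicated server, not a recommendation; revert them once the restore is done):

shared_buffers = 2GB             # "set correctly" depends on available RAM
maintenance_work_mem = 1GB       # larger values speed up index builds
full_page_writes = off           # turn back on after the restore
wal_buffers = 16MB
checkpoint_segments = 16         # replaced by max_wal_size in PostgreSQL 9.5+
autovacuum = off                 # re-enable and run ANALYZE afterwards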

If you are on 8.4 also experiment with parallel restore, the --jobs option for pg_restore.
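
For example (a sketch; parallel restore reads a custom- or directory-format archive directly rather than a pipe, and the job count here is an assumption):

pg_restore -j 4 -d dbname dumpfile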

Ants Aasma
  • If you have a slave connected, and the load on the master is already considerable, then you may want to just do the backup on the slave instead. Especially since the slave is read-only, I imagine that may also help to some degree. In a large cluster, it may help to have one or more slaves dedicated to staggered backups if the backups take a long time. So that you don't miss anything, you would want these standbys connected via streaming replication so they get written to from the WAL on the master. – Michael M Nov 07 '13 at 22:36
  • `shared_buffers should be set correctly`: what does that mean? – Juan Carlos Oropeza Dec 04 '16 at 00:22
  • @JuanCarlosOropeza: I came across the following document about [`shared_buffers`](https://www.postgresqlco.nf/en/doc/param/shared_buffers/) that might be helpful. – Darragh Enright May 11 '20 at 16:11

Improve pg_dump & pg_restore

pg_dump: always use the directory format (-Fd) and the -j (parallel jobs) option

time pg_dump -j 8 -Fd -f /tmp/newout.dir fsdcm_external

pg_restore: tune postgresql.conf, and use the directory format and the -j option

work_mem = 32MB
shared_buffers = 4GB
maintenance_work_mem = 2GB
full_page_writes = off
autovacuum = off
wal_buffers = -1
time pg_restore -j 8 --format=d -C -d postgres /tmp/newout.dir/
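
Once the restore completes, it is worth undoing the temporary settings and refreshing planner statistics (a sketch; the database name is the one from the dump above):

# re-enable autovacuum and full_page_writes in postgresql.conf, then:
pg_ctl reload
psql -d fsdcm_external -c "ANALYZE"
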
Yanar Assaf

Two issues/ideas:

  1. By specifying -Fc, the pg_dump output is already compressed. The compression is not maximal, so you may find some space savings by using "gzip -9", but I would wager it's not enough to warrant the extra time (and I/O) used compressing and uncompressing the -Fc version of the backup.

  2. If you are using PostgreSQL 8.4.x you can potentially speed up the restore from a -Fc backup with the new pg_restore command-line option "-j n" where n=number of parallel connections to use for the restore. This will allow pg_restore to load more than one table's data or generate more than one index at the same time.
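
A sketch of both points together, using the database name from the question (the compression level and job count are assumptions):

pg_dump -Fc -Z 9 dbname > dumpfile       # custom format with maximum built-in compression
pg_restore -j 4 -d dbname dumpfile       # 8.4+: four parallel restore jobs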

Matthew Wood
  • We are currently at 8.3; new reason to upgrade. – Joe Creighton Jan 19 '10 at 19:32
  • You can use the 8.4 version of pg_restore with an 8.3 version of the server. Just make sure you use pg_dump from 8.3. – Magnus Hagander Jan 19 '10 at 20:17
  • Bah. We are stuck at 8.3 because we use the Solaris10 package install of Postgres and "there is no plan to integrate PG8.4 into S10 at this moment." [Ref. http://www.mail-archive.com/pgsql-general@postgresql.org/msg136829.html] I would have to take on the task of installing and maintaining the open-source Postgres. Unsure if we can do that here... Feh. – Joe Creighton Jan 19 '10 at 21:18

I assume you need backups, not a major upgrade of the database.

For backing up large databases you should set up continuous archiving instead of pg_dump.

  1. Set up WAL archiving (a minimal configuration sketch follows at the end of this answer).

  2. Make your base backups for example every day by using

    psql template1 -c "select pg_start_backup('$(date +%F-%T)')"
    rsync -a --delete /var/lib/pgsql/data/ /var/backups/pgsql/base/
    psql template1 -c "select pg_stop_backup()"
    

A restore would be as simple as restoring the database files and the WAL logs not older than the pg_start_backup time from the backup location, and then starting Postgres. And it will be much faster.
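
For step 1, a minimal sketch of the archiving settings in postgresql.conf (the archive directory is an assumption; any command that reliably copies the segment will do):

archive_mode = on
archive_command = 'cp %p /var/backups/pgsql/wal/%f'   # %p = path to the WAL segment, %f = its file name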

Tometzky
  • We didn't look at PITR (WAL archiving) because the system is not very transaction-heavy but will retain many historical records instead. However, now that I think about it, a more "incremental" backup may help matters. I shall investigate. Thanks. – Joe Creighton Jan 19 '10 at 19:28

zcat dumpfile.gz | pg_restore -d db_name

Removes the full write of the uncompressed data to disk, which is currently your bottleneck.

richo

As you may have guessed simply by the fact that compressing the backup results in faster performance, your backup is I/O bound. This should come as no surprise as backup is pretty much always going to be I/O bound. Compressing the data trades I/O load for CPU load, and since most CPUs are idle during monster data transfers, compression comes out as a net win.

So, to speed up backup/restore times, you need faster I/O. Beyond reorganizing the database to not be one huge single instance, that's pretty much all you can do.

Will Hartung
  • If optimizing just the pg_dump time, with parallel dump as of v9.3, compression >0 can hurt a lot! This is because pg_dump and postmaster processes already hog the CPU enough that the addition of compression >=1 makes the overall task significantly CPU-bound instead of I/O-bound. Basically, the older assumption that the CPUs are idle without compression is invalid with parallel dump. – Asclepius Nov 01 '14 at 08:48

If you're facing issues with the speed of pg_restore, check whether you dumped your data using INSERT or COPY statements.

If you used INSERT (i.e. pg_dump was called with the --column-inserts parameter), restoring the data will be significantly slower.

Using INSERT is good for making dumps that will be loaded into non-Postgres databases, but if you are restoring into Postgres, omit the --column-inserts parameter when running pg_dump.
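
For comparison (a sketch; the output file names are placeholders):

pg_dump --column-inserts dbname > dump_inserts.sql   # one INSERT per row: portable, but slow to restore
pg_dump dbname > dump_copy.sql                       # default COPY: much faster to restore back into Postgres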

michal.jakubeczy