
I just had to use pg_restore with a small 30 MB dump, and it took 5 minutes on average! On my colleagues' computers it is ultra fast, something like a dozen seconds. The difference between the two is CPU usage: while for the others the database uses quite a lot of CPU (60-70%) during the restore operation, on my machine it stays at only a few percent (0-3%), as if it were not active at all.

The exact command was: pg_restore -h 127.0.0.1 --username XXX --dbname test --no-comments test_dump.sql

The command that produced this dump was: pg_dump --dbname=XXX --user=XXX --no-owner --no-privileges --verbose --format=custom --file=/sql/test_dump.sql

Look at the screenshot taken in the middle of the restore operation:

Screenshot during the restore

Here is the corresponding vmstat 1 output while the command was running:

Screenshot of vmstat

I've searched the web for a solution for a few hours, but this under-use of the CPU remains quite mysterious. Any ideas would be appreciated.

As for the stack, I am on Ubuntu 20.04 and PostgreSQL 13.6 is running in a Docker container. My hardware is decent, neither bad nor great.

EDIT: This very same command used to work on my machine with the same ordinary HDD, but now it is terribly slow. The only difference I see with the others (for whom it is blazing fast) is really the CPU usage, from my point of view (even though they have an SSD, which shouldn't be the limiting factor at all, especially with a 30 MB dump).

EDIT 2: For those who suggested the problem was IO-boundness and maybe a slow disk, I just tried, without much conviction, to run my command on an SSD partition I had just created, and nothing changed.
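
For completeness, one more check I can run is pg_test_fsync (it ships with the PostgreSQL server binaries) to measure raw per-flush latency on the volume backing the data directory; the container name and file path below are just placeholders for my setup:

```sh
# Run inside the container so the test hits the same storage as the data directory.
# "my_postgres" and the target path are placeholders for my actual setup.
docker exec -it my_postgres \
  pg_test_fsync -f /var/lib/postgresql/data/pg_test_fsync.tmp
```

A very low number of ops/sec there would point at flush latency rather than CPU.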

eytienne
  • A dump isn't a backup, it's a batch file full of INSERT statements. Executing 30MB worth of INSERT statements will be affected by IO throughput and delays. Since you use a container, IO will be slow because the actual database files are inside the container. Increasing the size of the database will require increasing the size of the container image. This will be *at least* twice as slow as using a physical database file – Panagiotis Kanavos Mar 10 '22 at 16:06
  • @PanagiotisKanavos thanks, but I don't get your point; maybe look at the EDIT. To me it looks like the computation is running too slowly and the CPU is not being used enough, compared to the others – eytienne Mar 10 '22 at 16:17
  • @PanagiotisKanavos. 1) By default a Postgres dump is not comprised of `INSERT`s; it uses `COPY` to import data. You can force it to use `INSERT`s with `--inserts` or `--column-inserts`, but that is not what eytienne did. 2) Depending on how you specify the command, a dump can include more than just data. 3) A dump can be a point-in-time backup of an entire database or, using `pg_dumpall`, of an entire cluster. – Adrian Klaver Mar 10 '22 at 16:30
  • That's still not a backup, it's a script. It may be called a "logical" backup but it's actually a script. That's slow and will cause the database file to increase, thus causing the image to increase. An actual, physical backup taken with `pg_basebackup` would be a lot faster to take *and* restore – Panagiotis Kanavos Mar 10 '22 at 17:58
  • @PanagiotisKanavos. Huh? You have an empty database, you restore the `pg_dump` file to it, and you end up with the same thing in `$PGDATA` as you would with `pg_basebackup`. I have no idea what you are talking about with the database file increasing. – Adrian Klaver Mar 10 '22 at 19:32

2 Answers


The vmstat output shows that you are I/O bound. Get faster storage, and performance will improve.
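
To see it directly, you can watch the block devices while the restore runs; a minimal sketch, assuming `iostat` from the `sysstat` package is installed:

```sh
# Per-device statistics, refreshed every second, while pg_restore is running.
# High %util and await combined with a mostly idle CPU confirm an I/O bottleneck.
iostat -x 1
```

In the vmstat output itself, the telltale combination is a high `wa` (I/O wait) percentage and processes in the `b` (blocked, uninterruptible sleep) column while `us` and `sy` stay close to zero.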

Laurenz Albe

PostgreSQL, by default, is tuned for data durability. Transactions are normally flushed to disk at each and every commit, forcing a write-through of any disk write cache, so a restore tends to be heavily IO-bound rather than CPU-bound.

When restoring a database from a dump file, it can make sense to lower these durability settings temporarily, provided the restore is done while your application is offline, and ideally only in non-production environments.

I temporarily run postgres with these options: -c fsync=off -c synchronous_commit=off -c full_page_writes=off -c checkpoint_flush_after=256 -c autovacuum=off -c max_wal_senders=0
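
For example, with the official `postgres` Docker image (which passes extra command-line arguments through to the `postgres` binary), a throwaway instance for the restore could look roughly like this; the container name, password, port mapping, and image tag are placeholders for your own setup:

```sh
# Disposable server with relaxed durability, meant only for loading a dump
# that can be re-imported at any time.
# WARNING: fsync=off and full_page_writes=off can corrupt the cluster after a crash.
docker run -d --name pg-restore-test \
  -e POSTGRES_PASSWORD=postgres \
  -p 127.0.0.1:5432:5432 \
  postgres:13.6 \
  -c fsync=off -c synchronous_commit=off -c full_page_writes=off \
  -c checkpoint_flush_after=256 -c autovacuum=off -c max_wal_senders=0
```

Once the restore is done, recreate the container (or restart the server) with the normal settings.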

Refer to the PostgreSQL documentation on these settings for more information.

Thai
  • I would warn everybody *not* to follow this advice. Setting `fsync = off` is not only a sacrifice in durability, but it will break your database and cause data corruption if you have a crash. The same holds for `full_page_writes = off`. You can only change these two parameters if you are ready to restore from backup. The other parameters are safe, but `max_wal_senders` will offer no benefit, as far as I know. – Laurenz Albe May 24 '23 at 17:12