5

I made daily backups of a postgresql DB using the command

/usr/bin/pg_basebackup -D $outdir -Ft -x -z -w -R -v

Now I want to restore this DB on another server. I used the description on https://www.postgresql.org/docs/9.5/static/continuous-archiving.html#BACKUP-PITR-RECOVERY.

The recovery.conf file included in the backup has the following contents:

standby_mode = 'on'
primary_conninfo = 'user=postgres port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'

The next step (8.) in the documentation says to start postgresql. This results in a failure due to a timeout:

3783 postgres: startup process   waiting for 0000000100000024000000B 

On the original server I don't have this file. Is it possible to restore only the state of the pg_basebackup without using any WAL files? What should then be in the recovery.conf file?

Following the suggestion by @JosMac I moved the recovery.conf with this result:

shaun2:/var/lib/pgsql/data # service postgresql start
Job for postgresql.service failed because the control process exited with error code. See "systemctl status postgresql.service" and "journalctl -xe" for details.
shaun2:/var/lib/pgsql/data # service postgresql status
â postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2018-06-18 12:02:53 CEST; 12s ago
  Process: 1340 ExecStop=/usr/lib/postgresql-init stop (code=exited, status=0/SUCCESS)
  Process: 9355 ExecStart=/usr/lib/postgresql-init start (code=exited, status=1/FAILURE)
 Main PID: 1060 (code=exited, status=0/SUCCESS)

Jun 18 12:02:52 shaun2 postgres[9369]: [3-1] 2018-06-18 12:02:52 CEST   LOG:  invalid checkpoint record
Jun 18 12:02:52 shaun2 postgres[9369]: [4-1] 2018-06-18 12:02:52 CEST   FATAL:  could not locate required checkpoint record
Jun 18 12:02:52 shaun2 postgres[9369]: [4-2] 2018-06-18 12:02:52 CEST   HINT:  If you are not restoring from a backup, try removing the file "/var/lib/pgsql/data/backup_label".
Jun 18 12:02:52 shaun2 postgres[9367]: [2-1] 2018-06-18 12:02:52 CEST   LOG:  startup process (PID 9369) exited with exit code 1
Jun 18 12:02:52 shaun2 postgres[9367]: [3-1] 2018-06-18 12:02:52 CEST   LOG:  aborting startup due to startup process failure
Jun 18 12:02:53 shaun2 postgresql-init[9355]: pg_ctl: could not start server
Jun 18 12:02:53 shaun2 systemd[1]: postgresql.service: Control process exited, code=exited status=1
Jun 18 12:02:53 shaun2 systemd[1]: Failed to start PostgreSQL database server.
Jun 18 12:02:53 shaun2 systemd[1]: postgresql.service: Unit entered failed state.
Jun 18 12:02:53 shaun2 systemd[1]: postgresql.service: Failed with result 'exit-code'.

I suppose that PostgreSQL is still looking for the missing WAL file because of the contents of backup_label:

shaun2:/var/lib/pgsql/data # cat backup_label
START WAL LOCATION: 24/B0000028 (file 0000000100000024000000B0)
CHECKPOINT LOCATION: 24/B0000028
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2018-06-14 02:55:08 CEST
LABEL: pg_basebackup base backup

Result after moving backup_label away:

shaun2:/var/lib/pgsql/data # service postgresql status
â postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2018-06-18 12:17:54 CEST; 4s ago
  Process: 1340 ExecStop=/usr/lib/postgresql-init stop (code=exited, status=0/SUCCESS)
  Process: 10401 ExecStart=/usr/lib/postgresql-init start (code=exited, status=1/FAILURE)
 Main PID: 1060 (code=exited, status=0/SUCCESS)

Jun 18 12:17:53 shaun2 postgres[10414]: [4-1] 2018-06-18 12:17:53 CEST   LOG:  invalid secondary checkpoint record
Jun 18 12:17:53 shaun2 postgres[10414]: [5-1] 2018-06-18 12:17:53 CEST   PANIC:  could not locate a valid checkpoint record
Jun 18 12:17:54 shaun2 postgres[10412]: [2-1] 2018-06-18 12:17:54 CEST   LOG:  startup process (PID 10414) was terminated by signal 6: Aborted
muclux
  • 163
  • 1
  • 1
  • 6

2 Answers2

8

We use pg_basebackup for backups and also did several restorations so generally it works very well without problems.

But I would recommend you to use parameter -X stream instead of -x (meaning "-X fetch"). With this parameter pg_basebackup will catch and store WAL log segments created during the time of backup together with data files. These WAL logs will be stored in separate pg_xlog.tar or pg_wal.tar files (depending on PG version).

Full description of restoration can be find here - pg_basebackup / pg-barman – restore tar backup

mustaccio
  • 18,234
  • 16
  • 48
  • 57
JosMac
  • 2,164
  • 1
  • 17
  • 23
  • Unfortunately I want to restore a backup file from 2 days ago, that was made with the parameter `-x` and not `-X stream`. I don't have neither a `pg_xlog.tar` nor a `pg_wal.tar`. The description in your link does presume the existence of one of these files in the `recovery.conf` file. – muclux Jun 15 '18 at 13:30
  • Well theoretically - if there were no transactions during basebackup run you could try to start restored backup without recovery file. But I doubt that... If you have archive_mode ON or Always on your DB and archive_command set or wal_keep_segments really high you can copy WAL logs from your DB. If not I would make another basebackup with new parameters... – JosMac Jun 15 '18 at 16:14
  • I need the data from last Wednesday, so a new backup does not help. The backup is made 3am, and there is no activity at this time. So the data in the base backup would be accurate enough - if I could restore them. The current WAL files begin today in the morning, so they are not of any help. – muclux Jun 15 '18 at 16:40
  • OK, if you are sure there was no activity stop postgresql service (if it is running), rename recovery.conf file and try to start postgresql service again and watch messages in /var/log/postgresql/postgresql-xxx-main.log – JosMac Jun 18 '18 at 08:02
  • I see - look at the message `HINT: If you are not restoring from a backup, try removing the file "/var/lib/pgsql/data/backup_label"` - try to rename this file and try it again. – JosMac Jun 18 '18 at 10:50
  • I edited once adding the result without backup_label file. – muclux Jun 18 '18 at 11:01
  • I see - there is a possibility to reset checkpoint - see here - https://stackoverflow.com/questions/8799474/postgresql-error-panic-could-not-locate-a-valid-checkpoint-record – JosMac Jun 18 '18 at 11:29
  • 1
    Actually we have got a step further. We replayed the tar decompression and had the one WAL file that was needed. Unfortunately we still can't start PostgreSQL, because of 'FATAL: could not connect to the primary server'. I suppose that this is due to the fact that we installed the backup on another server. I'll ask another question for this part. – muclux Jun 18 '18 at 12:26
  • 1
    We managed now to restore the data on the second server by returning to your first suggestion: delete the recovery.conf (but this time with the requested WAL file - and it worked. Thank you very much for your help and patience! – muclux Jun 18 '18 at 15:05
0

The -R option generates a recovery.conf file that is useful if the backup will be used in replica servers, because it sets the server in standby_mode and it also has the primary_conninfo to pull data from the primary.

So, if you just want to make/restore backups, I wouldn't use -R. Just in case it helps, I used these options: -v -P -x -F tar -z.

To restore the backup, unzip it to the proper directory (e.g. /var/lib/postgresql/$VERSION/main), create an empty recovery.conf file there (or clear the one you have, but better don't use -R), and start the server.

Ferran Maylinch
  • 10,919
  • 16
  • 85
  • 100