10

Current situation

So I have WAL archiving set up to an independent internal harddrive on a data logging computer running Postgres. The harddrive containing the WAL archives is filling up and I'd like to remove and archive all the WAL archive files, including the initial base backup, to external backup drives.

The directory structure is like:

D:/WALBACKUP/ which is the parent folder for all the WAL files (00000110000.CA00000004 etc)

D:/WALBACKUP/BASEBACKUP/ which holds the .tar of the initial base backup

The question I have then is:

  • Can I safely move literally every single WAL file except the current WAL archive file, (000000000001.CA0000.. and so on), including the base backup, and move them to another hdd. (Note that the database is live and receiving data)

cheers!

undercurrent
  • 285
  • 1
  • 2
  • 12

5 Answers5

18

WAL archives

You can use the pg_archivecleanup command to remove WAL from an archive (not pg_xlog) that's not required by a given base backup.

In general I suggest using PgBarman or a similar tool to automate your base backups and WAL retention though. It's easier and less error prone.

pg_xlog

Never remove WAL from pg_xlog manually. If you have too much WAL then:

  • your wal_keep_segments setting is keeping WAL around;
  • you have archive_mode on and archive_command set but it isn't working correctly (check the logs);
  • your checkpoint_segments is ridiculously high so you're just generating too much WAL; or
  • you have a replication slot (see the pg_replication_slots view) that's preventing the removal of WAL.

You should fix the problem that's causing WAL to be retained. If nothing seems to have happened after changing a setting run a manual CHECKPOINT command.

If you have an offline server and need to remove WAL to start it you can use pg_archivecleanup if you must. It knows how to remove only WAL that isn't needed by the server its self ... but it might break your archive-based backups, streaming replicas, etc. So don't use it unless you must.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
10

WAL files are incremental, so the simple answer is: You cannot throw any files out. The solution is to make a new base backup and then all previous WALs can be deleted.

The WAL files contain individual statements that modify tables so if you throw some older WALs out, then the recovery process will fail (it will not silently skip missing WAL files) because the state of the database cannot be restored reliably. You can move the WAL files to some other location without upsetting the WAL process but then you'd have to make all WAL files available again from a single location if you ever need to recover your database from some point in the past; if you are running out of disk space then that may mean recovering from some location where you have enough space to store the base backup and all WAL files. The main issue here is if you can do that fast enough to restore a full database after an incident.

Another issue is that if you cannot identify where/when a problem occurred that needs to be corrected your only option is to start with the base backup and then replay all the WAL files. This procedure is not difficult, but if you have an old base backup and many WAL files to process, this simply takes a lot of time.

The best approach for your case, in general, is to make a new base backup every x months and collect WALs with that base backup. After every new base backup you can delete the old base backup and its subsequent WALs or move them to cheap offline storage (DVD, tape, etc). In the case of a major incident you can quickly restore the database to a known correct state from the recent base backup and the relatively few WAL files collected since then.

Patrick
  • 29,357
  • 6
  • 62
  • 90
  • When you say you can't throw any files out, do you mean "if you delete some of the wal files, you will be unable to recover the database" ? I intend to keep all the wal files but store them in a .zip. A recovery procedure would then require a larger RAID array to hold all the WAL files and the original basebackup. Is that valid? I would have though postgres merely requires the previous wal file in order to create the next one (consecutive files incrementally named/linked with each other). Would the archiving suddenly malfunction if i moved all the previous backup files to an external drive? – undercurrent Feb 02 '16 at 03:43
  • See updated answer. It is not just an issue of how the WALs are collected, but especially the practicality of recovering from a disaster in the least amount of time. – Patrick Feb 02 '16 at 05:04
  • cheers for the updated info. That approach sounds good to me! – undercurrent Feb 03 '16 at 00:47
7

A solution that we went for, is executing pg_basebackup every night. This would create a base backup and later on we can use pg_archivecleanup to clean up all the "old" WAL files before that base using something like

"%POSTGRES_INSTALLDIR%\bin\pg_archivecleanup" -d %WAL_backup_dir% %newestBaseFile%

Fortunately, we never had to recover yet, but it should work in theory.

Viorel
  • 337
  • 2
  • 12
3

In case someone found this by searching how to safely cleanup the WAL directory under a replication architecture, consider the scenario where there might be left overs from offline replicas, in this case, unused replica slots waiting for the replica to come back online and thus keeping a lot of WAL archives on the Master DB.

In our case we had an issue with a replica going down due to hardware failure, we had to recreate it along with its replica_slot on the Master DB but forgot to get rid of the previous used one. Once we cleared that out PSQL got rid of unused WALs and all was good.

Aldo
  • 1,677
  • 1
  • 12
  • 15
1
You can add the script to automatically clean or remove pg_wal files. This will work in pg-11 version. If you want to use other psql version the you can simply replace the command "/usr/pgsql-11/bin/pg_archivecleanup" to /usr/pgsql-12/bin/pg_archivecleanup or 13 as per your wish. 

#!/bin/bash

/usr/pgsql-11/bin/pg_controldata -D /var/lib/pgsql/11/data/ > pgwalfile.txt


/usr/pgsql-11/bin/pg_archivecleanup -d /var/lib/pgsql/11/data/pg_wal  $(cat pgwalfile.txt | grep "Latest checkpoint's REDO WAL file" | awk '{print $6}')
  • One liner style: pg_archivecleanup -d /var/lib/postgresql/data/pgdata/pg_wal $(pg_controldata | grep "Latest checkpoint's REDO WAL" | cut -d: -f2 | tr -d [:space:]) – Alain Pannetier Aug 22 '22 at 14:13
  • Can I remove the archived WALs by using these steps : First, issue `CHECKPOINT`. Then run `pg_dump`. Lastly, find WALs files on the /Archived directory that are older than whatever WALs resulted from `CHECKPOINT` command. Is this even possible ? – padjee Oct 01 '22 at 17:02