The accepted answer in this related post asks to remove the docker volume which means that all the data in the PostgreSQL instance will be lost!!!
I recently encountered the same issue in an environment where we had months of stored data. So, removing/deleting the volume was a (very very) last resort.
I came across this answer which asks to reset the transaction write-ahead log (WAL). This works in a scenario where the DB is installed as an application and not run as a container. In case of docker, however, the container itself fails to deploy as a result of the following error
PANIC: could not locate a valid checkpoint record
Solution
The solution in this case is to use docker run
command with the postgres image and mount the same physical volume on it at the correct PGDATA path (that you had configured earlier) and directly login to the bash shell.
docker run -it -v /data/postgres_data:<PGDATA_path> postgres:14 /bin/bash
The <PGDATA_path>
is /var/lib/postgresql/data
by default if you haven't altered it.
If you were using docker volumes, you need to figure out your docker volume name from the list of volumes: docker volume ls
. Obtain the volume name and then use docker run
like below.
docker run -it -v <docker_volume_name>:<PGDATA_path> postgres:14 /bin/bash
Once you have the container's shell, you can now reset the WAL using
# Postgres >= 10
pg_resetwal <PGDATA_path>
# Postgres < 10
pg_resetxlog <PGDATA_path>
Use the -f
option if required. Refer to PostgreSQL Documentation on pg_resetwal
for further details on the same.
Now, exit the container's shell and redeploy the container using docker-compose
/docker stack
, whichever you're using, and you'll have successfully recovered your DB while retaining all the data.
Notes
- Some data may still be lost, especially at the places where there was no consensus on WAL.
- Some db-dependent applications may throw further errors like "The database log was reset, data inserted was not found". In such cases, you will have to reset the write-heads of those applications.
- Visit my answer here to get an understanding of when/why this happens.