3

I'm trying to avoid touching a shared dev database in my workflow; to make this easier, I want to keep Docker image definitions on my disk for the schemas I need. I'm stuck, however, at writing a Dockerfile that creates a Postgres image with the dump already restored. The problem is that while the Docker image is being built, the Postgres server isn't running.

While messing around in a shell inside the container, I tried starting the server manually, but I'm not sure of the proper way to do so. /docker-entrypoint.sh doesn't seem to do anything, and I can't figure out how to "correctly" start the server.

So what I need to do is:

  • start with "FROM postgres"
  • copy the dump file into the container
  • *start the PG server*
  • run psql to restore the dump file
  • *kill the PG server*

(Steps I don't know how to do are in italics; the rest is easy.)
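Sketched as a Dockerfile, what I'm after is something like this (names are placeholders; the commented-out steps are the part I can't figure out):

```dockerfile
FROM postgres

# copy the dump file into the container
COPY mydb.dump /tmp/mydb.dump

# ??? start the PG server (it isn't running during `docker build`)
# ??? run psql / pg_restore against /tmp/mydb.dump
# ??? kill the PG server
```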

What I'd like to avoid is:

  • Running the restore manually into an existing container; the whole idea is to be able to switch between different databases without having to touch the application config.
  • Saving the restored image; I'd like to be able to rebuild the image for a database easily from a different dump. (Also, unrepeatable image builds don't feel very Docker.)
millimoose
  • pg_dump will not do it because - as you already mentioned - postgresql is not running when you build the image. You could try cloning the postgresql data files instead: make a backup (not in tar format) with pg_basebackup and copy the whole structure into the image. That way you get a consistent snapshot up to some point in time; when the image starts, PG will do "recovery" from the WAL files captured during the backup. But having data inside the image is only useful when you intend it as a read-only data snapshot... – JosMac Feb 06 '18 at 13:21
  • I will be storing the data in a volume, but since this is for dev purposes it doesn’t really matter. I’m really mainly interested in being able to go from docker-compose.yml+Dockerfiles+dumpfile to running db with as few steps as possible, and having as many of the input files suitable for version control. – millimoose Feb 06 '18 at 13:51

3 Answers

5

This can be done with the following Dockerfile, given an example.pg dump file:

FROM postgres:9.6.16-alpine

LABEL maintainer="lu@cobrainer.com"
LABEL org="Cobrainer GmbH"

ARG PG_POSTGRES_PWD=postgres
ARG DBUSER=someuser
ARG DBUSER_PWD=P@ssw0rd
ARG DBNAME=sampledb
ARG DB_DUMP_FILE=example.pg

ENV POSTGRES_DB launchpad
ENV POSTGRES_USER postgres
ENV POSTGRES_PASSWORD ${PG_POSTGRES_PWD}
ENV PGDATA /pgdata

COPY wait-for-pg-isready.sh /tmp/wait-for-pg-isready.sh
COPY ${DB_DUMP_FILE} /tmp/pgdump.pg

RUN set -e && \
    nohup bash -c "docker-entrypoint.sh postgres &" && \
    /tmp/wait-for-pg-isready.sh && \
    psql -U postgres -c "CREATE USER ${DBUSER} WITH SUPERUSER CREATEDB CREATEROLE ENCRYPTED PASSWORD '${DBUSER_PWD}';" && \
    psql -U ${DBUSER} -d ${POSTGRES_DB} -c "CREATE DATABASE ${DBNAME} TEMPLATE template0;" && \
    pg_restore -v --no-owner --role=${DBUSER} --exit-on-error -U ${DBUSER} -d ${DBNAME} /tmp/pgdump.pg && \
    psql -U postgres -c "ALTER USER ${DBUSER} WITH NOSUPERUSER;" && \
    rm -rf /tmp/pgdump.pg

HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
  CMD pg_isready -U postgres -d launchpad

where the wait-for-pg-isready.sh is:

#!/bin/bash
set -e

# Find the container's first non-loopback IPv4 address by parsing
# `ifconfig` output, and store it in the variable named by $1.
get_non_lo_ip() {
  local _ip _non_lo_ip _line _nl=$'\n'
  while IFS=$': \t' read -a _line ;do
    [ -z "${_line%inet}" ] &&
        _ip=${_line[${#_line[1]}>4?1:2]} &&
        [ "${_ip#127.0.0.1}" ] && _non_lo_ip=$_ip
    done< <(LANG=C /sbin/ifconfig)
  printf ${1+-v} $1 "%s${_nl:0:$[${#1}>0?0:1]}" $_non_lo_ip
}

get_non_lo_ip NON_LO_IP
until pg_isready -h $NON_LO_IP -U "postgres" -d "launchpad"; do
  >&2 echo "Postgres is not ready - sleeping..."
  sleep 4
done

>&2 echo "Postgres is up - you can execute commands now"
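If you only need to poll over the local socket, a simpler variant of the wait script (my own sketch, not taken from the answer's repo) avoids the ifconfig parsing entirely:

```shell
#!/bin/bash
# Poll pg_isready over the local socket until the server accepts
# connections, up to a maximum number of attempts (default 30).
wait_for_pg() {
  local tries=${1:-30} i=0
  while [ "$i" -lt "$tries" ]; do
    if pg_isready -U postgres -d launchpad >/dev/null 2>&1; then
      >&2 echo "Postgres is up - you can execute commands now"
      return 0
    fi
    >&2 echo "Postgres is not ready - sleeping..."
    sleep 1
    i=$((i + 1))
  done
  return 1
}
```

Source it and call `wait_for_pg` (optionally passing a retry count) before running psql.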

For the two "unsure steps":

start the PG server

nohup bash -c "docker-entrypoint.sh postgres &" can take care of it

kill the PG server

It's not really necessary: the background server is simply terminated when the RUN instruction finishes, and Postgres will recover from its WAL on the next startup if the shutdown wasn't clean. If you do want a clean shutdown at the end of the RUN step, something like `su postgres -c "pg_ctl stop -m fast -D /pgdata"` should work.
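Building and running the resulting image is then the usual workflow; for example (the image tag and dump file name are just placeholders):

```shell
docker build --build-arg DB_DUMP_FILE=example.pg -t sampledb-image .
docker run -d -p 5432:5432 sampledb-image
```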

The above scripts together with a more detailed README are available at https://github.com/cobrainer/pg-docker-with-restored-db

Lu Liu
  • Very interesting repository. I have appreciated your work. I think that using multi-stage builds one can improve the process by actually/simply zipping the entire datadir to a directory within the image, which will be restored at runtime – usr-local-ΕΨΗΕΛΩΝ Apr 08 '20 at 14:07
  • Sadly the link doesn't work anymore.... – Chris Jan 20 '22 at 14:12
  • How can I stop the `wait-for-pg-isready.sh` from executing after the migration step? When I run the build in Docker, this script freezes the entire build. Other containers do not get built because of that. – MadPhysicist Oct 09 '22 at 22:00
  • @Chris sorry about the inconveniences. The link is working again – Lu Liu Feb 03 '23 at 08:58
0

You can utilise volumes.

The postgres image has an environment variable you could set: PGDATA

See docs: https://hub.docker.com/_/postgres/

You could then point the container at a pre-created volume containing the exact DB data you require, passing it as an argument when running the image. https://docs.docker.com/storage/volumes/#start-a-container-with-a-volume
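For example (the volume name and mount path are assumptions), you could keep one named volume per database and pick one at run time:

```shell
docker volume create salesdb-data
docker run -d -e PGDATA=/pgdata -v salesdb-data:/pgdata postgres
```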

Alternate solution can also be found here: Starting and populating a Postgres container in Docker

Pandelis
  • Yeah, using volumes goes against the spirit of thing 2 I want to avoid and the whole question to begin with: I want to create the PG data files during the build, not have them built separately. – millimoose Feb 06 '18 at 13:49
  • The linked answer seems promising though and it seems that it might be a dupe for my post, I just couldn’t really find it since it used a different phrasing. – millimoose Feb 06 '18 at 13:53
0

A general approach that should work for any system you want to initialize (I remember using it on other projects):

Instead of trying to do this during the build, use Docker Compose dependencies so that you end up with:

  • your db service that fires up the database without any initialization that requires it to be live
  • a db-init service that:
    • takes a dependency on db
    • waits for the database to come up, using e.g. dockerize
    • then initializes the database while maintaining idempotency (e.g. using schema migration)
    • and exits
  • your application services that now depend on db-init instead of db
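A minimal docker-compose.yml sketching this layout (service names, image tags, and the migration command are all placeholders):

```yaml
version: "3"
services:
  db:
    image: postgres:9.6
    environment:
      POSTGRES_PASSWORD: postgres

  db-init:
    image: my-migrations   # hypothetical image bundling dockerize + a migration tool
    depends_on:
      - db
    command: >
      dockerize -wait tcp://db:5432 -timeout 60s
      ./run-migrations.sh

  app:
    image: my-app          # hypothetical application image
    depends_on:
      - db-init
```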
millimoose