52

I'm looking to build dockerfiles that represent company databases that already exist. Similarly, I'd like create a docker file that starts by restoring a psql dump.

I have my psql_dump.sql in the . directory.

FROM postgres
ADD . /init_data
run "createdb" "--template=template0" "my_database"
run  "psql" "-d" "my_database"  --command="create role my_admin superuser"
run  "psql" "my_database" "<" "init_data/psql_dump.sql"

I thought this would be good enough to do it. I'd like to avoid solutions that use a .sh script. Like this solution.

I use template0 since the psql documentation says you need the same users created that were in the original database, and you need to create the database with template0 before you restore.

However, it gives me an error:

createdb: could not connect to database template1: could not connect to server: No such file or directory
        Is the server running locally and accepting

I'm also using docker compose for the overall application, if solving this problem in docker-compose is better, I'd be happy to use the base psql image and use docker compose to do this.

Community
  • 1
  • 1
Jono
  • 3,393
  • 6
  • 33
  • 48
  • Possible duplicate of [Starting and populating a Postgres container in Docker](http://stackoverflow.com/questions/29600369/starting-and-populating-a-postgres-container-in-docker) – damoiser Feb 01 '17 at 11:20

3 Answers3

76

According to the usage guide for the official PostreSQL Docker image, all you need is:

Dockerfile

FROM postgres
ENV POSTGRES_DB my_database
COPY psql_dump.sql /docker-entrypoint-initdb.d/

The POSTGRES_DB environment variable will instruct the container to create a my_database schema on first run.

And any .sql file found in the /docker-entrypoint-initdb.d/ of the container will be executed.

If you want to execute .sh scripts, you can also provide them in the /docker-entrypoint-initdb.d/ directory.

Thomasleveil
  • 95,867
  • 15
  • 119
  • 113
  • Awesome, thanks! I also wasn't realizing that I needed to enter psql with `psql --username=my_user --dbname=my_database` that made it work the way I expected. – Jono Jan 13 '16 at 00:21
  • 10
    The subject asks for "Build postgres docker container with initial schema". But what you suggest creates the schema on startup, rather than building a container that already has that schema. If the schema takes 5 minutes to create, that's a 5 minute penalty each time it has to create a volume, rather than having it inside the actual container. – dsvensson Sep 13 '17 at 12:08
  • 1
    Having your data inside the image (so you can have it at container creation) is a bad idea since the docker image union file system is slow. Your data should be on a volume, thus out of the container filesystem / image filesystem – Thomasleveil Sep 13 '17 at 12:47
  • 1
    @Thomasleveil you can do it in 3 steps: first create a container as you did, then copy its data to a local host directory, and then each time you want a fresh PostreSQL, start a PostreSQL container which it's volume is working with a copy of this local host directory. – yishaiz Nov 27 '17 at 14:07
9

As said in the comments, @Thomasleveil answer is great and simple if your schema recreation is fast. But in my case it's slow, and I wanted to use docker volumes, so here is what I did

  1. First use docker image as in @Thomasleveil answer to create a container with postgres with all the schema initialization

Dockerfile:

FROM postgres
WORKDIR /docker-entrypoint-initdb.d
ADD psql_dump.sql /docker-entrypoint-initdb.d
EXPOSE 5432
  1. then run it and create new local dir which contains the postgres data after its populated from the “psql_dump.sql” file: docker cp mypg:/var/lib/postgresql/data ./postgres-data

  2. Copy the data to a temp data folder, and start a new postgres docker-compose container whose volume is at the new temp data folder:

startPostgres.sh:

rm -r ./temp-postgres-data/data
mkdir -p ./temp-postgres-data/data
cp -r ./postgres-data/data ./temp-postgres-data/
docker-compose -p mini-postgres-project up

and the docker-compose.yml file is:

version: '3'
services:
  postgres:
    container_name: mini-postgres
    image: postgres:9.5
    ports:
    - "5432:5432"
    volumes:
      - ./temp-postgres-data/data:/var/lib/postgresql/data

Now you can run steps #1 and #2 on a new machine or if your psql_dump.sql changes. And each time you want a new clean (but already initialized) db, you can only run startPostgres.sh from step #3. And it still uses docker volumes.

Arialdo Martini
  • 4,427
  • 3
  • 31
  • 42
yishaiz
  • 2,433
  • 4
  • 28
  • 49
6

@Thomasleveil's answer will re-create the database schema at runtime, which is fine for most cases.

If you want to recreate the database schema at buildtime (i.e. if your schema initialization is really slow) you can invoke the stock docker_entrypoint.sh from within your Dockerfile.

However, since the docker_entrypoint.sh is designed to start a long-running database server, you have to add an extra script to exit the process after database initialization but before booting the long-running server.

Dockerfile (with build time database initialization)

# STAGE 1 - Equivalent to @Thomasleveil
FROM postgres AS runtime_init
ENV POSTGRES_DB my_database
COPY 1-psql_dump.sql /docker-entrypoint-initdb.d/

# STAGE 2 - Initialize the database during the build
FROM runtime_init AS buildtime_init_builder
RUN echo "exit 0" > /docker-entrypoint-initdb.d/100-exit_before_boot.sh
ENV PGDATA=/pgdata
RUN docker-entrypoint.sh postgres

# STAGE 3 - Copy the initialized db to a new image to reduce size.
FROM postgres AS buildtime_init
ENV PGDATA=/pgdata
COPY --chown=postgres:postgres --from=buildtime_init_builder /pgdata /pgdata

Important Notes

  • The stock postgres image will run initialization scripts in alphabetical order, so ensure that your database restoration scripts appear earlier than the exit_before_boot.sh script created in the Dockerfile.
    • This is demonstrated by the 1 and 100 prefixes shown above. Modify them to your liking.
  • Database updates to a running instance of this image will not be persisted across reboots since the PGDATA path where the database files are stored no longer maps to a volume mounted from the host machine.

Further Reading

thehale
  • 913
  • 10
  • 18
  • This is super helpful. As a note, one pattern I use to order files like this is 001, 100, 999, etc. It protects against alphabetic sorting going awry. 100 is sorted before 2, for instance. Also, `PGDATA` was not being respected for me, so I had to use the default `/var/lib/postgresql/data/` – cole Dec 30 '22 at 17:02