72

I have a Docker container that contains my Postgres database. It's using the official Postgres image, which has a CMD entry that starts the server as the container's main process.

I want to populate the database by running RUN psql -U postgres postgres < /dump/dump.sql before it starts listening to queries.

I don't understand how this is possible with Docker. If I place the RUN command after CMD, it will of course never be reached because Docker has finished reading the Dockerfile. But if I place it before the CMD, it will run before the Postgres server even exists as a process.

How can I prepopulate a Postgres database in Docker?

Michal
Migwell
  • you can start your docker Postgresql container, and then `docker exec -it container psql -U postgres postgres < /dump/dump.sql` even if this is not a perfect solution, and using supervisor seems overkill (http://docs.docker.com/articles/using_supervisord/) – user2915097 Apr 13 '15 at 09:18
  • The database has to listen to connections for it to be able to take the data in. You could start it with configuration that only allows access on a certain IP, UNIX socket, another port, etc. depending on your configuration. Then load your data and then change the configuration to normal. – Sami Kuhmonen Apr 13 '15 at 11:46
  • the postgres image has a facility for adding scripts that get executed on startup in the /docker-entrypoint-initdb.d directory. You would execute postgres with the --single argument to do it 'privately' before the service is officially started. There is a good example of extending the image using a start script here: https://registry.hub.docker.com/u/sameersbn/postgresql/ look at the start script, it does --single – Greg Apr 13 '15 at 14:14
  • Which start script do you mean? There are a few on the page. If there's a proper way to do this and you can show me the right section, post it as an answer and I'll mark it as correct! – Migwell Apr 13 '15 at 14:43
  • you can put any script in /docker-entrypoint-initdb.d directory. you invent the name, and the contents, of the script. your script would load the data. – Greg Apr 13 '15 at 16:49
  • `postgres --single` will fail in general with a dump file. That's because it parses the file as each line being a separate SQL statement, or the entire file being a single statement, and none of them works for a real dump. See [postgres manpage](http://www.postgresql.org/docs/current/static/app-postgres.html) – Daniel Vérité Apr 13 '15 at 17:06
  • The right answer seems to be under 'How to extend this image' in the official Postgres image (https://registry.hub.docker.com/_/postgres/), not the one you linked. – Migwell Apr 14 '15 at 04:40

9 Answers

60

After a lot of fighting, I have found a solution ;-)

A comment posted here by "justfalter" was very useful to me: https://registry.hub.docker.com/_/postgres/

Anyway, I have done it this way:

# Dockerfile
FROM postgres:9.4

RUN mkdir -p /tmp/psql_data/

COPY db/structure.sql /tmp/psql_data/
COPY scripts/init_docker_postgres.sh /docker-entrypoint-initdb.d/

db/structure.sql is an SQL dump, useful to initialize the first tablespace.
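For illustration, a minimal structure.sql might look like this (the table here is made up; the sed cleanup in the script below exists to flatten exactly this kind of file into the one-line form that postgres --single expects):

CREATE TABLE users (
    id serial PRIMARY KEY,
    email text NOT NULL
);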

Then, the init_docker_postgres.sh

#!/bin/bash

# this script runs the first time the container starts (scripts placed in
# /docker-entrypoint-initdb.d are executed during first-run initialization);
# it imports the base database structure and creates the database for the tests

DATABASE_NAME="db_name"
DB_DUMP_LOCATION="/tmp/psql_data/structure.sql"

echo "*** CREATING DATABASE ***"

# create default database
gosu postgres postgres --single <<EOSQL
    CREATE DATABASE "$DATABASE_NAME";
    GRANT ALL PRIVILEGES ON DATABASE "$DATABASE_NAME" TO postgres;
EOSQL

# clean sql_dump - because I want to have a one-line command

# remove indentation
sed "s/^[ \t]*//" -i "$DB_DUMP_LOCATION"

# remove comments
sed '/^--/ d' -i "$DB_DUMP_LOCATION"

# remove new lines
sed ':a;N;$!ba;s/\n/ /g' -i "$DB_DUMP_LOCATION"

# remove other spaces
sed 's/  */ /g' -i "$DB_DUMP_LOCATION"

# remove leading spaces on the first line
sed 's/^ *//' -i "$DB_DUMP_LOCATION"

# append new line at the end (suggested by @Nicola Ferraro)
sed -e '$a\' -i "$DB_DUMP_LOCATION"

# import sql_dump
gosu postgres postgres --single "$DATABASE_NAME" < "$DB_DUMP_LOCATION";


echo "*** DATABASE CREATED! ***"

So finally:

# no postgres is running
[myserver]# psql -h 127.0.0.1 -U postgres
psql: could not connect to server: Connection refused
    Is the server running on host "127.0.0.1" and accepting
    TCP/IP connections on port 5432?

[myserver]# docker build -t custom_psql .
[myserver]# docker run -d --name custom_psql_running -p 5432:5432 custom_psql

[myserver]# docker ps -a
CONTAINER ID        IMAGE                COMMAND                CREATED             STATUS              PORTS                    NAMES
ce4212697372        custom_psql:latest   "/docker-entrypoint.   9 minutes ago       Up 9 minutes        0.0.0.0:5432->5432/tcp   custom_psql_running

[myserver]# psql -h 127.0.0.1 -U postgres
psql (9.2.10, server 9.4.1)
WARNING: psql version 9.2, server version 9.4.
            Some psql features might not work.
Type "help" for help.

postgres=# 

# postgres is now initialized with the dump

Hope it helps!

Ronan Boiteau
damoiser
  • I would also append a new line at the end of the .sql file, as the last statement is not executed if there is not a trailing blank line (`sed -e '$a\' -i "$DB_DUMP_LOCATION"`). – Nicola Ferraro Jun 27 '15 at 08:31
  • thanks @NicolaFerraro, I didn't know about that; I've added your suggestion to the answer – damoiser Sep 30 '15 at 11:56
  • that didn't work for me. the container exits and database is not reachable – Hanan Shteingart Feb 01 '17 at 10:53
  • it is hard to help you @HananShteingart when you don't provide any extra info about the problem. Try to log the various steps to debug where it crashes, and check your sql dump. It works for me and other people, so something else must be the issue for you. – damoiser Feb 01 '17 at 11:06
  • Well, to start, copying the entire data set in docker build time is less convenient as you cannot use that image when you have a fresh data. Moreover, "Sending build context to Docker daemon ..." takes ages ( i have 30GB dataset) – Hanan Shteingart Feb 01 '17 at 11:44
  • So, instead, I run the container with a shared folder (-v %CD%:/tmp/psql_data). But, when I run the container it exits immediately with Exited (1) – Hanan Shteingart Feb 01 '17 at 12:00
  • ehm... but @HananShteingart your approach is not what I suggested - you are using a shared folder... anyway, attach interactively (`-it ... bash`) to your container and execute the procedure step by step to see where it fails. After that you will know where the issue is and can fix it – damoiser Feb 01 '17 at 12:59
  • When you mention _`db/structure.sql` is a sql dump, useful to initialize the first tablespace._ I think you can skip this by passing environment variables like `POSTGRES_USER` or `POSTGRES_DB` – diegoaguilar Mar 15 '19 at 21:03
  • @diegoaguilar this depends on your use case: if you have another running DB where you can get the base schema "live", yes. If you try to start your container from a "static" dump (i.e. backup, test-base schema, ..) then `POSTGRES_USER` and `POSTGRES_DB` do not help – damoiser Mar 18 '19 at 14:11
47

For those who want to initialize a PostgreSQL DB with millions of records during the first run.

Import using *.sql dump

You can make a simple SQL dump and copy the dump.sql file into /docker-entrypoint-initdb.d/. The problem is speed. My dump.sql script is about 17MB (small DB - 10 tables, with 100k rows in only one of them) and the initialization takes over a minute (!). That is unacceptable for local development / unit tests, etc.
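For reference, a minimal sketch of this variant (the database name and file names are illustrative; the entrypoint runs plain *.sql files against $POSTGRES_DB, so it should match the dumped database):

pg_dump -U postgres my-db > dump.sql

# Dockerfile
FROM postgres:11
ENV POSTGRES_DB my-db
COPY dump.sql /docker-entrypoint-initdb.d/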

Import using binary dump

The solution is to make a binary PostgreSQL dump and use shell scripts initialization support. Then the same DB is initialized in about 500ms instead of 1 minute.

1. Create the dump.pgdata binary dump of a DB named "my-db", directly from within a container or from your local DB:

pg_dump -U postgres --format custom my-db > "dump.pgdata"

Or from the host, from a running container (postgres-container):

docker exec postgres-container pg_dump -U postgres --format custom my-db > "dump.pgdata"

2. Create a Docker image with a given dump and initialization script

$ tree
.
├── Dockerfile
└── docker-entrypoint-initdb.d
    ├── 01-restore.sh
    ├── 02-small-updates.sql
    └── dump.pgdata
$ cat Dockerfile
FROM postgres:11

COPY ./docker-entrypoint-initdb.d/ /docker-entrypoint-initdb.d/
$ cat docker-entrypoint-initdb.d/01-restore.sh
#!/bin/bash

file="/docker-entrypoint-initdb.d/dump.pgdata"
dbname=my-db

echo "Restoring DB using $file"
pg_restore -U postgres --dbname=$dbname --verbose --single-transaction < "$file" || exit 1
$ cat docker-entrypoint-initdb.d/02-small-updates.sql
-- some updates on your DB, for example for next application version
-- this file will be executed on DB during next release
UPDATE ... ;

3. Build an image and run it

$ docker build -t db-test-img .
$ docker run -it --rm --name db-test db-test-img
Petr Újezdský
  • This is simple and clean. Thank you! – Glen Thompson Mar 28 '19 at 00:26
  • .sql files inside docker-entrypoint-initdb.d will run on their own (at least once). Note from the docker postgres image: "Warning: scripts in /docker-entrypoint-initdb.d are only run if you start the container with a data directory that is empty; any pre-existing database will be left untouched on container startup. One common problem is that if one of your /docker-entrypoint-initdb.d scripts fails (which will cause the entrypoint script to exit) and your orchestrator restarts the container with the already initialized data directory, it will not continue on with your scripts." – Jordan Sep 04 '19 at 20:17
  • What's the goal of having `|| exit 1` in `docker-entrypoint-initdb.d/01-restore.sh`? I have a pretty good understanding of bash's `||`, but when I try this answer out in my environment, the Postgres container exits prematurely, on account of the `exit 1`. – wgj Jul 19 '21 at 22:12
  • @wgj the command `pg_restore` did not finish with a `0` exit code - it failed. I guess there could be something like `|| exit $?` to forward the exit code to the container. – Petr Újezdský Jul 20 '21 at 08:43
  • @PetrÚjezdský Sure, but having an non-zero return code is not the same as explicitly calling `exit 1`; When using `|| exit 1`, the container terminates abruptly, before Postgres starts accepting connections. I'm asking what the intended goal is because I'm unclear why you'd want `|| exit 1`. From my perspective it acts like a bug. Thank you. :) – wgj Jul 20 '21 at 15:48
29

Alternatively, you can just mount a volume to /docker-entrypoint-initdb.d/ that contains all your DDL scripts. You can put in *.sh, *.sql, or *.sql.gz files and it will take care of executing those on start-up.

e.g. (assuming you have your scripts in /tmp/my_scripts)

docker run -v /tmp/my_scripts:/docker-entrypoint-initdb.d postgres
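The docker-compose equivalent would be along these lines (a sketch; service name, password, and host path are illustrative):

# docker-compose.yml
version: "3"
services:
  db:
    image: postgres
    environment:
      POSTGRES_PASSWORD: postgres
    volumes:
      - /tmp/my_scripts:/docker-entrypoint-initdb.d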
darthbinamira
  • Thanks! I would say it really depends on your use case. But for production environments you're better off mounting from a different location, say NFS or the host maybe? Otherwise all your data will get wiped once you destroy the container. – darthbinamira Jun 14 '17 at 11:27
  • Since I wrote my newbie comment I learnt/realized that the "entrypoint" directory gets read by Postgres only when Postgres is being "installed", which (of course) happens only once for each container. Starting or stopping the container does not make its Postgres scan the directory again. (And I am going to delete that misleading comment now. :) – Jaroslav Záruba Jun 14 '17 at 19:07
2

There is yet another option available that utilises Flocker:

Flocker is a container data volume manager that is designed to allow databases like PostgreSQL to easily run in containers in production. When running a database in production, you have to think about things like recovering from host failure. Flocker provides tools for managing data volumes across a cluster of machines like you have in a production environment. For example, as a Postgres container is scheduled between hosts in response to server failure, Flocker can automatically move its associated data volume between hosts at the same time. This means that when your Postgres container starts up on a new host, it has its data. This operation can be accomplished manually using the Flocker API or CLI, or automatically by a container orchestration tool that Flocker integrates with, for example Docker Swarm, Kubernetes or Mesos.

Max Desiatov
2

I followed the same solution as @damoiser. The only difference was that I wanted to import all the dump data.

Please follow the solution below. (I have not done any kind of checks)

Dockerfile

FROM postgres:9.5

RUN mkdir -p /tmp/psql_data/

COPY db/structure.sql /tmp/psql_data/
COPY scripts/init_docker_postgres.sh /docker-entrypoint-initdb.d/

then the init_docker_postgres.sh script

#!/bin/bash

DB_DUMP_LOCATION="/tmp/psql_data/structure.sql"

echo "*** CREATING DATABASE ***"

psql -U postgres < "$DB_DUMP_LOCATION";

echo "*** DATABASE CREATED! ***"

and then you can build your image as

docker build -t abhije***/postgres-data .

docker run -d abhije***/postgres-data 
Sreekant Shenoy
Abhijeet Kamble
1

My solution is inspired by Alex Dguez's answer which unfortunately doesn't work for me because:

  1. I used the pg-9.6 base image, and `RUN /docker-entrypoint.sh --help` never ran through for me; it always complained with `The command '/bin/sh -c /docker-entrypoint.sh -' returned a non-zero code: 1`
  2. I don't want to pollute the /docker-entrypoint-initdb.d dir

The following answer is originally from my reply in another post: https://stackoverflow.com/a/59303962/4440427. Note that this solution restores from a binary dump instead of from the plain SQL dump the OP asked about, but it can be modified slightly to adapt to the plain SQL case.

Dockerfile:

FROM postgres:9.6.16-alpine

LABEL maintainer="lu@cobrainer.com"
LABEL org="Cobrainer GmbH"

ARG PG_POSTGRES_PWD=postgres
ARG DBUSER=someuser
ARG DBUSER_PWD=P@ssw0rd
ARG DBNAME=sampledb
ARG DB_DUMP_FILE=example.pg

ENV POSTGRES_DB launchpad
ENV POSTGRES_USER postgres
ENV POSTGRES_PASSWORD ${PG_POSTGRES_PWD}
ENV PGDATA /pgdata

COPY wait-for-pg-isready.sh /tmp/wait-for-pg-isready.sh
COPY ${DB_DUMP_FILE} /tmp/pgdump.pg

RUN set -e && \
    nohup bash -c "docker-entrypoint.sh postgres &" && \
    /tmp/wait-for-pg-isready.sh && \
    psql -U postgres -c "CREATE USER ${DBUSER} WITH SUPERUSER CREATEDB CREATEROLE ENCRYPTED PASSWORD '${DBUSER_PWD}';" && \
    psql -U ${DBUSER} -d ${POSTGRES_DB} -c "CREATE DATABASE ${DBNAME} TEMPLATE template0;" && \
    pg_restore -v --no-owner --role=${DBUSER} --exit-on-error -U ${DBUSER} -d ${DBNAME} /tmp/pgdump.pg && \
    psql -U postgres -c "ALTER USER ${DBUSER} WITH NOSUPERUSER;" && \
    rm -rf /tmp/pgdump.pg

HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
  CMD pg_isready -U postgres -d launchpad

where the wait-for-pg-isready.sh is:

#!/bin/bash
set -e

# parse the output of ifconfig to find the machine's first non-loopback IP address
get_non_lo_ip() {
  local _ip _non_lo_ip _line _nl=$'\n'
  while IFS=$': \t' read -a _line ;do
    [ -z "${_line%inet}" ] &&
        _ip=${_line[${#_line[1]}>4?1:2]} &&
        [ "${_ip#127.0.0.1}" ] && _non_lo_ip=$_ip
    done< <(LANG=C /sbin/ifconfig)
  printf ${1+-v} $1 "%s${_nl:0:$[${#1}>0?0:1]}" $_non_lo_ip
}

get_non_lo_ip NON_LO_IP
until pg_isready -h $NON_LO_IP -U "postgres" -d "launchpad"; do
  >&2 echo "Postgres is not ready - sleeping..."
  sleep 4
done

>&2 echo "Postgres is up - you can execute commands now"

The above scripts together with a more detailed README are available at https://github.com/cobrainer/pg-docker-with-restored-db
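For the plain SQL case, a sketch of the adaptation (untested): COPY a .sql file instead of the binary dump and swap the pg_restore line in the RUN chain for psql, e.g.

psql -v ON_ERROR_STOP=1 -U ${DBUSER} -d ${DBNAME} -f /tmp/dump.sql && \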

Lu Liu
0

I was able to load the data in by prefixing the RUN command in the Dockerfile with /etc/init.d/postgresql start. My Dockerfile has the following line, which is working for me:

RUN /etc/init.d/postgresql start && /usr/bin/psql -a < /tmp/dump.sql
acknapp
  • Tried with docker image postgres:9.6; I receive the following error: `No PostgreSQL clusters exist; see "man pg_createcluster" ... (warning).` – gelonida Oct 27 '21 at 15:43
0

For E2E tests, in which we need a database with structure and data already saved in the Docker image, we have done the following:

Dockerfile:

FROM postgres:9.4.24-alpine
ENV POSTGRES_USER postgres
ENV POSTGRES_PASSWORD postgres
ENV PGDATA /pgdata
COPY database.backup /tmp/
COPY database_restore.sh /docker-entrypoint-initdb.d/
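# running the entrypoint at build time initializes PGDATA and executes the
# initdb.d scripts; the --help argument then makes postgres print its help
# and exit instead of serving, so the restored data is baked into the image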
RUN /docker-entrypoint.sh --help
RUN rm -rf /docker-entrypoint-initdb.d/database_restore.sh
RUN rm -rf /tmp/database.backup

database_restore.sh:

#!/bin/sh
set -e 
pg_restore -C -d postgres /tmp/database.backup

To create the image:

docker build .

To start the container:

docker run --name docker-postgres -d -p 5432:5432 <Id-docker-image>

This does not restore the database every time the container is booted. The structure and data of the database are already contained in the created Docker image.
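To double-check, you can stop and start the container and query it again; a sketch, assuming the credentials from the Dockerfile above:

docker stop docker-postgres && docker start docker-postgres
psql -h 127.0.0.1 -U postgres -c '\dt'

The restored tables should still be listed, with no restore running at boot (database_restore.sh was removed from the image).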

We based this on the following article, but eliminated the multistage build: Creating Fast, Lightweight Testing Databases in Docker

Edit: This no longer works with version 9.4-alpine, because it does not run the database_restore.sh script. Use version 9.4.24-alpine.

0

My goal was to have an image that contains the database - i.e. saving the time to rebuild it every time I do docker run or docker-compose up.

We would just have to get the line exec "$@" out of docker-entrypoint.sh, so that the database server is not started at the end of initialization. So I added this to my Dockerfile:

# Copy my SQL scripts into the image to /docker-entrypoint-initdb.d:
COPY ./init_db /docker-entrypoint-initdb.d

#init db
RUN grep -v 'exec "$@"' /usr/local/bin/docker-entrypoint.sh > /tmp/docker-entrypoint-without-serverstart.sh && \
    chmod a+x /tmp/docker-entrypoint-without-serverstart.sh && \
    /tmp/docker-entrypoint-without-serverstart.sh postgres && \
    rm -rf /docker-entrypoint-initdb.d/* /tmp/docker-entrypoint-without-serverstart.sh
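Note that the official postgres image declares VOLUME /var/lib/postgresql/data, and build-time changes under a declared volume are discarded, so for the initialized data to actually survive in the image the data directory presumably has to point elsewhere, as other answers here do:

ENV PGDATA /pgdata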
fjf2002