I am in the process of experimenting/tinkering/learning/breaking with Docker. I am currently writing Docker code to create a snapshotted testing environment for my application.
By snapshotted I mean that my database is deliberately reset on every restart, so that I can work with the data as it was at a certain point in time. What is peculiar in my case is that I want to populate a PostgreSQL database at build time, not at start time. The PostgreSQL image can populate the database from SQL scripts at container start, but that takes hours.
My application consists of a Tomcat 8.5 server running my WAR and a PostgreSQL database, which is the focus of my question. The full code is in a Gist that I am putting together as I write.
The code I have done
I have followed a tutorial on how to build a Docker image of Postgres with a full database, rather than having Postgres populate itself on boot. This is because I have a database with a million records and only a .sql.gz dump that the sysop gave me.
So the relevant parts of the Dockerfile are
WORKDIR /opt/setup/
COPY db-setup.sh /opt/setup/
COPY db-pack.sh /opt/setup/
COPY db-run.sh /opt/setup/
RUN ./db-setup.sh
RUN ./db-pack.sh
#VOLUME $PGDATA (Note it is commented out, now)
EXPOSE 5432
db-setup.sh is run at image build and picks up files from data-scripts.d. Of course I am not allowed to share the contents of the dump, but it's a plain .sql.gz with plenty of OIDs that take a huge amount of time to restore. The db-setup.sh shown in the Gist is derived from both the tutorial and the original Postgres image, so that it correctly handles compression (the tutorial only uses plain SQL).
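For readers without the Gist at hand, the compression handling can be sketched roughly as follows. This is a minimal sketch, not the actual db-setup.sh: the function name run_init_file is hypothetical, and the SQL client command is passed in as arguments instead of being a fixed psql invocation. Unlike my script, it does not send the client's output to /dev/null, so restore errors stay visible.

```shell
# Hypothetical helper (not the real db-setup.sh): feed one init file to a SQL client.
# $1 is the file to load; the remaining arguments are the client command to run.
run_init_file() {
    init_file="$1"
    shift
    case "$init_file" in
        # Compressed dumps are decompressed on the fly and piped to the client.
        *.sql.gz) echo "running $init_file"; gunzip -c "$init_file" | "$@" ;;
        # Plain SQL files are fed to the client on stdin.
        *.sql)    echo "running $init_file"; "$@" < "$init_file" ;;
        # Anything else is skipped.
        *)        echo "ignoring $init_file" ;;
    esac
}

# Example call (path and psql options are assumptions for illustration):
# run_init_file /opt/setup/data-scripts.d/dump.sql.gz psql -v ON_ERROR_STOP=1 --username postgres
```

Passing the client as arguments keeps the sketch easy to try with a harmless command such as cat before pointing it at a real server.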
Build succeeds, startup fails
When I build the image, it takes a considerable amount of time to load the data, which is what I want:
2019-08-07 07:57:04.149 UTC [49] LOG: database system was shut down at 2019-08-07 07:57:03 UTC
2019-08-07 07:57:04.231 UTC [48] LOG: database system is ready to accept connections
done
server started
./db-setup.sh: running methodinv_pcp3.sql.gz
2019-08-07 08:49:52.052 UTC [117] ERROR: canceling autovacuum task
2019-08-07 08:49:52.052 UTC [117] CONTEXT: automatic analyze of table "postgres.public.ftt_interactive_data_492"
2019-08-07 08:49:59.086 UTC [118] ERROR: canceling autovacuum task
2019-08-07 08:49:59.086 UTC [118] CONTEXT: automatic analyze of table "postgres.public.ftt_oper_492"
2019-08-07 08:50:34.086 UTC [118] ERROR: canceling autovacuum task
2019-08-07 08:50:34.086 UTC [118] CONTEXT: automatic analyze of table "postgres.public.ftt_validation_492"
2019-08-07 08:51:11.889 UTC [119] ERROR: canceling autovacuum task
2019-08-07 08:51:11.889 UTC [119] CONTEXT: automatic analyze of table "postgres.public.ftt_oper_492"
2019-08-07 08:54:21.131 UTC [123] ERROR: canceling autovacuum task
2019-08-07 08:54:21.131 UTC [123] CONTEXT: automatic analyze of table "postgres.public.ftt_oper_492"
waiting for server to shut down...2019-08-07 08:54:28.652 UTC [48] LOG: received fast shutdown request
.2019-08-07 08:54:28.797 UTC [48] LOG: aborting any active transactions
2019-08-07 08:54:28.799 UTC [48] LOG: worker process: logical replication launcher (PID 55) exited with exit code 1
2019-08-07 08:54:28.800 UTC [50] LOG: shutting down
..2019-08-07 08:54:31.407 UTC [48] LOG: database system is shut down
done
When I run the image with docker run, startup fails because it can't find the Postgres configuration:
D:\IdeaProjects\pcp\ftt-containers\ftt-db-method>docker run -p 5432:5432 -l ftt-db-method ftt-db-method:latest
Restoring /var/lib/postgresql/data ...
Done.
Launching command: postgres ...
postgres: could not access the server configuration file "/var/lib/postgresql/data/postgresql.conf": No such file or directory
Originally, my Dockerfile declared a VOLUME, which is now commented out. The above output occurs both when I declare the volume (which is not exactly what I want; I am new to Docker and copy-pasted at the first chance) and when I comment it out.
Question
What is wrong with the Docker image of Postgres, fully loaded with s**tloads of data, that I am experimenting with? How can I effectively start Postgres with an already full database that will not (necessarily) survive container restarts?
Edit 1
By bash-ing into the container I have found that the data dump created at build time is 10K, so basically empty.
This doesn't solve my problem yet, but it explains why Postgres is unable to find its beloved data dir.
Edit 2
I was able to bash into a temporary container taken from the point between the database restore and the packing of the data directory.
Basically the Dockerfile does
RUN ./db-setup.sh
which runs the restore of the SQL dump:
echo "$0: running $f"; gunzip -c "$f" | "${psql[@]}" > /dev/null 2>&1 ; echo ;;
The result is saved to an intermediate container. Next, the Dockerfile does
RUN ./db-pack.sh
which tars /var/lib/postgresql/data into /zdata. I get:
2019-08-07 16:43:51.532 UTC [42] LOG: received fast shutdown request
waiting for server to shut down....2019-08-07 16:43:51.676 UTC [42] LOG: aborting any active transactions
2019-08-07 16:43:51.679 UTC [42] LOG: worker process: logical replication launcher (PID 49) exited with exit code 1
2019-08-07 16:43:51.681 UTC [44] LOG: shutting down
...2019-08-07 16:43:54.952 UTC [42] LOG: database system is shut down
done
server stopped
Removing intermediate container 8dbe2a4e776a
---> 263896b905ce
Step 15/19 : RUN ./db-pack.sh
---> Running in 56132ecb90cc
Packing data folder: /var/lib/postgresql/data
Pack & clean finished successfully.
Removing intermediate container 56132ecb90cc
---> 1a7f8d68e8df
Step 16/19 : VOLUME $PGDATA
---> Running in 10d222beed81
Removing intermediate container 10d222beed81
---> e1a9355882d1
So I tagged 263896b905ce (YHMV if you replicate on your PC) as a new image, then executed bash on it. The data dir was empty, so the script would have packed nothing:
docker tag 263896b905ce examine
docker run -it --entrypoint /bin/bash examine
root@ab963ace16a1:/opt/setup# ls
data-scripts.d db-pack.sh db-run.sh db-setup.sh
root@ab963ace16a1:/opt/setup# cd /zdata/
root@ab963ace16a1:/zdata# ls
root@ab963ace16a1:/zdata# cd /var/lib/postgresql/
root@ab963ace16a1:/var/lib/postgresql# ls
data
root@ab963ace16a1:/var/lib/postgresql# cd data/
root@ab963ace16a1:/var/lib/postgresql/data# ls
root@ab963ace16a1:/var/lib/postgresql/data# ls -lah
total 8.0K
drwxrwxrwx 2 postgres postgres 4.0K Jul 17 23:55 .
drwxr-xr-x 1 postgres postgres 4.0K Jul 17 23:55 ..
root@ab963ace16a1:/var/lib/postgresql/data#
root@ab963ace16a1:/var/lib/postgresql/data# ls^C
root@ab963ace16a1:/var/lib/postgresql/data# exit
exit
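Since the pack step reported success while the data directory was empty, a guard along these lines might have made the build fail loudly instead. This is only a sketch under assumptions: the function names pgdata_is_initialized and pack_guard are hypothetical, and the check relies on initdb writing a non-empty PG_VERSION file at the top of an initialized data directory. It does not explain why the directory is empty; it only detects it.

```shell
# Succeeds only if the directory looks like an initialized cluster.
# Assumption: initdb leaves a non-empty PG_VERSION file at the top of the data directory.
pgdata_is_initialized() {
    [ -s "$1/PG_VERSION" ]
}

# Hypothetical wrapper: refuse to pack a directory that was never initialized.
pack_guard() {
    if pgdata_is_initialized "$1"; then
        echo "Packing data folder: $1"
    else
        echo "Refusing to pack $1: no PG_VERSION found" >&2
        return 1
    fi
}
```

In db-pack.sh this could be called as pack_guard /var/lib/postgresql/data || exit 1 right before the tar command, so that the RUN step aborts the build when there is nothing to pack.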