
I am in the process of experimenting, tinkering, learning, and breaking things with Docker. I am currently writing a Docker setup to create a snapshotted testing environment for my application.

By snapshotted I mean that my database is reset on purpose on every restart, so that I can always work with the same old data from a certain point in time. What is peculiar in my case is that I want to populate a PostgreSQL database at build time, not at start time. The official PostgreSQL image can populate the database from SQL scripts at container start, but in my case that takes hours.

My application consists of a Tomcat 8.5 server running my WAR and a PostgreSQL database, which is the focus of my question. I am creating a Gist with the full code as I write.

The code I have so far

Full code on Gist

I have followed a tutorial on how to build a Docker image of Postgres with a full database baked in, rather than having Postgres populate itself on boot. This is because I have a database with millions of records and only the .sql.gz dump that the sysop gave me.

So the relevant parts of the Dockerfile are

WORKDIR /opt/setup/
COPY db-setup.sh /opt/setup/
COPY db-pack.sh /opt/setup/
COPY db-run.sh /opt/setup/

RUN ./db-setup.sh
RUN ./db-pack.sh

#VOLUME $PGDATA (Note it is commented out, now)

EXPOSE 5432

The db-setup.sh script runs at image build time and picks up files from data-scripts.d. Of course I am not allowed to share the contents of the dump, but it's a plain .sql.gz with plenty of OIDs that take a huge amount of time to restore. The db-setup.sh shown in the Gist is derived from both the tutorial and the original Postgres image, so that it correctly handles compression (the tutorial only handles plain SQL).
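The per-file dispatch inside such a setup script is typically modelled on the official postgres entrypoint. A minimal sketch (the real db-setup.sh is in the Gist; `run_psql` here stands in for the actual psql invocation and is an assumption):

```shell
# Sketch of the restore loop: dispatch each init file by extension,
# handling gzip-compressed dumps, as the official entrypoint does.
process_init_file() {
    local f="$1"
    case "$f" in
        *.sh)     echo "running $f"; . "$f" ;;                      # source shell scripts
        *.sql)    echo "running $f"; run_psql < "$f" ;;             # feed plain SQL to psql
        *.sql.gz) echo "running $f"; gunzip -c "$f" | run_psql ;;   # decompress, then pipe to psql
        *)        echo "ignoring $f" ;;
    esac
}
```

The `*.sql.gz` branch is the part the tutorial was missing; streaming through `gunzip -c` avoids unpacking the dump to disk first.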

Build succeeds, startup fails

When I build the image, it takes a considerable amount of time to load the data, which is expected and what I want:

2019-08-07 07:57:04.149 UTC [49] LOG:  database system was shut down at 2019-08-07 07:57:03 UTC
2019-08-07 07:57:04.231 UTC [48] LOG:  database system is ready to accept connections
 done
server started

./db-setup.sh: running methodinv_pcp3.sql.gz
2019-08-07 08:49:52.052 UTC [117] ERROR:  canceling autovacuum task
2019-08-07 08:49:52.052 UTC [117] CONTEXT:  automatic analyze of table "postgres.public.ftt_interactive_data_492"
2019-08-07 08:49:59.086 UTC [118] ERROR:  canceling autovacuum task
2019-08-07 08:49:59.086 UTC [118] CONTEXT:  automatic analyze of table "postgres.public.ftt_oper_492"
2019-08-07 08:50:34.086 UTC [118] ERROR:  canceling autovacuum task
2019-08-07 08:50:34.086 UTC [118] CONTEXT:  automatic analyze of table "postgres.public.ftt_validation_492"
2019-08-07 08:51:11.889 UTC [119] ERROR:  canceling autovacuum task
2019-08-07 08:51:11.889 UTC [119] CONTEXT:  automatic analyze of table "postgres.public.ftt_oper_492"
2019-08-07 08:54:21.131 UTC [123] ERROR:  canceling autovacuum task
2019-08-07 08:54:21.131 UTC [123] CONTEXT:  automatic analyze of table "postgres.public.ftt_oper_492"


waiting for server to shut down...2019-08-07 08:54:28.652 UTC [48] LOG:  received fast shutdown request
.2019-08-07 08:54:28.797 UTC [48] LOG:  aborting any active transactions
2019-08-07 08:54:28.799 UTC [48] LOG:  worker process: logical replication launcher (PID 55) exited with exit code 1
2019-08-07 08:54:28.800 UTC [50] LOG:  shutting down
..2019-08-07 08:54:31.407 UTC [48] LOG:  database system is shut down
 done

When I run the image with docker run, startup fails because Postgres can't find its configuration:

D:\IdeaProjects\pcp\ftt-containers\ftt-db-method>docker run -p 5432:5432 -l ftt-db-method ftt-db-method:latest
Restoring /var/lib/postgresql/data ...
Done.
Launching command: postgres ...
postgres: could not access the server configuration file "/var/lib/postgresql/data/postgresql.conf": No such file or directory
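For context, the restore step that db-run.sh performs at startup presumably looks roughly like this (a hypothetical reconstruction; the real script is in the Gist, and the paths and archive name are assumptions):

```shell
# Hypothetical sketch of db-run.sh's restore step: on first start the
# data directory is empty, so unpack the archive produced at build time
# before launching postgres.
restore_data() {
    local pgdata="$1" archive="$2"
    if [ ! -s "$pgdata/PG_VERSION" ]; then   # empty data dir: first start
        echo "Restoring $pgdata ..."
        mkdir -p "$pgdata"
        tar -C "$pgdata" -xzf "$archive"
        echo "Done."
    fi
}
# db-run.sh would then `exec postgres` so it runs as PID 1.
```

The "Restoring ... Done." lines in the failing output above match this flow; the problem is that the archive evidently contained no data, so postgresql.conf was never restored.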

Originally, my Dockerfile declared a VOLUME, which is now commented out. The output above occurs both when I declare the volume (which is not exactly what I want; I am new to Docker and copy&pasted it at the first chance) and when I comment it out.

Question

What is wrong with my Docker image of Postgres fully loaded with huge amounts of data? How can I effectively start Postgres with an already-populated database that will not (necessarily) survive container restarts?


Edit 1

By bash-ing into the container I have found that the data archive created at build time is only 10K, so basically empty.

This doesn't solve my problem yet, but it explains why Postgres is unable to find its data directory.


Edit 2

I was able to bash into an intermediate container, specifically between the moment the database is restored and the moment the data directory is packed.

Basically the Dockerfile does

RUN ./db-setup.sh

Which executes the restore of the SQL dump:

echo "$0: running $f"; gunzip -c "$f" | "${psql[@]}" > /dev/null 2>&1 ; echo ;;

The result is saved into the intermediate container's layer. Next, the Dockerfile does

RUN ./db-pack.sh

Which tars /var/lib/postgresql/data into /zdata. The build output is:

2019-08-07 16:43:51.532 UTC [42] LOG:  received fast shutdown request
waiting for server to shut down....2019-08-07 16:43:51.676 UTC [42] LOG:  aborting any active transactions
2019-08-07 16:43:51.679 UTC [42] LOG:  worker process: logical replication launcher (PID 49) exited with exit code 1
2019-08-07 16:43:51.681 UTC [44] LOG:  shutting down
...2019-08-07 16:43:54.952 UTC [42] LOG:  database system is shut down
 done
server stopped
Removing intermediate container 8dbe2a4e776a
 ---> 263896b905ce
Step 15/19 : RUN ./db-pack.sh
 ---> Running in 56132ecb90cc
Packing data folder:  /var/lib/postgresql/data
Pack & clean finished successfully.
Removing intermediate container 56132ecb90cc
 ---> 1a7f8d68e8df
Step 16/19 : VOLUME $PGDATA
 ---> Running in 10d222beed81
Removing intermediate container 10d222beed81
 ---> e1a9355882d1
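The pack step shown above could look roughly like this (a sketch under assumptions; the real db-pack.sh is in the Gist, and I have parameterized the paths for illustration):

```shell
# Sketch of the pack step (db-pack.sh): archive the initialized data
# directory into /zdata so the start script can restore it later, then
# clean the directory so only the archive remains in the image.
pack_data() {
    local pgdata="$1" zdata="$2"
    echo "Packing data folder:  $pgdata"
    mkdir -p "$zdata"
    tar -C "$pgdata" -czf "$zdata/data.tar.gz" .
    rm -rf "${pgdata:?}"/*      # :? guards against an empty variable
    echo "Pack & clean finished successfully."
}
```

Note that tar happily archives an empty directory, so the "finished successfully" message in the build output proves nothing about the archive's contents.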

So I tagged 263896b905ce (the hash will vary if you replicate this on your PC) as a new image, then executed bash in it. The data directory was empty, so the pack script had nothing to pack:

docker tag 263896b905ce examine
docker run -it --entrypoint /bin/bash examine


root@ab963ace16a1:/opt/setup# ls
data-scripts.d  db-pack.sh  db-run.sh  db-setup.sh
root@ab963ace16a1:/opt/setup# cd /zdata/
root@ab963ace16a1:/zdata# ls
root@ab963ace16a1:/zdata# cd /var/lib/postgresql/
root@ab963ace16a1:/var/lib/postgresql# ls
data
root@ab963ace16a1:/var/lib/postgresql# cd data/
root@ab963ace16a1:/var/lib/postgresql/data# ls
root@ab963ace16a1:/var/lib/postgresql/data# ls -lah
total 8.0K
drwxrwxrwx 2 postgres postgres 4.0K Jul 17 23:55 .
drwxr-xr-x 1 postgres postgres 4.0K Jul 17 23:55 ..
root@ab963ace16a1:/var/lib/postgresql/data#
root@ab963ace16a1:/var/lib/postgresql/data# ls^C
root@ab963ace16a1:/var/lib/postgresql/data# exit
exit

1 Answer


Fixed

According to https://stackoverflow.com/a/52762779/471213

"why doesn't VOLUME work?" When you define a VOLUME in the Dockerfile, you can only define the target, not the source of the volume. During the build, you will only get an anonymous volume from this. That anonymous volume will be mounted at every RUN command, prepopulated with the contents of the image, and then discarded at the end of the RUN command. Only changes to the container are saved, not changes to the volume.
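The behavior can be reproduced with a minimal, hypothetical Dockerfile. Under the classic (pre-BuildKit) builder, anything a RUN step writes under a declared VOLUME path goes to a throwaway anonymous volume rather than into the image layer:

```dockerfile
FROM alpine
VOLUME /data
# This write lands in an anonymous volume, which is discarded
# when the RUN step's intermediate container exits:
RUN echo hello > /data/file
# With the classic builder, a later step sees /data empty again.
RUN ls -A /data
```

Note also that the official postgres base image already declares `VOLUME /var/lib/postgresql/data`, which is why commenting out the VOLUME line in the derived Dockerfile does not help: each RUN step still gets a fresh anonymous volume at that path.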

So I basically had to run both RUN steps at the same time:

RUN ./db-setup.sh && ./db-pack.sh
#RUN ./db-pack.sh