
I have a Python app using a SQLite database (it's a data collector that runs daily by cron). I want to deploy it, probably on AWS or Google Container Engine, using Docker. I see three main steps:
1. Containerize and test the app locally.
2. Deploy and run the app on AWS or GCE.
3. Back up the DB periodically and download it to a local archive.

Recent posts (on Docker, StackOverflow and elsewhere) say that since 1.9, volumes are the recommended way to handle persisted data, rather than the "data container" pattern. For future compatibility, I always like to use the preferred, idiomatic method; however, volumes seem to be much more of a challenge than data containers. Am I missing something?

Following the "data container" pattern, I can easily:

  • Build a base image with all the static program and config files.
  • From that image create a data container image and copy my DB and backup directory into it (simple COPY in the Dockerfile).
  • Push both images to Docker Hub.
  • Pull them down to AWS.
  • Run the data and base images, using "--volumes-from" to refer to the data.
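For concreteness, the data-container workflow above might be sketched like this. The image names, paths, and DB filename are all hypothetical, not from the original post:

```shell
# Dockerfile.data -- builds the data container image on top of the base image:
#   FROM myuser/collector-base
#   COPY collector.db /data/collector.db
#   COPY backups/ /data/backups/
#   VOLUME /data

# Build and push both images (names are illustrative):
docker build -t myuser/collector-base .
docker build -t myuser/collector-data -f Dockerfile.data .
docker push myuser/collector-base
docker push myuser/collector-data

# On the AWS/GCE host, pull and run, referring to the data container:
docker pull myuser/collector-base
docker pull myuser/collector-data
docker create --name collector-data myuser/collector-data
docker run -d --volumes-from collector-data myuser/collector-base
```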

Using "docker volume create":

  • I'm unclear how to copy my DB into the volume.
  • I'm very unclear how to get that volume (containing the DB) up to AWS or GCE... you can't PUSH/PULL a volume.

Am I missing something regarding Volumes?
Is there a good overview of using Volumes to do what I want to do?
Is there a recommended, idiomatic way to backup and download data (either using the data container pattern or volumes) as per my step 3?

James Haskell

1 Answer


When you first use an empty named volume, it receives a copy of the image's volume data at the mount point (unlike a host-based bind mount, which completely overlays the mount point with the host directory). So you can initialize the volume contents in your main image, push that image to your registry, pull it down to the target host, create a named volume there, and point the container at that named volume; it will be populated with your initial data. Using docker-compose makes the last two steps easy, but even without it they are at most two commands: docker volume create <vol-name> and docker run -v <vol-name>:/mnt <image>.
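Assuming the image declares /data as a volume and already contains the SQLite DB, the host-side steps might look like this (image and volume names are illustrative):

```shell
# On the AWS/GCE host:
docker pull myuser/collector
docker volume create collector-data

# On first use the empty named volume is seeded with the image's
# /data contents, so the DB shipped in the image lands in the volume:
docker run -d -v collector-data:/data myuser/collector
```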

Retrieving the data from a container-based volume or a named volume is an identical process: you mount the volume in a container and run an export/backup to your outside location. The only difference is on the command line: instead of --volumes-from <container-id> you use -v <vol-name>:/mnt. You can use the same process to import data into the volume as well, removing the need to initialize the app image with data in its volume.
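The export/import process described above can be sketched with a throwaway utility container; the volume name, archive name, and use of the alpine image are assumptions for illustration:

```shell
# Back up: mount the named volume read-only alongside a host directory
# and tar the volume contents into the host directory.
docker run --rm \
    -v collector-data:/mnt:ro \
    -v "$PWD/backup":/backup \
    alpine tar czf /backup/collector-data.tar.gz -C /mnt .

# Import/restore works the same way in reverse:
docker run --rm \
    -v collector-data:/mnt \
    -v "$PWD/backup":/backup \
    alpine tar xzf /backup/collector-data.tar.gz -C /mnt
```

The same commands work for a data container by swapping `-v collector-data:/mnt` for `--volumes-from <container-id>`.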

The biggest advantage of the new process is that it clearly separates data from containers. You can purge all the containers on the system without fear of losing data, and volumes listed on the system carry the names you gave them rather than randomly assigned ones. Lastly, named volumes can be mounted anywhere on the target, and you can pick and choose which volumes to mount if you have multiple data sources (e.g. config files vs databases).

BMitch
  • Thanks a lot for the explanation! This is great stuff! Kudos to the Docker team. I've been trying to find more documentation on this, and not having much luck. https://docs.docker.com/engine/userguide/containers/dockervolumes/ only vaguely discusses what you've described, and doesn't touch on the magic of the data being retained when pushing/pulling the image as you've described. Do you know of any more in depth docs on this? – James Haskell Jun 10 '16 at 22:50
  • This explains it a bit: https://madcoda.com/2016/03/docker-named-volume-explained/. – James Haskell Jun 10 '16 at 23:00
  • For the behavior of an empty named volume automatically getting a copy of the image volume data when you run it, that I've found by testing, haven't seen any other documentation on that detail, but it mirrors exactly what happens with container based volumes for an easy transition. Data inside of your image is part of how you design your Dockerfile, and I'd keep that to a bare minimum. For everything else, I'd make a utility container for import/export/backup. – BMitch Jun 11 '16 at 00:41