
We are about to "dockerize" our not-so-big infrastructure. One crucial question is the whole backup / restore workflow, which I think matters to most enterprises and even to private users.

I know about docker export and docker save, which generate a tarball of a container's filesystem and of an image, respectively. That is neat because an export can be done without shutting down the container.

So let's say we are running a container X and we have mounted some volumes:

-v /home/user/dockerapp-X/data:/var/www/html
-v /home/user/dockerapp-X/logs:/var/logs/app-x
-v /home/user/dockerapp-X/config:/etc/app-x

The biggest benefit of this is that when we update app-X, we just have to pull the new image and recreate the container.

But: this way those directories would not be backed up by docker export or docker save. So we could back up those directories separately, with rsync, Bacula or whatever. I guess this would be the "standard" way of backing up. But then there is no guaranteed connection between the current version of the image and the data.
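One way to get such a connection with a separate backup would be to store the image reference next to the archived data. A minimal sketch, assuming the three host directories above share one parent directory; the function name `backup_volumes`, the marker file `IMAGE` and the archive naming are made up for illustration and are not part of Docker:

```shell
#!/bin/sh
# Sketch: archive bind-mounted host directories together with a note of
# which image they belong to. All names here are hypothetical.

# backup_volumes SRC_PARENT DEST_DIR IMAGE_REF
#   SRC_PARENT  parent of the data/, logs/ and config/ directories
#   DEST_DIR    directory the archive is written to
#   IMAGE_REF   image tag or ID the data belongs to
backup_volumes() {
    src=$1; dest=$2; image=$3
    stamp=$(date +%Y-%m-%d)
    archive="$dest/app-x-$stamp.tar.gz"
    mkdir -p "$dest" || return 1
    # Write the image reference into the source directory so it ends up
    # inside the archive and the data can be matched to the app later.
    printf '%s\n' "$image" > "$src/IMAGE" || return 1
    tar -czf "$archive" -C "$src" IMAGE data logs config || return 1
    # Print the archive path so callers can copy it elsewhere.
    printf '%s\n' "$archive"
}
```

On the Docker host this could be called as `backup_volumes /home/user/dockerapp-X /backup "$(docker inspect --format '{{.Image}}' X)"`, where `docker inspect` prints the ID of the image that container X was created from and `/backup` is an assumed destination. Files that change while tar is reading them may be archived in an inconsistent state, so this is only a file-level sketch.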

On a VM we would just make a snapshot to have the data and the app connected.

So the question is: Is it a best practice to write a Dockerfile based on the current app-x version, copy the volume contents into the image, and build and push the whole image to our private repo?

It would look like this:

# Build context is assumed to be /home/user/dockerapp-X; COPY sources are relative to it
FROM repo/app-x
COPY data/ /var/www/html/
COPY logs/ /var/logs/app-x/
COPY config/ /etc/app-x/

then

docker build -t repo/infra/app-x:backup-v1-22.10.2016 .
docker push repo/infra/app-x:backup-v1-22.10.2016

This would mean that in our repo there is a snapshot for the current version of the app and the image contains all current data of the volumes.

So restoring would be:

docker run --name=backup-restored repo/infra/app-x:backup-v1-22.10.2016

And we could even mount the data folders locally on the host again:

docker run --name=backup-restored \
    -v /home/user/dockerapp-X/data:/var/www/html \
    -v /home/user/dockerapp-X/logs:/var/logs/app-x \
    -v /home/user/dockerapp-X/config:/etc/app-x \
    repo/infra/app-x:backup-v1-22.10.2016

Would this give me a restored app whose data and app version match?

G-M
  • I think you would back up the data volumes with the appropriate tools (such as file-level backup from the host, or database dumps if this is relational database data). The container image that matches that data can then be recreated as needed. You could record build revision numbers every time you deploy the app. Maybe you already have release tags in your repository. – Thilo Nov 25 '16 at 11:19
  • Similar discussion: http://stackoverflow.com/questions/26331651/how-can-i-backup-a-docker-container-with-its-data-volumes?rq=1 – Thilo Nov 25 '16 at 11:21
  • Your suggestion is plausible, and it is how most systems work. My question is whether this is a good way to make backups that can be pulled from the registry at any time on any host that runs Docker, so that we recover from a disaster within minutes: it would just be a matter of pulling the latest image, which would contain both app and data. – G-M Nov 25 '16 at 12:27
  • Service restarts within minutes is one thing, but how much data loss can you tolerate? You cannot commit a new image every few minutes. Whereas data-centric solutions give you resilient replica sets, instant file-system snapshots, recovery logs etc. Another issue is size. You cannot have a 500GB container image. The idea is to have stateless containers (that can indeed be respawned in very short time) and extremely durable, highly available data stores. – Thilo Nov 25 '16 at 12:35
  • OK, I see your point here. What about, say, a Git container running e.g. Bitbucket or GitLab, which we back up on a daily basis? Data loss is of course always bad, but in this use case every dev should have his own repo locally, so a loss wouldn't be as bad. – G-M Nov 25 '16 at 12:46
  • It seems very unlikely that your git repos would get lost if you keep them on such a hosted service (and on all dev machines). Perfect example of a data-centric storage solution (where "data" here is your source code, configuration, build scripts, etc). Not sure where the "container" is in this, though. – Thilo Nov 25 '16 at 12:56
