
At work we have a Docker host with a pretty small /var/lib/docker, which fills up quickly whenever a few docker build commands fail in a row. That is in particular because not all of the docker build commands use the flags --no-cache --force-rm --rm=true, the point of which (in my understanding) is to delete extra junk after successful or unsuccessful builds. These flags are documented at https://docs.docker.com/engine/reference/commandline/build/ if you scroll down.

One issue we are having is that not everybody runs docker build with the flags --no-cache --force-rm --rm=true, and it is kind of hard to track down (silly, I know), but there may also be other causes of /var/lib/docker filling up that we have not caught. Our IT department would not give us permission to look inside that directory for a better understanding, but we are able to run docker image prune or docker system prune, and that seems to be a good solution to our problem, except that for now we run it manually, whenever things go bad.
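
For context, docker system df gives a rough picture of where the space is going without needing to read /var/lib/docker directly:

# Summary of space used by images, containers, local volumes and build cache
docker system df

# Per-image / per-container breakdown
docker system df -v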

We are thinking of getting ahead of the problem by a) running yes | docker image prune just about every time after an image is built. I wrote "just about" because it is hard to track down every repo that builds an image (successfully or not), but that is a separate story. Even if this command has some side effect (such as breaking somebody else's simultaneous docker build on the same Docker host), it would only run once in a while, so the probability of a clash would be low. The other approach being discussed is b) more or less blindly adding yes | docker image prune to a cron job that runs, say, every 2 hours (a rough sketch follows below). If the command does have negative side effects, this variant makes the damage more likely.
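
For concreteness, here is roughly what the cron variant could look like; the schedule and log path are placeholders, and docker image prune -f replaces yes | docker image prune since -f skips the confirmation prompt:

# /etc/cron.d/docker-prune (illustrative only)
# Every 2 hours, remove dangling images; -f answers the confirmation prompt
0 */2 * * * root docker image prune -f >> /var/log/docker-prune.log 2>&1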

Why do I even think that another docker build might break? Well, I do not know it for a fact, or else I would not be asking this question. In an attempt to better understand the so called images that we sometimes end up with after a broken docker build, I read this often-cited article: https://projectatomic.io/blog/2015/07/what-are-docker-none-none-images/

My understanding is that a docker build that has not finished yet leaves some intermediate images on disk, which it can then clean up at the end, depending on the flags. However, if something (such as a yes | docker image prune issued in parallel) deletes some of these intermediate image layers, then the overall build would also fail.

Is this true? If so, what is a good way to keep /var/lib/docker clean when building many images?

P.S. I am not a frequent user of S.O. so please suggest ways of improving this question if it violates some rules.

Leonid
  • Whoever voted to close the question, please elaborate why in the comment. What can I do to make it better? – Leonid Feb 05 '23 at 19:53

3 Answers


I tried to reproduce the described behavior with the following script. The idea is to start several docker build processes in parallel and, while they are running, also start several docker system prune processes in parallel.

Dockerfile:

FROM centos:7
RUN echo "before sleep"
RUN sleep 10
RUN echo "after sleep"
RUN touch /myfile

test.sh:

#!/bin/bash

# Start four builds in parallel; --no-cache forces every layer to be rebuilt
docker build -t test1 --no-cache . &
docker build -t test2 --no-cache . &
docker build -t test3 --no-cache . &
docker build -t test4 --no-cache . &

# Give the builds time to reach the "sleep 10" step, then prune twice in parallel
sleep 5
echo Prune!
docker system prune -f &
docker system prune -f &

# Wait for the builds to finish, then verify that each image was built correctly
sleep 15
docker run --rm test1 ls -la /myfile
docker run --rm test2 ls -la /myfile
docker run --rm test3 ls -la /myfile
docker run --rm test4 ls -la /myfile

Running bash test.sh, I get successful builds and a successful prune. The second prune process failed with an error: Error response from daemon: a prune operation is already running, which means that prune recognizes this conflict situation.

Tested on Docker version 19.03.12, host system CentOS 7.

Slava Kuravsky
  • Can you test it with a `docker buildx` instead of `docker build`? – VonC Feb 07 '23 at 23:31
  • With modifications like `DOCKER_BUILDKIT=1 docker build -t test1 --no-cache --progress plain . &` it also worked fine – Slava Kuravsky Feb 07 '23 at 23:38
  • Thank you. What is a good way to detect the currently running prune and not start another one, without using a file lock or installing anything? Just assuming a bare minimum server set up, or at least a very common and reliable tool, if I had to install something. – Leonid Feb 08 '23 at 01:03
  • You can safely start prune. As I mentioned in the post, the handling is correctly done by prune. So you can run `docker system prune -f || true` – Slava Kuravsky Feb 08 '23 at 08:57

A docker image prune (without the -a option) will remove only dangling images, not unused images.

As explained in "What is a dangling image and what is an unused image?"

Dangling images are images which do not have a tag and do not have a child image (e.g. an old image that used a different version of FROM busybox:latest) pointing to them.

They may have had a tag pointing to them before and that tag later changed.
Or they may have never had a tag (e.g. the output of a docker build without including the tag option).

Intermediate images produced by a docker build should not be considered dangling, as they have a child image pointing to them.

As such (to be tested), it should be safe to use yes | docker image prune while images are being built.
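
To make the distinction concrete, here is a quick way to see what each command would touch; these are standard Docker CLI flags, not specific to the setup in the question:

# List only dangling images (untagged, no child image); this is what "docker image prune" removes
docker images --filter dangling=true

# Remove dangling images without the confirmation prompt (equivalent to "yes | docker image prune")
docker image prune -f

# The aggressive variant: also removes tagged images not used by any container
docker image prune -a -f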

Plus, BuildKit is now the default builder (moby v23.0.0) on Linux, and it is designed to avoid side effects with the rest of the API (intermediate images and containers):

At the core of BuildKit is a Low-Level Build (LLB) definition format. LLB is an intermediate binary format that allows developers to extend BuildKit. LLB defines a content-addressable dependency graph that can be used to put together very complex build definitions.
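
If you rely on BuildKit, note that its build cache is pruned separately from images; a minimal sketch with standard Docker CLI commands (the image name and size limit are illustrative):

# Force BuildKit for a single build on older Docker versions
DOCKER_BUILDKIT=1 docker build -t myimage .

# BuildKit's build cache is managed separately; clean it with:
docker builder prune -f

# Or cap how much build cache to keep
docker builder prune -f --keep-storage 10GB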

VonC
  • Thank you for the answer. A follow-up question on "Intermediate image produced by a docker build should not be considered dangling, as they have a child image pointing to them." So, why do we get a dangling image after an unsuccessful `docker build` that does not use all the right flags, but not after a successful build? At least that was my experience with `docker build` when I used it. – Leonid Feb 08 '23 at 01:27
  • @Leonid I mentioned [none images here](https://stackoverflow.com/a/33913711/6309). Removing that unlabelled image should not impact the next build. And using `buildx` (now by default with docker/moby v23) should avoid those none images. – VonC Feb 08 '23 at 09:24

Yes, it is safe, because image layers are locked while a build is using them, while they serve as base layers of other images, or while they back running containers. I have done such things many times in parallel with running automated build pipelines, with a running Kubernetes cluster, etc.

Sergey Bezugliy
  • Thank you for the answer. Please elaborate on "Due to locking of image layers for build time, for base layers of other running images or for running containers." Where can I read more about this "locking"? – Leonid Feb 09 '23 at 22:39