
I have noticed that many Dockerfiles try to minimize the number of instructions by combining several UNIX commands in a single RUN instruction. Is there a reason for this?

Also, is there any difference in outcome between the two Dockerfiles below?

Dockerfile1

FROM ubuntu 
MAINTAINER demousr@example.com 

RUN apt-get update 
RUN apt-get install -y nginx
CMD ["echo", "Image created"] 

Dockerfile2

FROM ubuntu 
MAINTAINER demousr@example.com 

RUN apt-get update && apt-get install -y nginx
CMD ["echo", "Image created"] 
Qumber
Pycoder

2 Answers

4

Roughly speaking, a Docker image consists of some metadata and an ordered list of read-only layers. A running container is built on top of these layers by adding a thin writable container layer; the layers from the underlying image stay read-only.

These layers can be stored on disk in different ways, depending on the configured storage driver. For example, the following figure from the official Docker documentation illustrates how files changed in different layers are combined by the OverlayFS storage driver: [figure: OverlayFS]

Next, the Dockerfile instructions RUN, COPY, and ADD create layers, and the best practices on the Docker website specifically recommend merging consecutive RUN commands into a single RUN command. This reduces the number of layers, and it can also reduce the size of the final image, because files created in one layer and deleted in a later one still take up space in the earlier layer:

https://docs.docker.com/develop/dev-best-practices/

[…] try to reduce the number of layers in your image by minimizing the number of separate RUN commands in your Dockerfile. You can do this by consolidating multiple commands into a single RUN line and using your shell’s mechanisms to combine them together. […]

See also: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
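As a minimal sketch of that recommendation, update, install, and cleanup can share one RUN so that the package index never ends up in a committed layer. The cleanup step is an addition that is not in the question; /var/lib/apt/lists is where APT keeps its downloaded package index on Debian and Ubuntu.

```dockerfile
FROM ubuntu

# One RUN instruction produces one layer. The package index downloaded by
# `apt-get update` is removed before that layer is committed, so it does not
# add to the image size.
RUN apt-get update \
 && apt-get install -y nginx \
 && rm -rf /var/lib/apt/lists/*

# Counter-example: as a separate instruction, the same cleanup would only hide
# the files in a new layer; the earlier layers would still contain them.
# RUN rm -rf /var/lib/apt/lists/*
```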

Moreover, in your example:

RUN apt-get update -y -q
RUN apt-get install -y nginx

If you run docker build -t your-image-name . on this Dockerfile, edit the Dockerfile after a while to add another package besides nginx, and then run docker build -t your-image-name . again, the Docker build cache will reuse the existing layer for apt-get update -y -q instead of executing it again, so apt-get install will run against a stale APT package index. This is another upside of merging the two RUN commands.
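For illustration, here is roughly what the merged form looks like after adding a second package (curl is just an example package name, not something from the question). Because the instruction string changed, the cache entry for the old line no longer matches, and the whole RUN, including apt-get update, is executed again:

```dockerfile
FROM ubuntu

# First build used:  RUN apt-get update -y -q && apt-get install -y nginx
# Editing the package list changes the instruction text, so the build cache
# misses here and both commands run again against a fresh package index.
RUN apt-get update -y -q && apt-get install -y nginx curl
```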

ErikMD
  • In your last paragraph do you mean that combining multiple RUN commands ensures cache is not used? Combining RUN commands invalidates cache - can you give me a reference documentation about this please – variable May 14 '20 at 11:43
  • Yes: I meant that given the rules of Docker's cache, a separated layer `RUN apt-get update -y -q` would always be reused from the Docker cache (keeping therefore an old version of the package-manager cache). So, combining multiple `RUN` commands ensures a change of the packages list in the `apt-get install` command will execute this command *after re-running* the `apt-get update` command. – ErikMD May 14 '20 at 11:51
  • Ok but had the package list not changed, then the layer would be re-used. Right? I thought RUN only looks at command string and not inside file. – variable May 14 '20 at 11:57
  • Yes, the layer would be re-used in this case. The best reference I've found is https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache : "Aside from the `ADD` and `COPY` commands, cache checking does not look at the files in the container to determine a cache match. […] In that case just the command string itself is used to find a match." – ErikMD May 14 '20 at 11:59
  • It says "just the command string itself is used to find a match." So how will the Dockerfile2 provided by the OP use the cache on the 2nd run? The command is the same, so the cache will be used, won't it? – variable May 14 '20 at 12:05
  • Basically on the 2nd run, if the Dockerfile is unchanged, all layers are reused from the cache! but if the OP changes `RUN apt-get update && apt-get install -y nginx` into another string, e.g. `RUN apt-get update && apt-get install -y nginx curl`, the whole `RUN` layer is rebuilt… – ErikMD May 14 '20 at 12:08
  • Thanks, what if a `file-containing-list-of-package.txt` is used? For example: `RUN apt-get update && apt-get install file-containing-list-of-package.txt`. In this case, if the file contents are changed, does Docker ignore the changes and always re-use the layer from the 1st build? – variable May 14 '20 at 12:15
  • 1
    It depends on how you added your `file-containing-list-of-package.txt`: if you added this file from the build context thanks to a previous `COPY` line, any file change will trigger a rebuild of the `COPY` line, as well as of all subsequent lines (including `RUN`); if you created this file from a command such as `curl -o containing-list-of-package.txt https://…`, then the possible remote changes of the file will be ignored. – ErikMD May 14 '20 at 12:20
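A rough sketch of the COPY variant described in the last comment, assuming a hypothetical packages.txt in the build context with one package name per line (the file name and the /tmp destination are made up for the example). apt-get does not read package names from a file by itself, so xargs is used here to pass them as arguments:

```dockerfile
FROM ubuntu

# COPY is cache-checked against the file's contents: editing packages.txt
# invalidates this layer and every instruction after it.
COPY packages.txt /tmp/packages.txt

# Re-run whenever the COPY layer above is rebuilt, so the index is refreshed
# together with the install.
RUN apt-get update \
 && xargs -a /tmp/packages.txt apt-get install -y
```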
2

In addition to the space savings, it's also about correctness.

Consider your first Dockerfile (a common mistake when working with Debian-like systems that use apt):

FROM ubuntu 
MAINTAINER demousr@example.com 

RUN apt-get update 
RUN apt-get install -y nginx
CMD ["echo", "Image created"] 

If two or more images follow this pattern, a build-cache hit on the RUN apt-get update layer can make an image unbuildable because of stale cached package metadata:

  • Let's say I built an image similar to this one a few weeks ago.
  • Now I'm building this image today. A cache entry is present for everything up to and including the RUN apt-get update line.
  • The docker build reuses those cached layers (since the Dockerfile instructions and base image are identical) up to the RUN apt-get update line.
  • When the RUN apt-get install line runs, it uses the cached apt metadata, which is now weeks out of date, and will likely fail because the stale index can point at package versions that are no longer available on the mirrors.
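The same sequence, annotated on the Dockerfile itself as a sketch. The comments describe the second build, a few weeks after the first; the last, commented-out line is the usual combined form that avoids the problem.

```dockerfile
FROM ubuntu

# Second build: instruction text and base image are unchanged, so this layer
# comes from the build cache and still holds the weeks-old package index.
RUN apt-get update

# If this layer is not cached (for example, another image with the same first
# lines installed different packages), it runs against that stale index and
# may request package files that are no longer on the mirrors.
RUN apt-get install -y nginx

# Combined form: update and install always execute together.
# RUN apt-get update && apt-get install -y nginx
```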
anthony sottile