
What is the difference between multiple RUN entries in a Dockerfile, like:

FROM php:5.6-apache
RUN docker-php-ext-install mysqli 
RUN apt update 
RUN apt install git -y -q

and just one RUN entry?

FROM php:5.6-apache
RUN docker-php-ext-install mysqli && apt update && apt install git -y -q

Note: I'm not asking which one is better. I want to know all the differences between the two approaches.

Daniel Santos
  • The `RUN` section of the Dockerfile [reference](https://docs.docker.com/engine/reference/builder/#run) has the answer I think. Every `RUN` command creates a new "layer". For some "best practices" about this, check also [this](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#run). – tgogos Sep 17 '18 at 12:58
  • Possible duplicate: [Multiple RUN vs. single chained RUN in Dockerfile, what is better?](https://stackoverflow.com/q/39223249/596285) – BMitch Sep 17 '18 at 12:58

1 Answer


Each RUN command creates a layer of the filesystem changes generated by a temporary container started to run that command. (It's effectively running a docker run and then packaging the result of docker diff into a filesystem layer.)
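As a rough sketch, each RUN behaves like this manual sequence (simplified; the real builder does this internally, and the image names here are placeholders):

```shell
# Start a temporary container from the previous layer and run the command
docker run --name tmp <parent-image> sh -c 'apt update'
# Inspect the filesystem changes the command made (what becomes the layer)
docker diff tmp
# Package those changes on top of the parent as a new read-only layer
docker commit tmp <new-image>
docker rm tmp
```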

These layers have a few key details to note:

  • They are immutable. Once created, a layer never changes; to update your image you have to generate a new layer on top.
  • They are reusable between multiple images and running containers. You can do this because of the immutability.
  • You do not delete files from a parent layer, but you can register that a file is deleted in a later layer. This is a metadata change in that later layer, not a modification to the parent layer.
  • Layers are reused in docker's build cache. If two different images, or even the same image being rebuilt, perform the same command on top of the same parent layer, docker will reuse the already created layer.
  • These layers are merged together into the final filesystem you see inside your container.
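A minimal sketch of the deletion point, using a hypothetical large file created with `dd`:

```dockerfile
FROM debian:stretch
# Layer 1: these 50 MB are stored in this layer...
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=50
# Layer 2: ...and this layer only records metadata saying the file is
# deleted. The 50 MB in layer 1 still ship with the image.
RUN rm /tmp/big.bin
```

`docker history` on the resulting image would show the first layer's size unchanged, even though the file is invisible in a running container.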

The main differences between the two approaches are the build cache and deleting files. If you split the download of a source-code tgz, the extraction of that tgz, the compilation of a binary, and the deletion of the tgz and source folders across multiple RUN lines, then when you ship the image over the network and store it on disk you will have all of the source in the layers, even though you don't see it in the final container. Your image will be significantly larger.
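The fix for that case is to chain the steps into a single RUN, so the tgz and source tree never land in any shipped layer. A sketch, assuming `curl` and `make` are available in the base image and with a hypothetical URL and paths:

```dockerfile
FROM debian:stretch
# Download, extract, build, and clean up in one layer: only the
# installed binary survives into the image.
RUN curl -fsSL https://example.com/tool-1.0.tgz -o /tmp/tool.tgz \
 && tar -xzf /tmp/tool.tgz -C /tmp \
 && make -C /tmp/tool-1.0 install \
 && rm -rf /tmp/tool.tgz /tmp/tool-1.0
```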

Caching can also be a bad thing when you cache too much. If you split the apt update and apt install into separate RUN lines, and then months later add a new package to the install line, docker will reuse the months-old cache of apt update and try to install package versions that are months old and possibly no longer available, so your image may fail to build. Many people also run a rm -rf /var/lib/apt/lists/* after installing debian packages; if you do this in a separate step, you will not actually delete the files from the previous layers, so your image will not shrink.
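The usual pattern is therefore to update, install, and clean in one RUN, so the cached layer is invalidated whenever the package list changes and the apt lists never persist in any layer. A sketch based on the question's image:

```dockerfile
FROM php:5.6-apache
RUN apt-get update \
 && apt-get install -y -q git \
 && rm -rf /var/lib/apt/lists/*
```

Adding a package to the install line later changes the RUN instruction, which forces the `apt-get update` to re-run as well.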

BMitch