
I'm confused by the following passage in the article about Docker's CMD vs. RUN vs. ENTRYPOINT.

Note that apt-get update and apt-get install are executed in a single RUN instruction. This is done to make sure that the latest packages will be installed. If apt-get install were in a separate RUN instruction, then it would reuse a layer added by apt-get update, which could have been created a long time ago.

The code given for the passage is:

RUN apt-get update && apt-get install -y \
  bzr \
  cvs \
  git \
  mercurial \
  subversion

I really don't understand their explanation for putting apt-get install on the same line. Wouldn't apt-get update complete, and then apt-get install... proceed, even if they were on separate lines? The article makes it sound like apt-get install... wouldn't see any of the effects of apt-get update if they were on separate lines.

qarthandso

1 Answer


The passage is referring to the caching of image layers. Any time you run the same command against the same previous layer, Docker will attempt to reuse the cached layer for that command instead of rerunning it.

So if you add another package to your list a few months from now and rerun the docker build, and you had used two separate RUN commands, the apt-get update layer would be reused from the cache and you'd have a 3-month-old package index in your image. The attempt to install the packages with the new apt-get install command in the second RUN would fail for any packages whose versions in that stale index are no longer in the package repository.

By making it a single RUN command, it's a single layer in the filesystem cache, so changing the package list invalidates that whole layer: the update reruns on your rebuild months from now and the install works against packages that are currently in the package repository.
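
As a quick way to see how stale a cached layer actually is (this is just a sanity check, not something the example below depends on), docker history lists every layer of an image along with when it was created; my-app:latest is simply the tag used later in this answer:

docker history my-app:latest

The CREATED column shows when each layer was built, so a layer produced by a standalone RUN apt-get update that is being reused from the cache will still show its original build date.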


Edit: It seems this still isn't clear, so here's a sample scenario of how it goes wrong:

Using the following Dockerfile:

FROM debian:latest
RUN apt-get update
RUN apt-get install -y \
  bzr \
  cvs \
  git \
  mercurial \
  subversion

When I run docker build -t my-app:latest . it outputs a long list that ends with:

Processing triggers for libc-bin (2.19-18+deb8u4) ...
Processing triggers for systemd (215-17+deb8u4) ...
Processing triggers for ca-certificates (20141019+deb8u1) ...
Updating certificates in /etc/ssl/certs... 174 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d....done.
Processing triggers for sgml-base (1.26+nmu4) ...
 ---> 922e466ac74b
Removing intermediate container 227318b98393
Successfully built 922e466ac74b

Now, if I change this file to add unzip to the package list, and assume it's months later so the cached apt-get update layer now contains stale data:

FROM debian:latest
RUN apt-get update
RUN apt-get install -y \
  bzr \
  cvs \
  git \
  mercurial \
  subversion \
  unzip

If I run that right now, it will work:

Step 1 : FROM debian:latest
 ---> 1b088884749b
Step 2 : RUN apt-get update
 ---> Using cache
 ---> 81ca47119e38
Step 3 : RUN apt-get install -y   bzr   cvs   git   mercurial   subversion   unzip
 ---> Running in 87cb8380ec90
Reading package lists...
Building dependency tree...
The following extra packages will be installed:
  ca-certificates dbus file fontconfig fontconfig-config fonts-dejavu-core
  gir1.2-glib-2.0 git-man gnupg-agent gnupg2 hicolor-icon-theme
....
Processing triggers for libc-bin (2.19-18+deb8u4) ...
Processing triggers for systemd (215-17+deb8u4) ...
Processing triggers for ca-certificates (20141019+deb8u1) ...
Updating certificates in /etc/ssl/certs... 174 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d....done.
Processing triggers for sgml-base (1.26+nmu4) ...
 ---> d6d1135481d3
Removing intermediate container 87cb8380ec90
Successfully built d6d1135481d3

But if you look at the above output, the apt-get update step shows:

 ---> Using cache

That means it didn't rerun the update; it just reused an old layer that ran that step before. When that layer is only 5 minutes old, it's no issue. But when it's months old, you'll see errors.
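
As an aside, for a one-off rebuild you can tell Docker to ignore its cache entirely with the standard --no-cache flag (a general docker build option, not something specific to this Dockerfile), but that rebuilds every step and doesn't fix the underlying problem:

docker build --no-cache -t my-app:latest .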

The fix, as Docker mentions, is to run the update and install in the same RUN step, so that when the install's cached layer is invalidated, the update also reruns.
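
Applied to the scenario above, the corrected Dockerfile would look roughly like this (same package list, just collapsed into a single RUN):

FROM debian:latest
RUN apt-get update && apt-get install -y \
  bzr \
  cvs \
  git \
  mercurial \
  subversion \
  unzip

Now adding or removing a package changes the text of that one RUN instruction, which invalidates its cached layer, so the update and the install rerun together.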

BMitch
  • thanks for always helping me with my Docker questions. I definitely am closer to understanding what you're saying, but I don't get _The attempt to install the packages with the new `apt-get install` command in the second RUN would fail for any packages whose versions in that stale index are no longer in the package repository._ If you wouldn't mind clarifying this passage, I'm not sure I understand the context or what you mean by it. – qarthandso Sep 12 '16 at 03:12
  • You're looking at this as building one image one time; think longer term. When you try to rebuild it months later, after you have changed the second RUN command in your Dockerfile while the first RUN command is unchanged, the build will reuse the cache for the first RUN command and then kick off the second one. With apt-get, if the update is stale, you'll get errors. – BMitch Sep 12 '16 at 12:39
  • It might help his understanding if you make up a fictional example using a particular artifact. – David M. Karr Sep 12 '16 at 15:33
  • Here's an example of what goes wrong when you rely on the `apt-get update` from a separate run command (in another image no less): http://stackoverflow.com/q/39518377/596285 – BMitch Sep 15 '16 at 20:46