For example, I have the following Dockerfile. When I run docker build, each RUN step gets a separate hash (e.g., 1d9c17228a9e), and a step runs very fast if it has already been run. I guess each hash is associated with an actual file at the backend. Is that so?

If there are separate files, how can they be loaded into a single virtual machine quickly? Is there some kind of assembly step when starting a new virtual machine (Docker container)? Thanks.

$ docker build -t ubtsrv .
Sending build context to Docker daemon  12.29kB
Step 1/22 : FROM ubuntu
 ---> 1d9c17228a9e
Step 2/22 : RUN rm -rf /etc/dpkg/dpkg.cfg.d/excludes
 ---> Using cache
 ---> eb02f606ba08
Step 3/22 : RUN apt-get -y update &&     dpkg -l | grep ^ii | cut -d' ' -f3 | xargs apt-get install -y --reinstall
 ---> Using cache
 ---> 7062816b0023
Step 4/22 : RUN apt-get -y install apt-utils
 ---> Using cache
 ---> b89d4cdb791c
Step 5/22 : RUN apt -y update && apt -y upgrade
 ---> Using cache
 ---> 8100af2b7f2e
Step 6/22 : RUN apt-get -y install vim
 ---> Using cache
 ---> 57c142f99935
Step 7/22 : RUN apt-get -y install man
 ---> Using cache
 ---> ddb73e4bbddc
Step 8/22 : RUN apt-get -y install gawk
 ---> Using cache
 ---> 7422b4371c16
Step 9/22 : RUN apt-get -y install mawk
 ---> Using cache
 ---> 53a01709a342
Step 10/22 : RUN apt-get -y install build-essential
 ---> Using cache
 ---> af94947e6922
Step 11/22 : RUN apt-get -y install command-not-found
 ---> Using cache
 ---> 20094698a583
Step 12/22 : RUN apt-get -y install clang
 ---> Using cache
 ---> e63570058a57
Step 13/22 : RUN apt-get -y install htop
 ---> Using cache
 ---> b09fec30dc23
Step 14/22 : RUN apt-get -y install wget
 ---> Using cache
 ---> d2794d29f9ee
Step 15/22 : RUN apt-get -y install curl
 ---> Using cache
 ---> 2b122c49f3ca
Step 16/22 : RUN wget -q ftp://ftp.gnu.org/gnu/bash/bash-4.4.18.tar.gz &&   tar xzvf bash-4.4.18.tar.gz &&   cd bash-4.4.18 &&   ./configure &&   make -j &&   make install &&   cd .. &&   rm -rf bash-4.4.18.tar.gz bash-4.4.18
 ---> Using cache
 ---> c4bf046aff2a
Step 17/22 : RUN apt-get install -y git
 ---> Using cache
 ---> 40ebefa7acda
Step 18/22 : RUN apt-get install -y ack
 ---> Using cache
 ---> 05cefb3f0496
Step 19/22 : RUN apt-get install -y info
 ---> Using cache
 ---> 3361e4e4e06f
Step 20/22 : RUN apt-get install -y llvm
 ---> Using cache
 ---> 50b7c75fc2f5
Step 21/22 : RUN apt-get install -y graphviz
 ---> Using cache
 ---> 80f89477930c
Step 22/22 : RUN apt-get install -y cmake
 ---> Using cache
 ---> c8320b1b2523
Successfully built c8320b1b2523
Successfully tagged ubtsrv:latest
$ cat Dockerfile 
FROM ubuntu
RUN rm -rf /etc/dpkg/dpkg.cfg.d/excludes

RUN apt-get -y update && \
    dpkg -l | grep ^ii | cut -d' ' -f3 | xargs apt-get install -y --reinstall
RUN apt-get -y install apt-utils
RUN apt -y update && apt -y upgrade
RUN apt-get -y install vim
RUN apt-get -y install man
RUN apt-get -y install gawk
RUN apt-get -y install mawk
RUN apt-get -y install build-essential
RUN apt-get -y install command-not-found
RUN apt-get -y install clang
RUN apt-get -y install htop
RUN apt-get -y install wget
RUN apt-get -y install curl
RUN wget -q ftp://ftp.gnu.org/gnu/bash/bash-4.4.18.tar.gz && \
  tar xzvf bash-4.4.18.tar.gz && \
  cd bash-4.4.18 && \
  ./configure && \
  make -j && \
  make install && \
  cd .. && \
  rm -rf bash-4.4.18.tar.gz bash-4.4.18
RUN apt-get install -y git
RUN apt-get install -y ack
RUN apt-get install -y info
RUN apt-get install -y llvm
RUN apt-get install -y graphviz
RUN apt-get install -y cmake
user1424739
  • Virtual machine is the wrong word: https://stackoverflow.com/questions/16047306/how-is-docker-different-from-a-virtual-machine?rq=1 – BMitch Jan 14 '19 at 16:46

1 Answer

Each hash corresponds to a Docker layer. A layer is just a filesystem layer containing the files added, changed, or removed in that step. If you dip into Docker internals you can actually take a look at the specific files that were added.

This section on docker caching describes how docker decides what is cached and what is not: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache
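
As a rough illustration of why cached steps finish instantly, here is a toy model in Python. It assumes a step can be reused when both its parent image and its instruction text are unchanged. The names `step_key` and `build` are hypothetical, and the hashing is only a stand-in: Docker's real cache lookup is more involved (for example, ADD and COPY also checksum the files being copied).

```python
import hashlib

def step_key(parent_id, instruction):
    """Toy cache key: depends on the parent image and the instruction text."""
    data = f"{parent_id}\n{instruction}".encode()
    return hashlib.sha256(data).hexdigest()[:12]

def build(instructions, cache):
    """Return a list of (image_id, was_cached) tuples, one per step."""
    parent = ""
    results = []
    for instruction in instructions:
        key = step_key(parent, instruction)
        hit = key in cache      # an identical step on the same parent was built before
        cache.add(key)
        results.append((key, hit))
        parent = key            # the next step builds on top of this one
    return results

cache = set()
dockerfile = ["FROM ubuntu", "RUN apt-get -y install vim", "RUN apt-get -y install man"]
first = build(dockerfile, cache)    # nothing cached yet
second = build(dockerfile, cache)   # every step is a cache hit

# Changing one step invalidates that step and every step after it,
# because the later steps now have a different parent.
edited = ["FROM ubuntu", "RUN apt-get -y install emacs", "RUN apt-get -y install man"]
changed = build(edited, cache)
print([hit for _, hit in changed])
```

In this model the third step is rebuilt even though its text did not change, which mirrors why reordering or editing an early Dockerfile line makes all later steps run again.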

This tool is a lot of fun: https://github.com/wagoodman/dive. It's an easy way to explore your Docker images and check out the contents of each layer.

Let's talk through an example Dockerfile:

FROM alpine

WORKDIR /opt/
RUN touch foo && mkdir bar && touch bar/foo

RUN rm foo && touch file.txt

RUN rm -rf bar

Here's the build output:

Building app
Step 1/5 : FROM alpine
 ---> 196d12cf6ab1
Step 2/5 : WORKDIR /opt/
 ---> Running in 2098e27c28b9
Removing intermediate container 2098e27c28b9
 ---> 74634b6a7dcd
Step 3/5 : RUN touch foo && mkdir bar && touch bar/foo
 ---> Running in f109a620ebfd
Removing intermediate container f109a620ebfd
 ---> dea70d465cc1
Step 4/5 : RUN rm foo && touch file.txt
 ---> Running in 367e61e301ba
Removing intermediate container 367e61e301ba
 ---> 9dcca4810268
Step 5/5 : RUN rm -rf bar
 ---> Running in d176de336110
Removing intermediate container d176de336110
 ---> 2e2eee6b9bf8

Successfully built 2e2eee6b9bf8
Successfully tagged docker-fsl_app:latest

If I run docker inspect 2e2eee6b9bf8 (the final hash in the output above), Docker returns a lot of data, including these two sections:

"GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/de87e6b38f95b44137409b5a61b498781473bc05cfd74a01dd641245219c2a1f/diff:/var/lib/docker/overlay2/02d58096fd47908c82edbc34dd0205541e525afe804e88f517ff47ccf3beeee0/diff:/var/lib/docker/overlay2/91fb3592a0da4847071a51e7dda4f48b810a5d1ff0b22e34bb38a0ee52d13d09/diff:/var/lib/docker/overlay2/2e966b19c5984548a6adb172d092dd21b2bb73f6be839baa680dc524d5221063/diff",
                "MergedDir": "/var/lib/docker/overlay2/3216972ae99360398a74720226b26b61f0c04142ad6aaa519c1a9dd36f7fb945/merged",
                "UpperDir": "/var/lib/docker/overlay2/3216972ae99360398a74720226b26b61f0c04142ad6aaa519c1a9dd36f7fb945/diff",
                "WorkDir": "/var/lib/docker/overlay2/3216972ae99360398a74720226b26b61f0c04142ad6aaa519c1a9dd36f7fb945/work"
            },
            "Name": "overlay2"
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:df64d3292fd6194b7865d7326af5255db6d81e9df29f48adde61a918fbd8c332",
                "sha256:b9f91d14f5d797f43eeb5b56264cc697641d50dd5e9d17bf89f33cf0694f6559",
                "sha256:97195b4b7c22c7eb8720edeb93feeb6901a34018ce1f3c90dc17f861438abf21",
                "sha256:0f3d56ac5865b537686b1e324dfbf54edde5afd06e644903ad6b9af42eab01df",
                "sha256:5ff5ef92db130446e0af4836ffba8fbf29d06643aa05a104cb4c7a4c9e462fc7"
            ]
        },

I'm on macOS, where I can run screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty to access the Docker VM. Within the VM, I can actually look at the filesystem layers.

These are the layers in reverse order: /var/lib/docker/overlay2/de87e6b38f95b44137409b5a61b498781473bc05cfd74a01dd641245219c2a1f/diff:/var/lib/docker/overlay2/02d58096fd47908c82edbc34dd0205541e525afe804e88f517ff47ccf3beeee0/diff:/var/lib/docker/overlay2/91fb3592a0da4847071a51e7dda4f48b810a5d1ff0b22e34bb38a0ee52d13d09/diff:/var/lib/docker/overlay2/2e966b19c5984548a6adb172d092dd21b2bb73f6be839baa680dc524d5221063/diff
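
A small sketch of pulling the layer directories out of that string in Python. It assumes, as stated above, that LowerDir is a colon-separated list with the most recent layer first. The function name `lower_layers` and the short IDs in `sample` are made up for illustration; real IDs are the long hex strings shown above.

```python
def lower_layers(lower_dir):
    """Split an overlay2 LowerDir string into layer IDs, oldest layer first."""
    paths = lower_dir.split(":")
    # Each path looks like /var/lib/docker/overlay2/<id>/diff
    ids = [path.split("/")[-2] for path in paths]
    return list(reversed(ids))

# Hypothetical, shortened IDs standing in for the real directory names.
sample = (
    "/var/lib/docker/overlay2/cccc/diff:"
    "/var/lib/docker/overlay2/bbbb/diff:"
    "/var/lib/docker/overlay2/aaaa/diff"
)
print(lower_layers(sample))
```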

If you go to those locations you'll see just the files added or removed in that layer. For example, if I go to /var/lib/docker/overlay2/de87e6b38f95b44137409b5a61b498781473bc05cfd74a01dd641245219c2a1f/diff/opt within the VM and run ls -lah, this is the output:

drwxr-xr-x    2 root     root        4.0K Jan 14 16:15 .
drwxr-xr-x    3 root     root        4.0K Jan 14 16:15 ..
-rw-r--r--    1 root     root           0 Jan 14 16:15 file.txt
c---------    1 root     root        0,   0 Jan 14 16:15 foo

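file.txt has been added and foo has been deleted. The c--------- entry with device numbers 0, 0 is how the deletion is recorded: OverlayFS represents a deleted file as a "whiteout", which is a character device with device number 0/0. That is why foo shows no permissions.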
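
Assuming the foo entry in the listing is an OverlayFS whiteout (a character device with device number 0/0, which is how the kernel's overlayfs documentation describes deletions), a layer's diff directory could be scanned for deleted paths with something like the sketch below. The helper names `is_whiteout` and `deleted_entries` are hypothetical, and this only covers plain whiteouts, not opaque directories.

```python
import os
import stat

def is_whiteout(path):
    """True if path is a character device with device number 0/0."""
    st = os.lstat(path)  # lstat so symlinks are not followed
    return stat.S_ISCHR(st.st_mode) and st.st_rdev == 0

def deleted_entries(diff_dir):
    """Return paths (relative to diff_dir) that this layer marks as deleted."""
    found = []
    for root, _dirs, files in os.walk(diff_dir):
        for name in files:  # device nodes are listed among the non-directories
            full = os.path.join(root, name)
            if is_whiteout(full):
                found.append(os.path.relpath(full, diff_dir))
    return sorted(found)
```

Run inside the Docker VM against the diff directory shown above, `deleted_entries` would be expected to report opt/foo, though I have not verified that on every storage driver version.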

So for every build step, the diff of files added or deleted is stored as a layer.
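
To connect this back to the question about assembly at container start, here is a conceptual Python sketch of how stacked diffs produce one merged view, using the three RUN steps from the example Dockerfile. It is only a model: the `merge` function and `WHITEOUT` marker are hypothetical, and the real overlay filesystem does not copy anything up front. The kernel mounts the layer directories together and resolves each lookup from the top layer down, which is why starting a container is fast.

```python
WHITEOUT = object()  # marker meaning "this path was deleted in this layer"

def merge(layers):
    """Apply layer diffs from oldest to newest and return the visible files."""
    merged = {}
    for layer in layers:
        for path, content in layer.items():
            if content is WHITEOUT:
                # A whiteout hides the path and everything beneath it.
                for existing in list(merged):
                    if existing == path or existing.startswith(path + "/"):
                        del merged[existing]
            else:
                merged[path] = content
    return merged

layers = [
    {"opt/foo": "", "opt/bar/foo": ""},          # RUN touch foo && mkdir bar && touch bar/foo
    {"opt/foo": WHITEOUT, "opt/file.txt": ""},   # RUN rm foo && touch file.txt
    {"opt/bar": WHITEOUT},                       # RUN rm -rf bar
]
print(sorted(merge(layers)))
```

Only opt/file.txt survives in the merged view, even though foo and bar/foo still exist in the oldest layer's diff directory on disk.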

maxm
  • What does 'filesystem layer' mean? Does each layer actually map to a file on the filesystem? – user1424739 Jan 14 '19 at 15:52
  • Added to the answer, the layer is all the files that have been added/removed, not just a single file. – maxm Jan 14 '19 at 16:35
  • I see `"MergedDir": "/var/lib/docker/overlay2/83c4e511b57d3a550679c26eead8b701b7e1f4722af204d6c83aa8e8af96fc76/merged"`. But there is only one `merged` entry, `./9e8fa55b2d2745e289e4f409b38bdf54e407eb9a4314342d3af62a7ed4fe5551/merged`, in my Docker's `/var/lib/docker/overlay2`. Why is that? What is `merged` for? – user1424739 Jan 14 '19 at 17:18
  • Not sure, there is likely more information in the docs: https://docs.docker.com/storage/storagedriver/overlayfs-driver/ – maxm Jan 14 '19 at 17:20