
I've searched for "site:stackoverflow.com dockerfile: ENV, RUN - layers or images" and have read "Does Docker EXPOSE make a new layer?" and "What are Docker image 'layers'?".

While reading the docs page "Best practices for writing Dockerfiles", I was trying to understand this part:

Each ENV line creates a new intermediate layer, just like RUN commands. This means that even if you unset the environment variable in a future layer, it still persists in this layer and its value can be dumped.

Then I recalled an earlier part of the same page:

In older versions of Docker, it was important that you minimized the number of layers in your images to ensure they were performant. The following features were added to reduce this limitation:

Only the instructions RUN, COPY, ADD create layers. Other instructions create temporary intermediate images, and do not increase the size of the build.

I've also read "How to unset "ENV" in dockerfile?" and redid the example given on the docs page. It indeed shows that ENV is not unset:

$ docker build -t alpine:envtest -<<HITHERE
> FROM alpine
> ENV ADMIN_USER="mark"
> RUN unset ADMIN_USER
> HITHERE
Sending build context to Docker daemon  3.072kB
Step 1/3 : FROM alpine
latest: Pulling from library/alpine
89d9c30c1d48: Already exists 
Digest: sha256:c19173c5ada610a5989151111163d28a67368362762534d8a8121ce95cf2bd5a
Status: Downloaded newer image for alpine:latest
 ---> 965ea09ff2eb
Step 2/3 : ENV ADMIN_USER="mark"
 ---> Running in 5d34f829a387
Removing intermediate container 5d34f829a387
 ---> e9c50b16c0e1
Step 3/3 : RUN unset ADMIN_USER
 ---> Running in dbcf57ca390d
Removing intermediate container dbcf57ca390d
 ---> 2cb4de2e0257
Successfully built 2cb4de2e0257
Successfully tagged alpine:envtest
$ docker run --rm alpine:envtest sh -c 'echo $ADMIN_USER'
mark

The output says the same "Removing intermediate container" for both ENV and RUN. I downloaded Docker only recently, so I don't think it is that old:

$ docker --version
Docker version 19.03.5, build 633a0ea

Maybe a RUN instruction and a RUN command are different things?
ENV, RUN - do they create layers, images or containers?

Alex Martian
  • `RUN unset ...` only has an effect until the end of the current line; try `RUN unset ADMIN_USER && echo $ADMIN_USER`. `ENV` is the only thing that will persistently change an environment variable setting. – David Maze Dec 10 '19 at 10:56
  • @David Maze, thank you, it looks like it will work. I also want to see if I understand the docs. – Alex Martian Dec 10 '19 at 11:15

1 Answer


Docker, as a containerization system, is based on two main concepts: the image and the container. The major difference between them is the top writable layer. When you create a new container, a new writable layer is put on top of the last image layer. This layer is often called the container layer.

All the underlying image content remains unchanged, and every change made in the running container (creating new files, modifying existing files and so on) is written to this thin writable layer.

This way, Docker stores only one copy of the image plus each container's own data, which decreases storage usage and simplifies the underlying workflow. I would compare it with static and dynamic linking in C: Docker's approach resembles dynamic linking, where many programs share one copy of a library.

The image is a combination of layers. Each layer is only a set of differences from the layer before it.
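To make the "set of differences" idea concrete, here is a toy model in plain shell. It is only a sketch: real Docker uses a union filesystem rather than copying files, and the directory and file names below are made up for illustration.

```shell
# Toy model only: each "layer" is a directory holding just the files it adds
# or changes; the merged view is built by applying the layers in order.
work=$(mktemp -d)
mkdir -p "$work/layer1" "$work/layer2" "$work/container" "$work/merged"

echo "base"    > "$work/layer1/os-release"   # hypothetical base image file
echo "v1"      > "$work/layer1/app.conf"
echo "v2"      > "$work/layer2/app.conf"     # second layer overrides one file
echo "runtime" > "$work/container/log.txt"   # written by a running container

# Apply the image layers first, then the writable container layer on top.
for layer in layer1 layer2 container; do
    cp -R "$work/$layer/." "$work/merged/"
done

merged_conf=$(cat "$work/merged/app.conf")   # what the container sees
base_conf=$(cat "$work/layer1/app.conf")     # the lower layer is untouched
echo "merged app.conf: $merged_conf, layer1 app.conf: $base_conf"
rm -rf "$work"
```

The merged view picks up the newer app.conf, while the copy in the lower layer stays as it was. That is the property that lets many containers share one image.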


The documentation says:

Only the instructions RUN, COPY, ADD create layers. Other instructions create temporary intermediate images, and do not increase the size of the build.

The description here is neither really clear nor accurate. Generally speaking, these aren't the only instructions that create layers in the latest versions of Docker, contrary to what the documentation suggests.

For example, by default WORKDIR creates the given path if it does not exist and changes the working directory to it. If a new path was created, WORKDIR generates a new layer.

By the way, ENV doesn't lead to layer creation. The value is stored permanently in the image and container config, and there is no easy way to get rid of it. Basically, there are two ways to organize the workflow:

  • Temporary environment variables, which are available only until the end of the current RUN directive:
RUN export NAME='megatron' && echo $NAME # prints megatron
RUN echo $NAME # blank
  • Clearing the environment variable. If it makes no difference to you whether the variable is absent or empty, you could do:
ENV NAME='megatron'
# some instructions
ENV NAME=''
RUN echo $NAME
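The first option, and the `RUN unset` from the question, can be reproduced without Docker. This is a minimal sketch that assumes each RUN step behaves like a fresh `sh -c` process started with the environment taken from the image config. The helper name run_step is hypothetical.

```shell
# Assumption: each RUN step is modelled as a fresh `sh -c` process that
# receives the image's ENV configuration as its starting environment.
run_step() {
    # $1 is the command string for this step
    env ADMIN_USER="mark" sh -c "$1"
}

# Unsetting inside a step only affects that step's own shell process.
first=$(run_step 'unset ADMIN_USER; echo "in step: [$ADMIN_USER]"')
# The next step starts again from the configured environment.
second=$(run_step 'echo "next step: [$ADMIN_USER]"')

echo "$first"
echo "$second"
```

The first line printed has empty brackets and the second one shows mark again, which matches what `docker run` printed in the question.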

In the context of Docker, there is no distinction between commands and instructions. For RUN, a command that doesn't change the filesystem content won't create a permanent layer. Consider the following Dockerfile:

FROM alpine:latest
# no layer
RUN echo "Hello World"
# new layer
RUN touch file.txt
# new layer (a trailing "#" on a WORKDIR line would become part of the path)
WORKDIR /no/existing/path

The build output is:

Step 1/4 : FROM alpine:latest
 ---> 965ea09ff2eb
Step 2/4 : RUN echo "Hello World"
 ---> Running in 451adb70f017
Hello World
Removing intermediate container 451adb70f017
 ---> 816ccbd1e8aa
Step 3/4 : RUN touch file.txt
 ---> Running in 9edc6afdd1e5
Removing intermediate container 9edc6afdd1e5
 ---> ea0040ec0312
Step 4/4 : WORKDIR /no/existing/path
 ---> Running in ec0feaf6710d
Removing intermediate container ec0feaf6710d
 ---> f2fe46478f7c
Successfully built f2fe46478f7c
Successfully tagged envtest:lastest

The inspect command lets you examine Docker objects:

docker inspect --format='{{json .RootFS.Layers}}' <image_id>

This shows the list of SHA digests of three layers, which come from the FROM, the second RUN and the WORKDIR directives. I would recommend using dive to explore each layer in a Docker image.
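If you want to check both places at once, the filesystem layers and the config where ENV values end up, something like the following should work. It is a sketch that assumes the alpine:envtest tag from the question exists locally, and it skips quietly otherwise.

```shell
image="alpine:envtest"   # tag from the question; assumed to have been built locally
inspected="no"
if command -v docker >/dev/null 2>&1 && docker image inspect "$image" >/dev/null 2>&1; then
    # Filesystem layers that make up the image.
    docker image inspect --format '{{json .RootFS.Layers}}' "$image"
    # Environment stored in the image configuration (where ENV values end up).
    docker image inspect --format '{{json .Config.Env}}' "$image"
    # One row per build step, with the size each step added.
    docker history "$image"
    inspected="yes"
fi
echo "inspected: $inspected"
```

Comparing the number of rows in `docker history` with the number of entries in `.RootFS.Layers` is a quick way to see which steps only changed metadata.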

So why does it say "Removing intermediate container" and not "Removing intermediate layer"? To execute a RUN command, Docker needs to instantiate a container from the intermediate image built up to that line of the Dockerfile and run the actual command in it. It then "commits" the state of the container as a new intermediate image and continues the build process.
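Roughly the same sequence can be done by hand with the regular CLI. This is only an approximation of what the builder does internally. The names demo-step and demo:step are hypothetical and assumed to be unused on your machine.

```shell
# Assumptions: a local Docker daemon, and the hypothetical names
# "demo-step" (container) and "demo:step" (image) are free to use.
status="skipped"
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    # 1. Start a container from the image built so far and run the command.
    if docker run --name demo-step alpine:latest touch /file.txt; then
        # 2. Commit the container's writable layer as a new image.
        docker commit demo-step demo:step && status="committed"
    fi
    # 3. Remove the container; the committed image stays.
    docker rm demo-step >/dev/null 2>&1 || true
fi
echo "status: $status"
```

Step 3 corresponds to the "Removing intermediate container" line in the build log: the container is thrown away, and the image committed from it is kept.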

Rafa Viotti
funnydman