217

My Dockerfile looks something like this:

FROM my/base

ADD . /srv
RUN pip install -r requirements.txt
RUN python setup.py install

ENTRYPOINT ["run_server"]

Every time I build a new image, the dependencies have to be reinstalled, which can be very slow in my region.

One way I can think of to cache the packages that have already been installed is to override the my/base image with newer images, like this:

docker build -t new_image_1 .
docker tag new_image_1 my/base

So next time I build with this Dockerfile, my/base already has some packages installed.

But this solution has two problems:

  1. It is not always possible to override a base image.
  2. The base image grows bigger and bigger as newer images are layered on it.

Is there a better solution to this problem?

EDIT:

Some information about the docker on my machine:

☁  test  docker version
Client version: 1.1.2
Client API version: 1.13
Go version (client): go1.2.1
Git commit (client): d84a070
Server version: 1.1.2
Server API version: 1.13
Go version (server): go1.2.1
Git commit (server): d84a070
☁  test  docker info
Containers: 0
Images: 56
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 56
Execution Driver: native-0.2
Kernel Version: 3.13.0-29-generic
WARNING: No swap limit support
noamtm
satoru
  • Do you delete the intermediate images after you finish building your image? – Regan Aug 14 '14 at 10:52
  • Of course not, but this is irrelevant because when I rebuild an image, I'm still basing it on the original `my/base` – satoru Aug 15 '14 at 01:38

6 Answers

203

Try to build a Dockerfile which looks something like this:

FROM my/base

WORKDIR /srv
ADD ./requirements.txt /srv/requirements.txt
RUN pip install -r requirements.txt
ADD . /srv
RUN python setup.py install

ENTRYPOINT ["run_server"]

Docker will use the cache for the pip install step as long as requirements.txt is unchanged, regardless of whether other files under . have changed. Here's an example.


Here's a simple Hello, World! program:

$ tree
.
├── Dockerfile
├── requirements.txt
└── run.py

0 directories, 3 files

# Dockerfile

FROM dockerfile/python
WORKDIR /srv
ADD ./requirements.txt /srv/requirements.txt
RUN pip install -r requirements.txt
ADD . /srv
CMD python /srv/run.py

# requirements.txt
pytest==2.3.4

# run.py
print("Hello, World")

The output of docker build:

Step 1 : WORKDIR /srv
---> Running in 22d725d22e10
---> 55768a00fd94
Removing intermediate container 22d725d22e10
Step 2 : ADD ./requirements.txt /srv/requirements.txt
---> 968a7c3a4483
Removing intermediate container 5f4e01f290fd
Step 3 : RUN pip install -r requirements.txt
---> Running in 08188205e92b
Downloading/unpacking pytest==2.3.4 (from -r requirements.txt (line 1))
  Running setup.py (path:/tmp/pip_build_root/pytest/setup.py) egg_info for package pytest
....
Cleaning up...
---> bf5c154b87c9
Removing intermediate container 08188205e92b
Step 4 : ADD . /srv
---> 3002a3a67e72
Removing intermediate container 83defd1851d0
Step 5 : CMD python /srv/run.py
---> Running in 11e69b887341
---> 5c0e7e3726d6
Removing intermediate container 11e69b887341
Successfully built 5c0e7e3726d6

Let's modify run.py:

# run.py
print("Hello, Python")

Try to build again, below is the output:

Sending build context to Docker daemon  5.12 kB
Sending build context to Docker daemon 
Step 0 : FROM dockerfile/python
---> f86d6993fc7b
Step 1 : WORKDIR /srv
---> Using cache
---> 55768a00fd94
Step 2 : ADD ./requirements.txt /srv/requirements.txt
---> Using cache
---> 968a7c3a4483
Step 3 : RUN pip install -r requirements.txt
---> Using cache
---> bf5c154b87c9
Step 4 : ADD . /srv
---> 9cc7508034d6
Removing intermediate container 0d7cf71eb05e
Step 5 : CMD python /srv/run.py
---> Running in f25c21135010
---> 4ffab7bc66c7
Removing intermediate container f25c21135010
Successfully built 4ffab7bc66c7

As you can see above, this time Docker uses the cache for every step up to and including pip install; only the steps from ADD . /srv onward are rebuilt. Now, let's update requirements.txt:

# requirements.txt

pytest==2.3.4
ipython

Below is the output of docker build:

Sending build context to Docker daemon  5.12 kB
Sending build context to Docker daemon 
Step 0 : FROM dockerfile/python
---> f86d6993fc7b
Step 1 : WORKDIR /srv
---> Using cache
---> 55768a00fd94
Step 2 : ADD ./requirements.txt /srv/requirements.txt
---> b6c19f0643b5
Removing intermediate container a4d9cb37dff0
Step 3 : RUN pip install -r requirements.txt
---> Running in 4b7a85a64c33
Downloading/unpacking pytest==2.3.4 (from -r requirements.txt (line 1))
  Running setup.py (path:/tmp/pip_build_root/pytest/setup.py) egg_info for package pytest

Downloading/unpacking ipython (from -r requirements.txt (line 2))
Downloading/unpacking py>=1.4.12 (from pytest==2.3.4->-r requirements.txt (line 1))
  Running setup.py (path:/tmp/pip_build_root/py/setup.py) egg_info for package py

Installing collected packages: pytest, ipython, py
  Running setup.py install for pytest

Installing py.test script to /usr/local/bin
Installing py.test-2.7 script to /usr/local/bin
  Running setup.py install for py

Successfully installed pytest ipython py
Cleaning up...
---> 23a1af3df8ed
Removing intermediate container 4b7a85a64c33
Step 4 : ADD . /srv
---> d8ae270eca35
Removing intermediate container 7f003ebc3179
Step 5 : CMD python /srv/run.py
---> Running in 510359cf9e12
---> e42fc9121a77
Removing intermediate container 510359cf9e12
Successfully built e42fc9121a77

Notice that Docker didn't use the cache for pip install this time, because requirements.txt changed. If this doesn't work for you, check your Docker version.

Client version: 1.1.2
Client API version: 1.13
Go version (client): go1.2.1
Git commit (client): d84a070
Server version: 1.1.2
Server API version: 1.13
Go version (server): go1.2.1
Git commit (server): d84a070
Rishabh Agrahari
nacyot
  • 6
    This doesn't seem to work, because whenever docker sees an `ADD` instruction, cache is invalidated. – satoru Aug 15 '14 at 02:27
  • Thanks. My docker is older, I'll try again after I upgrade it to 1.1.2. – satoru Aug 15 '14 at 03:53
  • 2
    I'm not sure why it doesn't work. If there is no change to requirements.txt (the one in `ADD ./requirements.txt /srv/requirements.txt`), then Docker must use the cache. See the [ADD section](https://docs.docker.com/reference/builder/#add) of the Dockerfile documentation. – nacyot Aug 15 '14 at 08:26
  • ADD should stay cached if the referenced file has not changed. Great answer! – geoff Dec 11 '14 at 03:41
  • 32
    Yes it will use the cache if requirements.txt does not change. But if the requirements.txt changes then all the requirements are downloaded. Is there any way I can mount a pip cache volume into the docker container to load from the cache? – Jitu Dec 30 '15 at 08:29
  • I tried converting the requirements*.* files from DOS to UNIX and it seems to work. From my Windows host I access my VM (VirtualBox) where Docker runs. I suppose changing the file via a UNC path (\\myVM\file.txt) over Samba causes some issues. – Daviddd Jun 08 '16 at 09:27
  • 12
    The key to this answer is that you add requirements.txt (`ADD requirements.txt /srv`) before you run pip (`RUN pip install -r requirements.txt`), and add all of the other files *after* running pip. Thus, they should be in the following order: (1) `ADD requirements.txt /srv`; (2) `RUN pip install -r requirements.txt`; (3) `ADD . /srv` – engelen Sep 01 '17 at 14:49
  • 14
    Please note that this does not work when using COPY instead of ADD – veuncent Sep 28 '17 at 21:14
  • 1
    this only worked for me using `COPY . .` (or `ADD`) at the end, after installing all the dependencies. – user3417583 Jan 03 '18 at 22:21
  • 1
    This did not work for me using docker version 20.10.5, build 55c4c88. No matter whether I used copy or add, it didn't work. – C.J. Mar 29 '21 at 19:42
  • 2
    @veuncent Wait what? Why does it work only with ADD?! – Hector Ordonez Aug 31 '21 at 14:32
  • Unfortunately the `setup.cfg`, `setup.py` or `pyproject.toml` based dependency management approaches seem to break this model as `pip` has no way to `pip install --dependencies-only` (made-up flag, no such flag exists) for these files like it can with a `requirements.txt`. Which leads to messes like `pip-compile`, then issues keeping things in sync in CI... it's ugly. – Craig Ringer Jul 27 '23 at 02:30
75

I understand this question has some popular answers already. But there is a newer way to cache files for package managers. I think it could be a good answer in the future when BuildKit becomes more standard.

As of Docker 18.09 there is experimental support for BuildKit. BuildKit adds support for some new features in the Dockerfile including experimental support for mounting external volumes into RUN steps. This allows us to create caches for things like $HOME/.cache/pip/.

We'll use the following requirements.txt file as an example:

Click==7.0
Django==2.2.3
django-appconf==1.0.3
django-compressor==2.3
django-debug-toolbar==2.0
django-filter==2.2.0
django-reversion==3.0.4
django-rq==2.1.0
pytz==2019.1
rcssmin==1.0.6
redis==3.3.4
rjsmin==1.1.0
rq==1.1.0
six==1.12.0
sqlparse==0.3.0

A typical example Python Dockerfile might look like:

FROM python:3.7
WORKDIR /usr/src/app
COPY requirements.txt /usr/src/app/
RUN pip install -r requirements.txt
COPY . /usr/src/app

With BuildKit enabled using the DOCKER_BUILDKIT environment variable we can build the uncached pip step in about 65 seconds:

$ export DOCKER_BUILDKIT=1
$ docker build -t test .
[+] Building 65.6s (10/10) FINISHED                                                                                                                                             
 => [internal] load .dockerignore                                                                                                                                          0.0s
 => => transferring context: 2B                                                                                                                                            0.0s
 => [internal] load build definition from Dockerfile                                                                                                                       0.0s
 => => transferring dockerfile: 120B                                                                                                                                       0.0s
 => [internal] load metadata for docker.io/library/python:3.7                                                                                                              0.5s
 => CACHED [1/4] FROM docker.io/library/python:3.7@sha256:6eaf19442c358afc24834a6b17a3728a45c129de7703d8583392a138ecbdb092                                                 0.0s
 => [internal] load build context                                                                                                                                          0.6s
 => => transferring context: 899.99kB                                                                                                                                      0.6s
 => CACHED [internal] helper image for file operations                                                                                                                     0.0s
 => [2/4] COPY requirements.txt /usr/src/app/                                                                                                                              0.5s
 => [3/4] RUN pip install -r requirements.txt                                                                                                                             61.3s
 => [4/4] COPY . /usr/src/app                                                                                                                                              1.3s
 => exporting to image                                                                                                                                                     1.2s
 => => exporting layers                                                                                                                                                    1.2s
 => => writing image sha256:d66a2720e81530029bf1c2cb98fb3aee0cffc2f4ea2aa2a0760a30fb718d7f83                                                                               0.0s
 => => naming to docker.io/library/test                                                                                                                                    0.0s

Now, let us add the experimental header and modify the RUN step to cache the Python packages:

# syntax=docker/dockerfile:experimental

FROM python:3.7
WORKDIR /usr/src/app
COPY requirements.txt /usr/src/app/
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
COPY . /usr/src/app

Go ahead and do another build now. It should take about the same amount of time, but this time the Python packages are cached in our new cache mount:

$ docker build -t pythontest .
[+] Building 60.3s (14/14) FINISHED                                                                                                                                             
 => [internal] load build definition from Dockerfile                                                                                                                       0.0s
 => => transferring dockerfile: 120B                                                                                                                                       0.0s
 => [internal] load .dockerignore                                                                                                                                          0.0s
 => => transferring context: 2B                                                                                                                                            0.0s
 => resolve image config for docker.io/docker/dockerfile:experimental                                                                                                      0.5s
 => CACHED docker-image://docker.io/docker/dockerfile:experimental@sha256:9022e911101f01b2854c7a4b2c77f524b998891941da55208e71c0335e6e82c3                                 0.0s
 => [internal] load .dockerignore                                                                                                                                          0.0s
 => [internal] load build definition from Dockerfile                                                                                                                       0.0s
 => => transferring dockerfile: 120B                                                                                                                                       0.0s
 => [internal] load metadata for docker.io/library/python:3.7                                                                                                              0.5s
 => CACHED [1/4] FROM docker.io/library/python:3.7@sha256:6eaf19442c358afc24834a6b17a3728a45c129de7703d8583392a138ecbdb092                                                 0.0s
 => [internal] load build context                                                                                                                                          0.7s
 => => transferring context: 899.99kB                                                                                                                                      0.6s
 => CACHED [internal] helper image for file operations                                                                                                                     0.0s
 => [2/4] COPY requirements.txt /usr/src/app/                                                                                                                              0.6s
 => [3/4] RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt                                                                                  53.3s
 => [4/4] COPY . /usr/src/app                                                                                                                                              2.6s
 => exporting to image                                                                                                                                                     1.2s
 => => exporting layers                                                                                                                                                    1.2s
 => => writing image sha256:0b035548712c1c9e1c80d4a86169c5c1f9e94437e124ea09e90aea82f45c2afc                                                                               0.0s
 => => naming to docker.io/library/test                                                                                                                                    0.0s

About 60 seconds. Similar to our first build.

Make a small change to the requirements.txt (such as adding a new line between two packages) to force a cache invalidation and run again:

$ docker build -t pythontest .
[+] Building 15.9s (14/14) FINISHED                                                                                                                                             
 => [internal] load build definition from Dockerfile                                                                                                                       0.0s
 => => transferring dockerfile: 120B                                                                                                                                       0.0s
 => [internal] load .dockerignore                                                                                                                                          0.0s
 => => transferring context: 2B                                                                                                                                            0.0s
 => resolve image config for docker.io/docker/dockerfile:experimental                                                                                                      1.1s
 => CACHED docker-image://docker.io/docker/dockerfile:experimental@sha256:9022e911101f01b2854c7a4b2c77f524b998891941da55208e71c0335e6e82c3                                 0.0s
 => [internal] load build definition from Dockerfile                                                                                                                       0.0s
 => => transferring dockerfile: 120B                                                                                                                                       0.0s
 => [internal] load .dockerignore                                                                                                                                          0.0s
 => [internal] load metadata for docker.io/library/python:3.7                                                                                                              0.5s
 => CACHED [1/4] FROM docker.io/library/python:3.7@sha256:6eaf19442c358afc24834a6b17a3728a45c129de7703d8583392a138ecbdb092                                                 0.0s
 => CACHED [internal] helper image for file operations                                                                                                                     0.0s
 => [internal] load build context                                                                                                                                          0.7s
 => => transferring context: 899.99kB                                                                                                                                      0.7s
 => [2/4] COPY requirements.txt /usr/src/app/                                                                                                                              0.6s
 => [3/4] RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt                                                                                   8.8s
 => [4/4] COPY . /usr/src/app                                                                                                                                              2.1s
 => exporting to image                                                                                                                                                     1.1s
 => => exporting layers                                                                                                                                                    1.1s
 => => writing image sha256:fc84cd45482a70e8de48bfd6489e5421532c2dd02aaa3e1e49a290a3dfb9df7c                                                                               0.0s
 => => naming to docker.io/library/test                                                                                                                                    0.0s

Only about 16 seconds!

We are getting this speedup because we are no longer downloading all the Python packages. They were cached by the package manager (pip in this case) and stored in a cache volume mount. The volume mount is provided to the RUN step so that pip can reuse our already downloaded packages. This happens outside any Docker layer caching.

The gains should be much better with a larger requirements.txt file.
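
On more recent Docker releases the same cache mount can, as far as I know, be written without the experimental header, because `RUN --mount=type=cache` is part of the stable Dockerfile frontend. The following is a minimal sketch rather than a tested replacement. It assumes BuildKit is enabled and that the `docker/dockerfile:1` frontend can be pulled. The `id=pip` value is an arbitrary name chosen for illustration, and the paths are the same ones used above.

```dockerfile
# syntax=docker/dockerfile:1

FROM python:3.7
WORKDIR /usr/src/app

# Copy only the dependency list first, so this layer stays cached
# until requirements.txt itself changes.
COPY requirements.txt /usr/src/app/

# The cache mount keeps pip's download cache between builds and is
# not stored in the image. "id=pip" is an assumed, arbitrary name.
RUN --mount=type=cache,id=pip,target=/root/.cache/pip \
    pip install -r requirements.txt

# Application code goes last, so editing it leaves the layers above cached.
COPY . /usr/src/app
```

The layer cache and the cache mount cover different cases: the layer cache skips the pip step entirely while requirements.txt is unchanged, and the cache mount avoids re-downloading packages when it does change.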

Notes:

  • This is experimental Dockerfile syntax and should be treated as such. You may not want to build with this in production at the moment.
  • BuildKit originally didn't work under Docker Compose or other tools that use the Docker API directly. Docker Compose supports it as of 1.25.0. See How do you enable BuildKit with docker-compose?
  • There isn't any direct interface for managing the cache at the moment. It is purged when you do a docker system prune -a.

Hopefully, these features will make it into Docker for building and BuildKit will become the default. If / when that happens I will try to update this answer.

Andy Shinn
  • 2
    I can confirm this solution works very well. My build went down from over a minute to just 2.2 seconds. Thanks @andy-shinn. – Kwuite Nov 09 '19 at 13:33
  • 3
    Now also Docker-Compose: https://stackoverflow.com/questions/58592259/how-do-you-enable-buildkit-with-docker-compose – Rexcirus Dec 03 '19 at 21:30
  • Note: If you are using SUDO to run docker, you probably need to do: sudo DOCKER_BUILDKIT=1 ... – Vinícius M Jun 12 '20 at 22:23
  • I am getting this Error :- failed to solve with frontend dockerfile.v0: failed to create LLB definition: Dockerfile parse error line 10: Unknown flag: mount – Mayur Dangar Jun 18 '20 at 17:10
  • 2
    That sounds like you missed the comment at the top of the `Dockerfile` or the Docker version is too old. I would create a new question with all your debugging information. – Andy Shinn Jun 20 '20 at 01:06
  • This worked for me locally but I got an error on CircleCI: https://github.com/moby/buildkit/issues/1142#issuecomment-696987398 – Almenon Sep 22 '20 at 21:22
  • `Docker version 19.03.9` works. `docker-compose version 1.26.2` with `export COMPOSE_DOCKER_CLI_BUILD=1` does not work – Kamil Apr 06 '21 at 08:12
  • Nice! But I had to put quotes around the target command, `RUN --mount=type=cache,target="pip3 install -U pip setuptools --target pip_packages" ` – jtlz2 Jan 29 '23 at 13:30
  • You can purge the cache via `docker builder prune` - REF: https://docs.docker.com/engine/reference/commandline/builder_prune – jtlz2 May 15 '23 at 20:20
  • I also had to add (e.g.) `id=pip` - this was the only way I could reliably get the cache to be exploited – jtlz2 May 15 '23 at 20:28
  • (and don't forget `mode=0755`) – jtlz2 May 15 '23 at 20:30
32

To minimise the network activity, you could point pip to a cache directory on your host machine.

Run your Docker container with your host's pip cache directory bind-mounted into your container's pip cache directory. The docker run command should look like this:

docker run -v $HOME/.cache/pip-docker/:/root/.cache/pip image_1

Then, in your Dockerfile, install your requirements as part of the ENTRYPOINT statement (or CMD statement) instead of in a RUN command. This is important because (as pointed out in the comments) the mount is not available during image building, when RUN statements are executed. The Dockerfile should look like this:

FROM my/base

ADD . /srv

ENTRYPOINT ["sh", "-c", "pip install -r requirements.txt && python setup.py install && run_server"]
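
One possible refinement, sketched here as an assumption rather than something tested against this setup: prefix the final command with `exec` so that `run_server` replaces the shell and receives signals (such as the one sent by docker stop) directly. The `WORKDIR /srv` line is also an assumption, added so that requirements.txt and setup.py are found if the base image does not already set the working directory.

```dockerfile
FROM my/base

# Assumption: the project lives in /srv, so run the commands from there.
WORKDIR /srv
ADD . /srv

# Dependencies are installed at container start, when the bind-mounted
# pip cache is available. exec hands the process over to run_server.
ENTRYPOINT ["sh", "-c", "pip install -r requirements.txt && python setup.py install && exec run_server"]
```

The trade-off stays the same as above: every container start pays for the install step, although with a warm pip cache most of that time is spent installing rather than downloading.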
Jakub Kukul
  • 6
    Not what the OP was looking for in his use case, but if you're making a build server this is a great idea – oden Jan 02 '18 at 06:07
  • 2
    This seems like a recipe for problems, particularly the suggestion to point to the default host cache. You are potentially mixing arch-specific packages. – Giacomo Lacava Apr 04 '20 at 00:52
  • @GiacomoLacava thanks, that's a very good point. I adjusted my answer and removed the part that suggested re-using the host's cache directory. – Jakub Kukul Apr 06 '20 at 15:38
1
pipenv install

by default tries to re-lock. When it does, the cached layer of the Docker build isn't used because Pipfile.lock has changed. See the docs

A solution for this is to commit Pipfile.lock to version control and use

RUN pipenv sync

instead.
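
Put together, the dependency steps might look like the sketch below. It assumes a `python:3.7` base image, that pipenv is installed with pip inside the image, and that both Pipfile and Pipfile.lock sit next to the Dockerfile. The image name and paths are illustrative and not taken from the question.

```dockerfile
FROM python:3.7
WORKDIR /usr/src/app

# Assumption: pipenv is not part of the base image, so install it first.
RUN pip install pipenv

# Copy only the dependency files, so the layers below stay cached
# until Pipfile or Pipfile.lock changes.
COPY Pipfile Pipfile.lock /usr/src/app/

# Install exactly what Pipfile.lock specifies, without re-locking.
RUN pipenv sync

# Application code goes last, so editing it leaves the layers above cached.
COPY . /usr/src/app
```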

Thanks to JFG Piñeiro.

enzo
Franco Milanese
0

If you are trying to achieve this with GitLab CI/CD, you can follow this doc: https://docs.gitlab.com/ee/ci/docker/docker_layer_caching.html

If that doesn't work for you as is, try explicitly setting DOCKER_BUILDKIT and BUILDKIT_INLINE_CACHE:

variables:
    DOCKER_BUILDKIT: '1'
....

  script:
    - docker build ........ --build-arg BUILDKIT_INLINE_CACHE=1 .
Egor B Eremeev
-11

I found that a better way is to just add the Python site-packages directory as a volume.

services:
    web:
        build: .
        command: python manage.py runserver 0.0.0.0:8000
        volumes:
            - .:/code
            -  /usr/local/lib/python2.7/site-packages/

This way I can just pip install new libraries without having to do a full rebuild.

EDIT: Disregard this answer, jkukul's answer above worked for me. My intent was to cache the site-packages folder. That would have looked something more like:

volumes:
   - .:/code
   - ./cached-packages:/usr/local/lib/python2.7/site-packages/

Caching the download folder is a lot cleaner though. That also caches the wheels, so it properly achieves the task.

jaywhy13
  • 2
    And what happens when you try to build this Dockerfile on a different machine? This is not a sustainable solution. – Aaron McMillin Jan 18 '17 at 20:55
  • Genuinely confused, this turned out being buggy and I wasn't actually sure why. Could you give some more details. – jaywhy13 Jan 20 '17 at 16:30
  • 3
    Your Docker image depends on state from the host system. This voids most of the utility of Docker. Everything the image needs should be installed in it. Use the Dockerfile to install all the dependencies. If you want to avoid re-downloading the packages every time you build, the answer from jkukul to mount the pip cache is the way to go. – Aaron McMillin Jan 20 '17 at 18:57
  • 2
    Lightbulb just went off, thanks. I was actually trying to mount the site-packages directory from the VM, not the host. Quite an oversight. I think in spirit I was trying to do the same as jkukul suggested. Thanks for the clarity! – jaywhy13 Jan 22 '17 at 16:55
  • 1
    @AaronMcMillin He is actually not depending on a path on the host. He's mounting the site-packages in the container to an anonymous volume. Still a bad idea though – ruohola Apr 20 '20 at 23:02