
I'm hoping to get my pip install instructions inside my docker builds as fast as possible.

I've read many posts explaining how adding your requirements.txt before the rest of the app helps you take advantage of Docker's own image cache if your requirements.txt hasn't changed. But this is no help at all when dependencies do change, even slightly.

The next step would be if we could use a consistent pip cache directory. By default, pip will cache downloaded packages in ~/.cache/pip (on Linux), and so if you're ever installing the same version of a module that has been installed before anywhere on the system, it shouldn't need to go and download it again, but instead simply use the cached version. If we could leverage a shared cache directory for docker builds, this could help speed up dependency installs a lot.

However, there doesn't appear to be any simple way to mount a volume while running docker build. The build environment seems to be basically impenetrable. I found one article suggesting an ingenious but complex method: run an rsync server on the host and then, with a hack inside the build to get the host IP, rsync the pip cache in from the host. But I'm not relishing the idea of running an rsync server in Jenkins (which isn't the most secure platform at the best of times).

Does anyone know if there's any other way to achieve a shared cache volume more simply?

Robin Winslow

1 Answer


I suggest using BuildKit; also see this.

Dockerfile:

# syntax = docker/dockerfile:experimental
FROM python:3.6-alpine
RUN --mount=type=cache,target=/root/.cache/pip pip install pyyaml

NOTE: The line # syntax = docker/dockerfile:experimental is required. It must be the first line of the Dockerfile to enable this feature.

1.

The first build:

export DOCKER_BUILDKIT=1
docker build --progress=plain -t abc:1 . --no-cache

Log of the first build:

#9 [stage-0 2/2] RUN --mount=type=cache,target=/root/.cache/pip pip install...
#9   digest: sha256:55b70da1cbbe4d424f8c50c0678a01e855510bbda9d26f1ac5b983808f3bf4a5
#9 name: "[stage-0 2/2] RUN --mount=type=cache,target=/root/.cache/pip pip install pyyaml"
#9  started: 2019-09-20 03:11:35.296107357 +0000 UTC
#9 1.955 Collecting pyyaml
#9 3.050   Downloading https://files.pythonhosted.org/packages/e3/e8/b3212641ee2718d556df0f23f78de8303f068fe29cdaa7a91018849582fe/PyYAML-5.1.2.tar.gz (265kB)
#9 5.006 Building wheels for collected packages: pyyaml
#9 5.007   Building wheel for pyyaml (setup.py): started
#9 5.249   Building wheel for pyyaml (setup.py): finished with status 'done'
#9 5.250   Created wheel for pyyaml: filename=PyYAML-5.1.2-cp36-cp36m-linux_x86_64.whl size=44104 sha256=867daf35eab43c2d047ad737ea1e9eaeb4168b87501cd4d62c533f671208acaa
#9 5.250   Stored in directory: /root/.cache/pip/wheels/d9/45/dd/65f0b38450c47cf7e5312883deb97d065e030c5cca0a365030
#9 5.267 Successfully built pyyaml
#9 5.274 Installing collected packages: pyyaml
#9 5.309 Successfully installed pyyaml-5.1.2
#9 completed: 2019-09-20 03:11:42.221146294 +0000 UTC
#9 duration: 6.925038937s

As the log shows, the first build downloads pyyaml from the internet and builds a wheel for it.

2.

The second build:

docker build --progress=plain -t abc:1 . --no-cache

Log of the second build:

#9 [stage-0 2/2] RUN --mount=type=cache,target=/root/.cache/pip pip install...
#9   digest: sha256:55b70da1cbbe4d424f8c50c0678a01e855510bbda9d26f1ac5b983808f3bf4a5
#9 name: "[stage-0 2/2] RUN --mount=type=cache,target=/root/.cache/pip pip install pyyaml"
#9  started: 2019-09-20 03:16:58.588157354 +0000 UTC
#9 1.786 Collecting pyyaml
#9 2.234 Installing collected packages: pyyaml
#9 2.270 Successfully installed pyyaml-5.1.2
#9 completed: 2019-09-20 03:17:01.933398002 +0000 UTC
#9 duration: 3.345240648s

As the log shows, the build no longer downloads the package from the internet; it uses the cache instead. NOTE: this is not the traditional docker build cache, because I used --no-cache. It is the /root/.cache/pip directory that I mounted into the build.
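A rough way to check this without reading the log by eye is to look for pip's "Downloading" lines, which appear in the first log but not in the second. The helper below is only a sketch: the name pip_cache_hit is made up for illustration, and it assumes that a cache hit leaves no "Downloading" line in the log, as in the logs shown here.

```shell
# Hypothetical helper: succeeds (exit status 0) when the build log read from
# stdin contains no pip "Downloading" line, i.e. pip found everything it
# needed in the mounted cache.
pip_cache_hit() {
  ! grep -q 'Downloading '
}
```

BuildKit writes its progress output to stderr, so redirect it when piping the log into the helper:

docker build --progress=plain -t abc:1 . --no-cache 2>&1 | pip_cache_hit && echo "pip cache was used"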

3.

The third build, after deleting the BuildKit cache:

docker builder prune
docker build --progress=plain -t abc:1 . --no-cache

Log of the third build:

#9 [stage-0 2/2] RUN --mount=type=cache,target=/root/.cache/pip pip install...
#9   digest: sha256:55b70da1cbbe4d424f8c50c0678a01e855510bbda9d26f1ac5b983808f3bf4a5
#9 name: "[stage-0 2/2] RUN --mount=type=cache,target=/root/.cache/pip pip install pyyaml"
#9  started: 2019-09-20 03:19:07.434792944 +0000 UTC
#9 1.894 Collecting pyyaml
#9 2.740   Downloading https://files.pythonhosted.org/packages/e3/e8/b3212641ee2718d556df0f23f78de8303f068fe29cdaa7a91018849582fe/PyYAML-5.1.2.tar.gz (265kB)
#9 3.319 Building wheels for collected packages: pyyaml
#9 3.319   Building wheel for pyyaml (setup.py): started
#9 3.560   Building wheel for pyyaml (setup.py): finished with status 'done'
#9 3.560   Created wheel for pyyaml: filename=PyYAML-5.1.2-cp36-cp36m-linux_x86_64.whl size=44104 sha256=cea5bc4689e231df7915c2fc3abca225d4ee2e869a7540682aacb6d42eb17053
#9 3.560   Stored in directory: /root/.cache/pip/wheels/d9/45/dd/65f0b38450c47cf7e5312883deb97d065e030c5cca0a365030
#9 3.580 Successfully built pyyaml
#9 3.585 Installing collected packages: pyyaml
#9 3.622 Successfully installed pyyaml-5.1.2
#9 completed: 2019-09-20 03:19:12.530742712 +0000 UTC
#9 duration: 5.095949768s

As the log shows, once the BuildKit cache is deleted, the package is downloaded again.

In short, this gives you a cache that is shared between builds and is only mounted while the image is being built. The image itself does not contain the cache, so you also avoid a lot of intermediate layers in the image.
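For the requirements.txt case from the question, the same mount can be combined with the usual "copy requirements.txt first" layout. The script below is a sketch rather than a tested recipe: the /app working directory, the final COPY, and the temporary directory are assumptions for illustration, and it expects requirements.txt to sit at the root of the build context.

```shell
# Write an example Dockerfile into a scratch directory (assumed layout).
workdir=$(mktemp -d)

cat > "$workdir/Dockerfile" <<'EOF'
# syntax = docker/dockerfile:experimental
FROM python:3.6-alpine
WORKDIR /app
# Copy only the requirements first so this layer is reused when they are unchanged.
COPY requirements.txt .
# When requirements do change, pip still finds earlier downloads in the cache mount.
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
COPY . .
EOF

echo "Dockerfile written to $workdir/Dockerfile"
```

Copy the generated Dockerfile into your project and build it with DOCKER_BUILDKIT=1 set, as in the steps above.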

EDIT for folks who use docker compose and are too lazy to read the comments:

You can also do this with docker-compose if you set COMPOSE_DOCKER_CLI_BUILD=1. For example: COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose build

UPDATE 2020/09/02, in response to a question in the comments:

I don't know since which version (I am now on 19.03.11), but if no mode is specified for the cache directory, the cache is not reused by the next build.

I don't know the exact reason, but adding mode=0755 to the mount in the Dockerfile makes it work again:

Dockerfile:

# syntax = docker/dockerfile:experimental
FROM python:3.6-alpine
RUN --mount=type=cache,mode=0755,target=/root/.cache/pip pip install pyyaml

UPDATE 2023/04/23, in response to a question in the comments:

Q: Where is the cache exactly on the host?

A: The cache on the host is maintained by Docker in an overlay directory. Run docker buildx du --verbose and look for an entry with Type: exec.cachemount. Its ID identifies the cache; in my case it is ntpjzcz8hhx31b80nwxji05hn:

ID:             ntpjzcz8hhx31b80nwxji05hn
Created at:     2023-04-23 01:36:41.102680066 +0000 UTC
Mutable:        true
Reclaimable:    true
Shared:         false
Size:           3.601MB
Description:    cached mount /root/.cache/pip from exec /bin/sh -c pip install pyyaml
Usage count:    2
Last used:      7 minutes ago
Type:           exec.cachemount

Then look in /var/lib/docker/overlay2/ntpjzcz8hhx31b80nwxji05hn/diff/cache/wheels for the cached pyyaml wheel (the path depends on the ID you found above). On my machine it looks like this:

root@shdebian1:/var/lib/docker/overlay2/ntpjzcz8hhx31b80nwxji05hn/diff/cache/wheels/81/5a/02/b3447894318b70e3cbff3cb4f1a50d9d50a848185358de1d71# ls

PyYAML-6.0-cp36-cp36m-linux_x86_64.whl
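To avoid searching the verbose output by hand, the lookup can be scripted. This is a sketch under one assumption: each record lists its ID: line before its Type: line, as in the output shown above. The function name cache_mount_ids is made up for illustration.

```shell
# Hypothetical helper: read `docker buildx du --verbose` style output on stdin
# and print the ID of every record whose Type is exec.cachemount.
cache_mount_ids() {
  awk '
    $1 == "ID:"   { id = $2 }                            # remember the current record ID
    $1 == "Type:" && $2 == "exec.cachemount" { print id } # emit it for cache mounts only
  '
}
```

Usage: docker buildx du --verbose | cache_mount_ids

Each printed ID corresponds to a directory under /var/lib/docker/overlay2/ as described above.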

atline
  • This looks like *exactly* what I was looking for. I'll give it a try and accept if it works as described. – Robin Winslow Sep 20 '19 at 11:25
  • Works perfectly, thank-you so much. One minor point - even though you included it in your example, I missed the significance of the `# syntax = docker/dockerfile:experimental` line and so didn't copy it. Ended up at https://stackoverflow.com/questions/55153089/error-response-from-daemon-dockerfile-parse-error-unknown-flag-mount to correct my mistake. You might want to just emphasise the need for that line in your answer. – Robin Winslow Sep 20 '19 at 12:14
  • I just came back to this question and remembered how much I appreciated it. Then I found this - https://meta.stackoverflow.com/questions/288643/why-cant-a-bounty-created-to-reward-an-existing-answer-be-awarded-immediately - and I think I'm going to award you 100 bounty for giving me exactly what I wanted =) But it looks like it'll take 24 hours before I can award it. – Robin Winslow Dec 10 '19 at 17:02
  • Do I understand this correctly that this won't reuse hosts pip cache? – Suor Mar 24 '20 at 07:30
  • It won't reuse the host's pip cache; BuildKit internally manages a cache that is reused across rebuilds. – atline Mar 24 '20 at 08:09
  • @atline I have different docker services which uses similar requirements.txt. Their Dockerfiles are same. Is it possible to make them able to share pip cache between each other? – Mr.D Apr 08 '20 at 11:40
  • @Mr.D Yes, it can. As long as you do not specify a different `id` in the Dockerfiles, every `target=/root/.cache/pip` in the different Dockerfiles means the same thing: the same Docker-maintained cache folder is mounted at `/root/.cache/pip`. If that is not what you want, use `target=/root/.cache/pip,id=myid` to give the cache its own name. If no `id` is specified, they all share the same id. – atline Apr 08 '20 at 14:31
  • @atline I am copy-pasting the example is the answer but its always downloading the file for me. My Docker version is 19.03.8 (build afacb8b). I am on MacOS Catalina. Even on Second Run it's like: ```#7 [stage-0 2/2] RUN --mount=type=cache,target=/root/.cache/pip pip install... #7 1.477 Collecting pyyaml #7 1.764 Downloading PyYAML-5.3.1.tar.gz (269 kB)``` – codersofthedark Apr 27 '20 at 13:03
  • @atline have added the problem as a question here: https://stackoverflow.com/q/61459775/1060337 – codersofthedark Apr 27 '20 at 13:17
  • @codersofthedark See my answer: https://stackoverflow.com/a/61474441/6394722 – atline Apr 28 '20 at 07:04
  • You can also do this with `docker-compose` if you set `COMPOSE_DOCKER_CLI_BUILD=1`. For example: `COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose build` – SuperFunkyMonkey May 01 '20 at 20:08
  • hi @atline, do you know if DOCKER_BUILDKIT can be enabled using Docker SDK for Python? – rkj Jul 20 '20 at 03:24
  • @RytisJonynas I guess it can't currently, see https://github.com/docker/docker-py/issues/2230 – atline Jul 20 '20 at 05:18
  • @atline I followed your answer on my VM just with `pip install -r requirements.txt`. It worked well initially, however after a while the docker experimental image build would once again start downloading from PyPi. I thought it's because of the system cache gets cleared / invalidated. So I cleaned docker build cache and changed cache directory param in mount command to `/home/my_user/newcachedir/`, however, now docker experimental image build installs everything either from network or old cache dir (`/home/my_user/.cache/pip`). Any ideas why and how to make it work with new dir for cache? – rkj Jul 21 '20 at 15:47
  • @atline I posted my question here: https://stackoverflow.com/q/63046991/11268971. Would appreciate any help. – rkj Jul 23 '20 at 04:39
  • Does this still work? I'm trying this now, and it's still downloading my dependencies. – afagarap Sep 01 '20 at 14:34
  • @afagarap Yes, you are correct. I tried it, and it looks like with newer Docker a `mode` parameter needs to be added; see the updated answer. – atline Sep 02 '20 at 02:52
  • @atline Cool. Thanks. I'll try it, and let you know what I get. – afagarap Sep 02 '20 at 04:19
  • Seems that, using Docker 20.10.2, is not working any more – Cesar Feb 01 '21 at 17:48
  • @Cesar I confirmed it works for 20.10.2 on my side. – atline Feb 07 '21 at 09:30
  • Is there any similar way to do this without using BuildKit? – dem1tris May 06 '21 at 11:53
  • This works without `# syntax = docker/dockerfile:experimental` comment using Docker 20.10.6 and docker-compose 1.27.4. – niekas May 07 '21 at 22:29
  • I also needed to add the `uid=1234` option to the mount because by default "cache" mounts as root:root, and since my dockerfile doesn't run as root, pip refused to use the cache until I changed the uid. Otherwise, it worked like a charm! – rotten Jul 09 '21 at 17:05
  • I am confused about the `/root/.cache` directory. Is that location in my Docker container or in my main machine? Why does /root/.cache work? Is it specific to pip? – cozos Mar 28 '22 at 05:21
  • @cozos Yes, `$HOME/.cache/pip` is where pip caches downloaded packages; you can verify it with `pip3 cache dir`. During docker build there is a build container running as `root`, so pip packages are downloaded to `/root/.cache`. BuildKit stores that directory on the host, and when you rebuild it is mounted into the build container again, so you don't need to download the packages again. – atline Mar 28 '22 at 05:27
  • That makes a lot of sense with `pip3 cache dir`. Thanks a lot! – cozos Mar 28 '22 at 06:39
  • `mode=0755,` - hidden gem, thank you! – jtlz2 Jan 29 '23 at 21:04
  • @atline "Buildkit will store that on host" - is that persistent on docker restarts? – jtlz2 Jan 29 '23 at 21:44
  • @jtlz2 Yes, it persists on the host even across `systemctl restart docker` – atline Jan 30 '23 at 15:05
  • @atline Just to be clear, where exactly will it store it on the host? In `$HOME/.cache/pip`? – Dr_Zaszuś Apr 21 '23 at 13:32
  • @Dr_Zaszuś As it's too long, see my new edit in answer. – atline Apr 23 '23 at 02:12