
I'm trying to use Docker BuildKit cache mounts to cache packages and speed up adding packages to Docker images. I learned about the approach from the instructions for both Python and apt-get packages and from a useful Stack Exchange answer on caching Python packages while building Docker images. I can get this to work for Python and apt-get, but not for R packages.

In a Dockerfile for Python I'm able to change:

RUN pip install -r requirements.txt

to the following (the comment-like syntax line at the top of the Dockerfile is required):

# syntax=docker/dockerfile:experimental
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt

And then when I add a package to the requirements.txt file, pip reuses the work it has already done rather than re-downloading and rebuilding every package. So BuildKit cache mounts add a level of caching beyond Docker's image layers. It's a massive timesaver. I'm hoping to set up something similar for R packages.
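For context, a minimal complete Dockerfile using this pattern might look like the sketch below. The base image tag, the /app working directory, and the image name in the build command are assumptions chosen for illustration; only the cache-mount RUN line comes from the setup described above. BuildKit has to be enabled for the build, for example by setting DOCKER_BUILDKIT=1.

```dockerfile
# syntax=docker/dockerfile:experimental
# Assumption: python:3.8-slim is an arbitrary base image picked for this sketch.
FROM python:3.8-slim

# Assumption: /app is a hypothetical working directory.
WORKDIR /app
COPY requirements.txt .

# pip's download/wheel cache lives in the cache mount, so it survives
# between builds even when this layer is invalidated by a changed
# requirements.txt. The installed packages go to site-packages, which is
# outside the mount and therefore ends up in the image layer.
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt

# Build with BuildKit enabled (image name is hypothetical):
#   DOCKER_BUILDKIT=1 docker build -t my-python-image .
```

Note that the mount target is pip's cache directory, not the directory the packages are installed into.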

Here is what I've tried; it works for apt-get but not for R packages. I've also tried the install2.r script.

# syntax=docker/dockerfile:experimental
FROM rocker/tidyverse
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt \
  apt update && apt install -y gcc \
      zsh \
      vim

COPY ./requirements.R .
RUN --mount=type=cache,target=/usr/local/lib/R/site-library Rscript ./requirements.R

I think I don't understand:

  1. How BuildKit works. Does it build images inside a container, i.e. is the cache path on the 'build container'?
  2. What one needs to specify as the target for R to notice that it has already downloaded (and possibly built) a package.
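One thing that is certain about cache mounts is that their contents are not written into the image layer, so anything installed directly into a mounted directory (such as the R site-library) is not present in the final image. An untested direction that keeps the library outside the mount is to point the mount at a package cache instead, for example renv's. The sketch below rests on several assumptions: that the project has an renv.lock file, that /renv/cache and /project are arbitrary paths chosen for illustration, and that renv's RENV_PATHS_CACHE and RENV_CONFIG_CACHE_SYMLINKS settings behave as its documentation describes.

```dockerfile
# syntax=docker/dockerfile:experimental
FROM rocker/tidyverse

# Assumption: renv is installed from the default repository of the base image.
RUN R -e "install.packages('renv')"

# Assumption: /renv/cache is a hypothetical location for renv's package cache.
ENV RENV_PATHS_CACHE=/renv/cache
# Ask renv to copy packages out of the cache rather than symlink them,
# because the cache mount is not part of the final image and symlinks
# into it would dangle at run time.
ENV RENV_CONFIG_CACHE_SYMLINKS=FALSE

# Assumption: /project is a hypothetical project directory containing renv.lock.
WORKDIR /project
COPY renv.lock renv.lock

# The cache mount holds built packages between builds; the restored
# library itself is written outside the mount, into the image layer.
RUN --mount=type=cache,target=/renv/cache R -e "renv::restore()"
```

Whether this actually avoids rebuilding packages after a change to renv.lock would need to be verified by building twice and comparing the logs.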

I suspect that it has something to do with the keep.source option used when installing an R package, as discussed in this question.

jameshowison
  • Still figuring this out, but it seems like renv is going to be relevant here because it provides a cache. However, it seems to want to link at container run time rather than at build time, assuming that the cache is on the host machine rather than in the buildx build container: https://rstudio.github.io/renv/articles/docker.html – jameshowison Feb 27 '20 at 23:37
  • hi @jameshowison, did you find out how to do this? I'm completely new to this, and wanted to know if there is a working solution that can save time by using some type of r package install cache. – Spencer Trinh Dec 27 '20 at 20:30
  • This is as far as I got: https://github.com/howisonlab/test_repo_buildx_renv I think the binder people are working towards this as well. – jameshowison Dec 29 '20 at 15:15
  • thanks for sharing! – Spencer Trinh Dec 29 '20 at 23:47
