12

This is my Dockerfile for local development:

FROM node:12-alpine

WORKDIR /usr/app

ENV __DEV__ 1

COPY package.json ./
COPY yarn.lock ./
RUN yarn --frozen-lockfile

COPY tsconfig.json ./
COPY nodemon.json ./

RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]

CMD [ "yarn", "dev" ]

This is how I build it:

docker build --rm -f Dockerfile.dev --tag my-app .

This is how I run it:

docker run --rm -it --volume $(pwd)/src:/usr/app/src -p 3000:3000 my-app

I need to rebuild it only when something outside the src folder changes, for instance when I install node modules. How do I make yarn cache modules somewhere, so that it doesn't pull all of the modules on each build?

jonrsharpe
stkvtflw

3 Answers

28

The next generation of building containers with Docker is using Buildkit. I recommend using it, especially since it has an elegant solution for caching issues. There really isn't a good solution for this in vanilla Docker at the moment; while you can work around it, it's very cumbersome.

I'll list both solutions here:

With Buildkit

Tarun's answer is on the right track, but there's a cleaner way of doing it. Buildkit has support for specifying a mount as a cache. Once you've set up Docker to use Buildkit, all we need to do is:

...
RUN --mount=type=cache,target=/root/.yarn YARN_CACHE_FOLDER=/root/.yarn yarn install
...

This will automatically pull in the previous run's cache or create it if it doesn't exist yet or has expired. It's that simple.
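If Docker isn't already set up to use BuildKit (recent releases enable it by default), a sketch of turning it on, reusing the build command from the question:

$ # One-off: enable BuildKit just for this build
$ DOCKER_BUILDKIT=1 docker build --rm -f Dockerfile.dev --tag my-app .

$ # Or enable it permanently by adding this to /etc/docker/daemon.json:
$ #   { "features": { "buildkit": true } }

Depending on your Docker version, you may also need the # syntax = docker/dockerfile:1.2 directive at the top of the Dockerfile (as in the multi-stage example further down) for RUN --mount to be recognized.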

Vanilla Docker

Alternatively, you can use vanilla Docker if using Buildkit isn't an option. The best thing we can do here is use a COPY directive to copy in some sort of "cache" located in the build context. For example, if we create a directory .yarn_cache in the root of your build context, then we can point yarn at that cache with:

...
COPY .yarn_cache /root/.yarn
RUN YARN_CACHE_FOLDER=/root/.yarn yarn --frozen-lockfile
...

This external cache will not be updated when your image is built, and it will need to be initialized and periodically updated outside of your image. You can do this with the following shell command (clear any local node_modules on the first run to force it to warm the cache):

$ YARN_CACHE_FOLDER=.yarn_cache yarn install
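Putting that together, a sketch of the manual refresh loop, reusing the build command from the question (.yarn_cache is the directory created above):

$ # Re-warm the external cache whenever package.json or yarn.lock change
$ YARN_CACHE_FOLDER=.yarn_cache yarn install
$ # Then rebuild the image, which COPYs .yarn_cache into /root/.yarn
$ docker build --rm -f Dockerfile.dev --tag my-app .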

Now, while this works, it's very hacky and comes with some downsides:

  • You need to manually create and update the cache.
  • The entire .yarn_cache directory needs to be included in the build context, which can be very slow; worse, the context is sent on every build, even when nothing has changed.

For these reasons, the former solution is preferred.


Bonus Pro Tip: Including the yarn cache in either case above still leaves it in the final image, increasing its size. If you use a multistage build, you can alleviate this issue:

# syntax = docker/dockerfile:1.2
FROM node:12-alpine as BUILDER

WORKDIR /usr/app

COPY package.json ./
COPY yarn.lock ./
RUN --mount=type=cache,target=/root/.yarn YARN_CACHE_FOLDER=/root/.yarn yarn --frozen-lockfile


FROM node:12-alpine

WORKDIR /usr/app

COPY --from=BUILDER /usr/app/node_modules ./node_modules


COPY package.json ./
COPY yarn.lock ./
COPY tsconfig.json ./
COPY nodemon.json ./

RUN apk add --no-cache tini
ENTRYPOINT [ "/sbin/tini", "--" ]

ENV __DEV__=1

CMD [ "yarn", "dev" ]
SteveGoob
The vanilla Docker approach isn't portable -- you're building node_modules in a different environment than the one the container may run in. Node packages can specify which OS or architecture they're installed for, so that would be problematic for this solution. – AndrewKS Jun 16 '21 at 22:49
6

The answers from Tarun Lalwani and SteveGoob are great, but they miss one important detail that people can run into when building many containers in parallel.

In my case, I build a Docker Compose file with many containers for two architectures in parallel with the buildx bake command:

docker buildx bake -f ./docker-compose.yml --set *.platform=linux/amd64,linux/arm64/v8 --pull --push

If I add the --mount parameter as suggested, the build fails, because buildx tries to execute several yarn installs in parallel, which makes the cache inconsistent and breaks it completely.

So I changed the RUN command a bit. Here is the new version:

RUN --mount=type=cache,target=/usr/local/share/.cache/yarn/v6,sharing=locked yarn install

First, I decided not to create my own cache directory, but to mount the default one instead. How did I find the default? I just ran

docker run -it node:18-alpine yarn cache dir

It printed the current path of yarn's cache dir. In my case (and probably most others) it is /usr/local/share/.cache/yarn/v6, so there is no need to create an additional folder and pass it as an environment variable.

The next thing is to add the sharing=locked parameter to --mount. With this parameter, parallel installations wait for each other and run in sequence: the first one (for the first container and first architecture) pulls all the packages and saves them to the cache, and every subsequent yarn install reuses that cache.

If you don't want the builds waiting for each other, you can use sharing=private at the cost of some redundancy: it creates a separate cache for each container+architecture pair. The original info is in the documentation.
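For reference, a sketch of that variant; it only swaps the sharing mode, and the target path is the same default cache directory as above:

RUN --mount=type=cache,target=/usr/local/share/.cache/yarn/v6,sharing=private yarn install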

3

You can use BuildKit for this:

https://docs.docker.com/develop/develop-images/build_enhancements/

It adds support for --mount=type=cache on RUN instructions.

Yarn can cache the packages it downloads during a build. Look at all of the options available to you:

https://classic.yarnpkg.com/en/docs/cli/cache/

YARN_CACHE_FOLDER=<path> yarn <command>

So you would use something like the below in your Dockerfile:

RUN --mount=type=bind,source=./.yarn,target=/root/.yarn,rw YARN_CACHE_FOLDER=/root/.yarn yarn install

You can use an ENV instruction earlier in your Dockerfile so that you don't need to repeat YARN_CACHE_FOLDER again and again.
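A sketch of that, keeping the same bind mount and /root/.yarn path used above; the ENV applies to every later RUN, so the variable no longer has to be repeated inline:

ENV YARN_CACHE_FOLDER=/root/.yarn

# ...later in the Dockerfile:
RUN --mount=type=bind,source=./.yarn,target=/root/.yarn,rw yarn install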

Tarun Lalwani