153

I am currently developing a Node backend for my application. When dockerizing it (`docker build .`), the longest phase is `RUN npm install`. The `RUN npm install` instruction runs on every small server code change, which hurts productivity by increasing build time.

I found that running `npm install` where the application code lives and adding the node_modules to the container with the `ADD` instruction solves this issue, but it is far from best practice. It kind of defeats the whole idea of dockerizing it, and it causes the container to weigh much more.

Any other solutions?

d-cubed
ohadgk

5 Answers

175

OK, so I found this great article about efficiency when writing a Dockerfile.

This is an example of a bad Dockerfile, adding the application code before running the `RUN npm install` instruction:

FROM ubuntu

RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN apt-get update
RUN apt-get -y install python-software-properties git build-essential
RUN add-apt-repository -y ppa:chris-lea/node.js
RUN apt-get update
RUN apt-get -y install nodejs

WORKDIR /opt/app

COPY . /opt/app
RUN npm install
EXPOSE 3001

CMD ["node", "server.js"]

By dividing the copy of the application into two COPY instructions (one for the package.json file and the other for the rest of the files) and running the npm install instruction before adding the actual code, no code change will trigger the RUN npm install instruction; only changes to package.json will trigger it. Better-practice Dockerfile:

FROM ubuntu
MAINTAINER David Weinstein <david@bitjudo.com>

# install our dependencies and nodejs
RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN apt-get update
RUN apt-get -y install python-software-properties git build-essential
RUN add-apt-repository -y ppa:chris-lea/node.js
RUN apt-get update
RUN apt-get -y install nodejs

# use changes to package.json to force Docker not to use the cache
# when we change our application's nodejs dependencies:
COPY package.json /tmp/package.json
RUN cd /tmp && npm install
RUN mkdir -p /opt/app && cp -a /tmp/node_modules /opt/app/

# From here we load our application's code in, therefore the previous docker
# "layer" thats been cached will be used if possible
WORKDIR /opt/app
COPY . /opt/app

EXPOSE 3000

CMD ["node", "server.js"]

This is where the package.json file is added, its dependencies are installed, and they are copied into the container's WORKDIR, where the app lives:

ADD package.json /tmp/package.json
RUN cd /tmp && npm install
RUN mkdir -p /opt/app && cp -a /tmp/node_modules /opt/app/

To avoid the npm install phase on every docker build, just copy those lines and change `/opt/app` to the location where your app lives inside the container.

ohadgk
  • 2
    That works. Some points though. `ADD` is discouraged in favor of `COPY`, afaik. `COPY` is even more effective. IMO, the last two paragraphs are not necessary, since they are duplicates, and also from the app's point of view it doesn't matter where on the filesystem the app lives, as long as `WORKDIR` is set. – eljefedelrodeodeljefe Jun 03 '16 at 03:15
  • 4
    Better still is to combine all of the apt-get commands into one RUN, including an `apt-get clean`. Also, add ./node_modules to your .dockerignore, to avoid copying your working directory into your built container, and to speed up the build-context copy step of the build. – Symmetric Jul 28 '16 at 17:24
  • For what reason do you execute the copy command in a separate RUN? And does it matter if I move the node_modules folder instead of copying it? It could get relatively big depending on how much you install. – Marian Klühspies Feb 20 '17 at 11:20
  • 5
    The same approach but just adding `package.json` to the final resting position works fine as well (eliminating any cp/mv). – J. Fritz Barnes Apr 03 '17 at 16:46
  • @MarianKlühspies Each line in the Dockerfile creates a new layer for your final image. – riezebosch Apr 20 '17 at 12:36
  • Will the image be rebuilt if the files change when spinning up a new container? – Kenny Worden May 06 '17 at 23:16
  • 55
    I don't get it. Why do you install in a temp directory and then move it to the app directory? Why not just install in the app directory? What am I missing here? – joniba May 31 '17 at 16:57
  • Still searching for a better way: apps with a lot of dependencies need a lot of time for the `cp -a` on every build/startup whenever `package*.json` has changed. And especially those with a lot of dependencies tend to need continuous `package*.json` changes. – Michael B. Mar 07 '19 at 10:06
  • 7
    This is probably dead, but thought I mention it for future readers. @joniba one reason to do this would be to mount the temp folder as a persisted volume in compose without interfering with the local host file system's node_modules. I.e. I might want to run my app locally but also in a container and still keep the ability to have my node_modules not constantly re-downloaded when package.json changes – dancypants Nov 20 '19 at 10:24
  • Since 2016 things have changed. Instead of the `/tmp` trick, consider using [--mount=type=cache in buildkit | StackOverflow](https://stackoverflow.com/questions/57581943/mount-type-cache-in-buildkit) It's best to think of `--mount=type=cache` as being like a named volume in docker, managed by BuildKit, and potentially deleted if the BuildKit cache gets full or a prune is requested. The next time you run a build, that same named volume may be available, significantly reducing the build time spent downloading dependencies. – Adrian Moisa May 07 '23 at 12:08
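Picking up the `.dockerignore` and BuildKit cache-mount suggestions from the comments above, a minimal modern sketch might look like this (the `node:18` base image and paths are illustrative, not from the answer itself; also add `node_modules` to your `.dockerignore`):

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18
WORKDIR /opt/app
# copy only the manifests first so the install layer caches on code changes
COPY package*.json ./
# BuildKit keeps /root/.npm between builds, so even when package.json changes,
# npm only downloads what is missing from the shared cache
RUN --mount=type=cache,target=/root/.npm npm install
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
```

This avoids the `/tmp` copy dance entirely: the cache mount persists across builds in BuildKit's own storage and never ends up in the image.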
89

Weird! No one mentions multi-stage builds.

# ---- Base Node ----
FROM alpine:3.5 AS base
# install node
RUN apk add --no-cache nodejs-current tini
# set working directory
WORKDIR /root/chat
# Set tini as entrypoint
ENTRYPOINT ["/sbin/tini", "--"]
# copy project file
COPY package.json .

#
# ---- Dependencies ----
FROM base AS dependencies
# install node packages
RUN npm set progress=false && npm config set depth 0
RUN npm install --only=production 
# copy production node_modules aside
RUN cp -R node_modules prod_node_modules
# install ALL node_modules, including 'devDependencies'
RUN npm install

#
# ---- Test ----
# run linters, setup and tests
FROM dependencies AS test
COPY . .
RUN  npm run lint && npm run setup && npm run test

#
# ---- Release ----
FROM base AS release
# copy production node_modules
COPY --from=dependencies /root/chat/prod_node_modules ./node_modules
# copy app sources
COPY . .
# expose port and define CMD
EXPOSE 5000
CMD npm run start

Awesome tutorial here: https://codefresh.io/docker-tutorial/node_docker_multistage/
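With named stages like these, you can also build just part of the pipeline via `--target` (a sketch; the image tags are made up):

```shell
# build and run only up to the test stage (lint, setup, tests)
docker build --target test -t chat:test .

# full build: produces the slim release image without devDependencies
docker build -t chat:release .
```

Because the release stage copies `prod_node_modules` out of the dependencies stage, the final image never contains devDependencies or build tooling.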

Abdennour TOUMI
  • 3
    What's up with having a `COPY` statement after `ENTRYPOINT`? – lindhe Mar 28 '20 at 14:54
  • Great, that provides a good advantage too when you're testing your Dockerfile, since you don't reinstall dependencies each time you edit it. – Xavier Brassoud May 19 '20 at 11:46
  • 2
    @lindhe The order of `COPY` and `ENTRYPOINT` doesn't really matter. Maybe it would kind of make sense to set the `ENTRYPOINT` last if one thinks of it as "now we move on to running things", but from a Docker layer perspective, it actually makes more sense to put the entrypoint near the top of the Dockerfile stage that needs it, because it is likely to never change or to change VERY infrequently, meaning that layer should be able to be cached the majority of the time. Dockerfile statements should be in least-frequent to most-frequent change order, not any logical procedural order. – ErikE Apr 18 '22 at 16:31
  • Why the preference for multiple commands on a single line (e.g. `RUN foo && bar && baz`) over separate RUN commands? – this-sam May 12 '23 at 04:09
  • 2
    @this-sam to keep the number of layers low. – Octavian Helm May 21 '23 at 15:17
54

I've found that the simplest approach is to leverage Docker's copy semantics:

The COPY instruction copies new files or directories from `<src>` and adds them to the filesystem of the container at the path `<dest>`.

This means that if you first explicitly copy the package.json file and then run the npm install step, that layer can be cached, and then you can copy the rest of the source directory. If the package.json file has changed, it will be copied anew, npm install will re-run, and the result will be cached for future builds.

A snippet from the end of a Dockerfile would look like:

# install node modules
WORKDIR  /usr/app
COPY     package.json /usr/app/package.json
RUN      npm install

# install application
COPY     . /usr/app
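The caching rule this relies on can be sketched outside Docker as a toy model (purely illustrative; Docker's real cache keys also involve the parent layer and the command string): a layer is reused only if the layers above it were reused and its own input is unchanged.

```shell
#!/bin/sh
# Toy model: the "COPY package.json" layer is reused only while the file's
# checksum is unchanged, and "RUN npm install" is reused only while every
# layer above it was reused.
workdir=$(mktemp -d)
printf '{"dependencies":{}}\n' > "$workdir/package.json"

build() {
  key=$(sha256sum "$workdir/package.json" | cut -d' ' -f1)
  if [ -f "$workdir/.layer_key" ] && [ "$key" = "$(cat "$workdir/.layer_key")" ]; then
    echo "cache hit: skipping npm install"
  else
    echo "cache miss: re-running npm install"  # a real build would run npm install here
    echo "$key" > "$workdir/.layer_key"
  fi
}

build   # first build: cache miss
build   # nothing changed: cache hit
printf '{"dependencies":{"express":"^4"}}\n' > "$workdir/package.json"
build   # package.json changed: cache miss again
```

Copying the source tree last means routine code edits only invalidate the final `COPY . /usr/app` layer, never the install layer above it.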
J. Fritz Barnes
  • 9
    Instead of `cd /usr/app` you can/should use `WORKDIR /usr/app`. – Vladimir Vukanac Jan 17 '19 at 15:53
  • 1
    @VladimirVukanac :+1: on using WORKDIR; I've updated the answer above to take that into account. – J. Fritz Barnes Jan 25 '19 at 15:17
  • is npm install run in /usr/app directory or . ? – user557657 Feb 01 '19 at 19:06
  • 1
    @user557657 The WORKDIR sets the directory within the future image from which the command will be run. So in this case, it's running npm install from `/usr/app` within the image which will create a `/usr/app/node_modules` with dependencies installed from the npm install. – J. Fritz Barnes Feb 07 '19 at 14:35
  • 1
    @J.FritzBarnes thanks a lot. Wouldn't `COPY . /usr/app` copy the `package.json` file again into `/usr/app` with the rest of the files? – user557657 Feb 08 '19 at 15:20
  • 1
    @user557657 yes, the package.json will get copied in a second time. It won't affect correct installation of the dependencies to the node_modules directory. Since docker is looking for changes in files copied to determine if it uses the cached layer or reruns the command; it will only rerun the copy & npm install if package.json changes. The `COPY . /usr/app` will be rerun if new files are added or files change. – J. Fritz Barnes Feb 14 '19 at 13:06
  • 2
    Docker won't rerun `npm install` command if `package.json` changes, it caches RUN command result and assumes that same RUN command produces same result. To invalidate cache you should run `docker build` with --no-cache flag, or change the RUN command somehow. – Mikhail Zhuravlev Apr 16 '19 at 08:35
  • I've resorted to having an old stable package.json copy called package-docker.json that I cache in the Dockerfile before loading the package.json. As well as having a separate scripts folder instead of constantly changing my package.json – Ray Foss Mar 11 '21 at 16:10
  • Did I understand correctly? If package.json has no changes, the COPY package.json layer will be used from cache, and by default so will the RUN npm install next to it. If package.json has changed, the COPY package.json layer will be a cache miss since the file content changed, creating a new layer for COPY package.json; and since RUN npm install is the next layer, it will also be a cache miss and re-created. "RUN is always a cache hit unless a previous layer is re-created, while COPY's cache hit/miss depends on file content changes." – Yusuf Sep 29 '22 at 04:33
3

You don't need to use a tmp folder; just copy package.json to your container's application folder, do the install work there, and copy all the files later:

COPY app/package.json /opt/app/package.json
RUN cd /opt/app && npm install
COPY app /opt/app
Mike Zhang
0

I wanted to use volumes, not COPY, and keep using docker compose, and I could do it by chaining the commands at the end:

FROM debian:latest
RUN apt -y update \
    && apt -y install curl \
    && curl -sL https://deb.nodesource.com/setup_12.x | bash - \
    && apt -y install nodejs
RUN apt -y update \
    &&  apt -y install wget \
        build-essential \
        net-tools
RUN npm install pm2 -g      

RUN mkdir -p /home/services_monitor/ && touch /home/services_monitor/
RUN chown -R root:root /home/services_monitor/

WORKDIR /home/services_monitor/

CMD npm install \
    && pm2-runtime /home/services_monitor/start.json
Daniel lm