A total Docker newbie here.

I have a web application that uses two repositories. One of the repositories is basically a 'client' app, while the second one is the server. The server serves the static files from the client app.

I would like to dockerize the whole thing. To do so, I'm wondering what the best practice is:

  • pull the client and build it inside the image
  • do the rest

Or

  • pull the client code in an external bash script
  • somehow copy the build files to the image
  • do the rest

Or

  • pull the client code in an external bash script
  • never put the client code in the image, use it externally somehow
  • do the rest

The first approach actually works, but it seems wasteful since the image ends up very big and contains disposable files.
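For illustration, here is a rough sketch of what that first approach can look like (the base image, repository URL, server entry point and paths are placeholders; the client build uses yarn as in the script below):

# single-stage image: clone, install, build and serve, all in one image
FROM node:lts

WORKDIR /app

# pull the client and build it inside the image
# (repository URL and paths are placeholders)
RUN git clone https://example.com/me/client.git /tmp/client \
 && cd /tmp/client && yarn && yarn build \
 && mv build /app/static

# add the server itself and install its dependencies
COPY . .
RUN yarn

# the client checkout, its node_modules and the yarn caches all remain in image layers
CMD ["node", "server.js"]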

The second approach feels "better", but when I run docker-compose up from the bash script I don't see how to get the build files copied into the image, since the script is already running:

#!/bin/bash

# clone the client and build it, keeping only the build output
git clone ... ~/tmp/client
(cd ~/tmp/client && yarn && yarn build && mv build ~/tmp/build)

# start the containers (the build output somehow needs to get into the image here)
docker-compose up

# clean up the temporary checkout
rm -rf ~/tmp/client

As for the third approach, I don't even know how to do that.

Any suggestion or reference would be very helpful.

superuser123
  • I use [`buildkit`](https://docs.docker.com/develop/develop-images/build_enhancements/) and [`git clone` during the build](https://stackoverflow.com/a/64036342/1423507) like [this example](https://stackoverflow.com/a/57741684/1423507). – masseyb Sep 29 '20 at 07:41
  • IMHO cloning other stuff from inside a build feels a bit wrong. How do you control which exact version should be cloned? Will the same build produce the same result when you run it in three weeks? How do you specify the commit you want to clone? Maybe git submodules or subtree can solve that. – Andreas Jägle Sep 29 '20 at 07:47
  • @AndreasJägle depends on what you're cloning, e.g. can clone a [`tag`](https://git-scm.com/book/en/v2/Git-Basics-Tagging) or a [release branch](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow). Same diff for `git submodules`, clone the repo at a specific tag and `git submodule update --init` to initialize the submodules at the commit registered in the checkout's index. As long as the repo is stable then there's no reason to not clone it during the build IMHO. If the repo you're cloning is unstable then what?... Clone it out of docker then hack at it and build? Meh.. – masseyb Sep 29 '20 at 13:19
  • @AndreasJägle FWIW given your answer, [this example](https://stackoverflow.com/a/57741684/1423507) uses [multi-staged](https://docs.docker.com/develop/develop-images/multistage-build/) builds to copy out the artifacts from the `build` stage. Your answer doesn't address how to get the code into the build, your pseudo code uses `COPY` so what, you're fine with `COPY`'ing a potentially dirty checkout into a build but not cloning a specific checkout during the build? Doesn't make sense IMHO. "How do you make sure what you're `COPY`'ing doesn't change in 2 weeks?" Use `git`... – masseyb Sep 29 '20 at 13:27
  • @masseyb I'd say COPYing from the CI checkout is fine as the specific version of the other repo is managed using git, e.g. by using submodules or subtrees which is basically the same as if the client code would be in the same repository. So you always get the correct version, even if you build it again in some months. Referencing a static tag/commit from a script works only if that code doesn't change. My advice was basically to think about referencing the correct version by git and then using the CI capabilities and multistage builds to avoid having a git executable/sdks in the prod image. – Andreas Jägle Sep 29 '20 at 16:07
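(For reference, a minimal sketch of the pinned clone discussed in these comments; the repository URL, tag and base image are placeholders, and it assumes git is available in the build image:)

# build stage that clones an exact, immutable tag so the build is reproducible
FROM node:lts AS frontend-build
WORKDIR /src
# --branch also accepts tag names; --depth 1 keeps the clone small
RUN git clone --depth 1 --branch v1.2.3 https://example.com/me/client.git . \
 && yarn && yarn build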

1 Answer

Great question! Even though there are several ways to solve this, there are significant differences and drawbacks between the approaches. Back in the day, the pattern was basically to build everything outside (on the host) and then copy the relevant artifacts into the image if you wanted to avoid having all the SDKs and sources in your production image.

Luckily, there is a better way to solve this today: multi-stage Docker builds.

A multi-stage Dockerfile is like a regular Dockerfile, but it contains several stages (i.e. more than one FROM statement). Each stage starts a fresh image build. Not all of these stages end up in your container registry; some of them only exist to run intermediate build steps.

Pseudo code

# --- stage 1: build the frontend ---
FROM node:version AS frontend-build
WORKDIR /src
# or better: COPY package.json/package-lock.json first, install, then COPY the rest
COPY src/frontend .
RUN npm ci && npm run build  # or yarn && yarn build

# --- stage 2: build the backend ---
FROM jdk-plus-buildtool:version AS backend-build
WORKDIR /app
COPY src/backend .
RUN mvn package  # or similar

# --- final stage: only the runtime and the built artifacts ---
FROM trimmed-down-runtime:version
WORKDIR /app
COPY --from=backend-build /app/target/myapp/ .
COPY --from=frontend-build /src/dist/ ./static-files-folder
# CMD or ENTRYPOINT with your run command
CMD ["your-run-command"]

Using this approach has several advantages:

  • Your final image will contain only the minimal dependencies needed to run your application (e.g. the JRE, the Java application, the static JavaScript files)
  • Nothing is built outside a container, which limits the effects of the host environment on the build. Every required tool must be available in the build container, which makes the builds reliable and reproducible
  • The build can easily be run on a developer machine and produce the same results, even though the developer might have different versions of npm/Java installed locally
  • No build tools, SDKs, source files or intermediate artifacts end up in your final image
  • Even the backend image itself becomes smaller, because you no longer ship the SDK (e.g. the JDK for a Java app) in the production container
  • You can leverage the Docker build cache even more, because whole parts can be skipped if nothing changed (e.g. reuse the Java build if only JavaScript files changed; see the caching sketch after this list)
  • You have fine-grained control over the dependencies used in each build step, and the build itself has fewer interdependencies because the steps for the different technologies run in separate containers.
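To make the caching point above concrete, here is a sketch of how the frontend stage from the pseudo code could be split so that dependency installation is cached independently of source changes (the paths and the use of npm are assumptions):

FROM node:version AS frontend-build
WORKDIR /src
# copy only the dependency manifests first: this layer and the npm ci layer below
# come from the build cache as long as package.json/package-lock.json are unchanged
COPY src/frontend/package.json src/frontend/package-lock.json ./
RUN npm ci
# now copy the rest of the sources: only these layers are rebuilt when source files change
COPY src/frontend .
RUN npm run build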

If you are talking about a static JavaScript application and an HTTP API backend server, you could also use two separate images (frontend and backend) and then set up the network and proxying accordingly, so that you only expose the frontend container to the world and all requests are routed through the frontend to the backend application.
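As a sketch of that split (assuming nginx serves the frontend; image tags, paths and the proxy setup are placeholders), the frontend would get its own small image and the backend image would contain only the API server:

# frontend image: build the static files, then serve them with a plain web server
FROM node:version AS frontend-build
WORKDIR /src
COPY src/frontend .
RUN npm ci && npm run build

FROM nginx:alpine
# only the static files end up here; API requests are forwarded to the backend
# container by the nginx configuration (proxy_pass to the backend service name)
COPY --from=frontend-build /src/dist/ /usr/share/nginx/html/

docker-compose (or any other orchestrator) then puts both containers on a shared network and only publishes the frontend port to the outside world.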

One more comment: you are talking about different repositories for client and server. Usually the CI environment takes care of checking out the desired versions of your code before the real build starts. If this server is used by this one client only, I would use the bundled approach and move the client sources into a subfolder of the main server repository. This makes it easier to fix bugs for the whole system on a single bugfix branch. If you really cannot move source code between repositories, I would go with a git submodule/subtree approach to avoid dealing with commit references on my own during the build.

Andreas Jägle