docker build with maven - how to prevent re-downloading dependencies

Question

I want the base image mavenDeps to download the dependencies and rebuild only when dependencies change, and the second image, mavenBuild to rebuild on code changes. However, on docker build . both maven commands download all dependencies. I might be misunderstanding how the stacking works or what to copy where.

What I have tried: explicitly copying everything from first container to second: COPY / / and various more specific COPY targets like .m2, building second container from the maven base image, like the first one, then copying everything from the first container.

Dockerfile:

FROM maven:3.5-jdk-8 as mavenDeps
COPY pom.xml pom.xml
RUN mvn dependency:resolve

FROM mavenDeps as mavenBuild
RUN mvn install

FROM java:8
COPY --from=mavenBuild ./target/*.jar ./
ENV JAVA_OPTS ""
CMD [ "bash", "-c", "java ${JAVA_OPTS} -jar *.jar -v"]

I am building with Docker Desktop 2.2.2.0 (engine 19.03.5) on MacOS.

EDIT 2020.03.04:

Answer from @gcallea effectively prevents re-downloading of the dependencies listed in the pom file, +1. However, the install step still pulls 100+ artifacts on each build triggered by a code change. Those are transitive dependencies of maven-resources-plugin, maven-compiler-plugin and several other plugins, which are not listed anywhere explicitly.

I need to work offline sometimes and would like to preload ALL dependencies, so no dependencies are pulled after code changes.

Please look at my solution at https://stackoverflow.com/a/71066133/418599 . — Antonio Petricca, Feb 10 '22 at 14:15

davidxxx · Accepted Answer · 2020-03-05T10:09:34.120

Before explaining how I would proceed, I will explain the issue you are running into.

Your Dockerfile relies on the multi-stage build feature.
Here, stages are intermediate layers that are not kept as layers in the final image. To keep files or folders between stages, you have to copy them explicitly, as you did.

Concretely, this means that in the instructions below, Maven resolves all dependencies specified in your pom.xml and stores them in the local repository located in the layer of that stage:

FROM maven:3.5-jdk-8 as mavenDeps
COPY pom.xml pom.xml
RUN mvn dependency:resolve

But as said, the stage content is not kept by default. So all the dependencies downloaded into the local Maven repo are lost, since you never copy them into the next stage:

FROM mavenDeps as mavenBuild
RUN mvn install

Since the local repo of that image is empty, mvn install re-downloads all dependencies.

How to proceed?

There are many ways to do this.
The best choice depends on your requirements.
But whichever way you choose, the build strategy in terms of Docker layers looks like this:

Build stage (Maven image) :

copy the pom to the image
download the dependencies and plugins
For that step, mvn dependency:resolve-plugins chained with mvn dependency:resolve may do the job, but not always.
Why? Because these plugins and the package execution may rely on different artifacts/plugins, and even for the same artifact/plugin they may still pull a different version. So a safer, though potentially slower, approach is to resolve the dependencies by executing exactly the mvn package command (which pulls exactly the dependencies you need), while skipping source compilation and deleting the target folder, to make the processing faster and to prevent any undesirable layer change detection for that step.
copy the source code to the image
package the application

Run stage (JDK or JRE image) :

copy the jar from the previous stage

1) No explicit cache for Maven dependencies: straightforward, but annoying when the pom changes frequently

Use this if re-downloading all dependencies on every pom.xml change is acceptable.

Example, starting from your script:

########build stage########
FROM maven:3.5-jdk-8 as maven_build
WORKDIR /app

COPY pom.xml .
# To resolve dependencies in a safe way (no re-download when the source code changes)
RUN mvn clean package -Dmaven.main.skip -Dmaven.test.skip && rm -r target

# To package the application
COPY src ./src
RUN mvn clean package -Dmaven.test.skip

########run stage########
FROM java:8
WORKDIR /app

COPY --from=maven_build /app/target/*.jar ./

#run the app
ENV JAVA_OPTS ""
CMD [ "bash", "-c", "java ${JAVA_OPTS} -jar *.jar -v"]

Drawback of that solution? Any change in the pom.xml means re-creating the whole layer that downloads and stores the Maven dependencies.
That is generally not acceptable for applications with many dependencies, especially if you don't use a Maven repository manager during the image build.

2) Explicit cache for Maven dependencies: requires more configuration and the use of BuildKit, but is more efficient because only the required dependencies are downloaded

The only thing that changes here is that the Maven dependency downloads are cached in the Docker builder cache:

# syntax=docker/dockerfile:experimental
########build stage########
FROM maven:3.5-jdk-8 as maven_build
WORKDIR /app

COPY pom.xml .
COPY src ./src

RUN --mount=type=cache,target=/root/.m2 mvn clean package -Dmaven.test.skip

########run stage########
FROM java:8
WORKDIR /app

COPY --from=maven_build /app/target/*.jar ./

#run the app
ENV JAVA_OPTS ""
CMD [ "bash", "-c", "java ${JAVA_OPTS} -jar *.jar -v"]

To enable BuildKit, the environment variable DOCKER_BUILDKIT=1 has to be set (you can do that wherever you want: .bashrc, command line, Docker daemon JSON file...)
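
As a rough sketch of the command-line and shell-profile options mentioned above: the variable name is the one given in this answer, while the helper name build_image and the image tag maven-app are placeholders made up for illustration.

```shell
#!/bin/sh
# Enable BuildKit for every docker build started from this shell.
# Put this line in ~/.bashrc (or similar) to make it permanent.
export DOCKER_BUILDKIT=1

# For a one-off build, the variable can instead be set inline:
#   DOCKER_BUILDKIT=1 docker build -t maven-app .

# Hypothetical helper: builds the current directory with BuildKit enabled.
# "maven-app" is a placeholder tag, not something defined by the project.
build_image() {
  docker build -t "${1:-maven-app}" .
}
```

After sourcing the snippet, calling build_image from the directory that contains the Dockerfile should run the build with the cache mount available.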

thank you for the explanation, davidxxx. Have tried first way with all 3 `resolve`, `resolve-plugins` & `go-offline` in one `RUN`. Still the `package` step pulls all the `resources`, `compiler`, `surefire` & `jar` plugin dependencies - totally puzzling. Will try second approach. — kostja, Mar 04 '20 at 12:42
If it still downloads dependencies at the mvn install, the second approach will not help here. It improves the previous Maven command (resolving), not the `install`. In your case, the best advice I can give you is to test without Docker, to check that you don't have a Docker build issue. Locally: wipe your local Maven repo (or just rename it), then execute `mvn dependency:resolve-plugins && mvn dependency:resolve`. Finally, execute `mvn install`. Does the install command re-download many things? — davidxxx, Mar 04 '20 at 12:48
Surprise! Dockerless build works perfectly - no dependencies are pulled on `package`. Second approach works perfectly as well. I do hope there is a sane way to enforce `DOCKER_BUILDKIT=1` on hosts, so my `docker-compose` files remain portable, but the question is thus answered. Thanks again, davidxxx. — kostja, Mar 04 '20 at 13:08
You are welcome :) Happy for you that it works (and that is the most efficient way), but this time it's my turn to be puzzled about the reason for the re-downloads with the first way. If that is public code, I would be very interested in testing it. About BuildKit and docker-compose: add these two lines to your .bashrc: `export DOCKER_BUILDKIT=1` and `export COMPOSE_DOCKER_CLI_BUILD=1`, refresh your shell with `source`, and both docker and docker-compose can use BuildKit without any additional configuration. — davidxxx, Mar 04 '20 at 13:38
yes, it's public code, all comments and contributions welcome: https://github.com/ksilin/kafka-platform-prometheus/blob/master/sample-application/consumer/Dockerfile — kostja, Mar 04 '20 at 13:52
@kostja I understood the problem of the first way (I fixed that) and I also improved the second way. — davidxxx, Mar 04 '20 at 18:27
very interesting, I appreciate your tenacity :) now the first way works too, as long as I either `clean` in both `RUN`s or in neither. Otherwise the `clean` plugin is pulled on the second `RUN`. Can you please expand your answer to explain why `dependencies:*` are not sufficient? btw, I assume `RUN RUN` in first Dockerfile is a typo. — kostja, Mar 05 '20 at 09:26
@kostja Thanks to you for that excellent use case, which has allowed me to understand a new thing. About dependency resolution, I updated the answer to be clearer (yesterday I wrote it a little fast). Typo fixed. About the clean behaviour, that is expected: if the plugin is required by either of the two Maven executions, it necessarily has to be downloaded at least once. — davidxxx, Mar 05 '20 at 10:15
There is https://github.com/qaware/go-offline-maven-plugin which _actually_ does what `mvn dependency:go-offline` is supposed to do. The author explains the problems and his solution very well in the readme. — ivant, Feb 05 '21 at 16:35
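
The Docker-free check that davidxxx describes in the comments above could be scripted roughly as follows. This is only a sketch: it assumes the default local repository location ~/.m2/repository, and the function name and the .bak suffix are made up for illustration.

```shell
#!/bin/sh
# Sketch of the local, Docker-free check from the comments above.
# Assumption: the local repository lives in the default ~/.m2/repository.
check_plugin_prefetch() {
  repo="${HOME}/.m2/repository"
  backup="${repo}.bak"   # hypothetical backup name

  # Refuse to continue if an earlier backup is still in place.
  if [ -e "$backup" ]; then
    echo "backup $backup already exists, aborting" >&2
    return 1
  fi
  # Move the existing repository aside rather than deleting it.
  if [ -d "$repo" ]; then
    mv "$repo" "$backup" || return 1
  fi

  # Pre-fetch plugins and dependencies into the now empty repository.
  mvn dependency:resolve-plugins || return 1
  mvn dependency:resolve || return 1

  # Any "Downloading" line printed by this step points to an artifact
  # that the two resolve goals did not fetch.
  mvn install
}
```

Run check_plugin_prefetch from the project directory, then move the backup back to restore the previous repository.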

score 5 · Answer 2 · edited Jun 10 '21 at 10:14

You don't need to divide the build phase into two different stages, mavenDeps and mavenBuild. You can use a single build stage and take advantage of Docker layers for the same purpose.

You can structure your Dockerfile as follows:

#----
# Build stage
#----
FROM maven:3.5-jdk-8 as buildstage
# Copy only pom.xml of your projects and download dependencies
COPY pom.xml .
RUN mvn -B -f pom.xml dependency:go-offline
# Copy all other project files and build project
COPY . .
RUN mvn -B install

#----
# Final stage
#----
FROM java:8
COPY --from=buildstage ./target/*.jar ./
ENV JAVA_OPTS ""
CMD [ "bash", "-c", "java ${JAVA_OPTS} -jar *.jar -v"]

This way, the dependencies are re-downloaded only when pom.xml changes. Otherwise, the Docker layer for the command RUN mvn -B -f pom.xml dependency:go-offline is reused from the cache.

edited Jun 10 '21 at 10:14 · Gaël J
answered Mar 04 '20 at 09:40 · gregorycallea

thank you @gcallea. tried right now, seems to work partially. the `install` step seems not to download the dependencies, but still downloads a lot of artifacts - transitive dependencies of `maven-resources-plugin`, `maven-compiler-plugin` and several others, 100+ artifacts in total. Can those be fetched in a separate, earlier step? – kostja Mar 04 '20 at 09:50
So if you don't change anything and execute another build, which layers are reused from the cache? Just the `mvn -B install` one? – gregorycallea Mar 04 '20 at 09:56
when I change code only, all the maven plugin artifacts are being downloaded every time, not the application dependencies. I would like to avoid that as well to be able to work offline. – kostja Mar 04 '20 at 09:59
That is expected: when you change code, you invalidate the Docker layer for the `COPY . .` command, so all subsequent commands are re-executed and not taken from the cache. In any case, the artifact issue is not strictly related to Docker, because in general it is not easy to use cached artifacts in consecutive builds (see for example this thread: https://stackoverflow.com/questions/19696053/using-cached-artifacts-in-maven-to-avoid-redundant-builds) – gregorycallea Mar 04 '20 at 10:07
So even though my solution avoids re-downloading the dependencies, as you asked in your question, you should investigate how to improve performance on consecutive Maven builds – gregorycallea Mar 04 '20 at 10:07
well, yes and no :) the dependencies from the pom aren't re-downloaded, so 'yes' from PoV of my assumptions and +1. Other dependencies are re-downloaded still, so 'no' from the PoV of my question. I was not aware of these additional dependencies and will add this information to the question. – kostja Mar 04 '20 at 10:16
