I'm working on a project with ~200MB of dependencies and I'd like to avoid useless uploads because of my limited bandwidth.

When I build and push the image from my Dockerfile (shown below), I always get a ~200MB upload even if I didn't touch the pom.xml:

FROM maven:3.6.0-jdk-8-slim

WORKDIR /app

ADD pom.xml /app

RUN mvn verify clean --fail-never

COPY ./src /app/src

RUN mvn package

ENV CONFIG_FOLDER=/app/config
ENV DATA_FOLDER=/app/data
ENV GOLDENS_FOLDER=/app/goldens
ENV DEBUG_FOLDER=/app/debug

WORKDIR target

CMD ["java","-jar","-Dlogs=/app/logs", "myProject.jar"]

This Dockerfile should produce a 200MB fat JAR including all the dependencies, which is why the ~200MB upload occurs every time. What I would like to achieve is to build a layer with all the dependencies and "tell" the packaging phase not to include the dependency JARs in the fat JAR, but to look for them inside a given directory.

I was thinking of writing a script that runs mvn dependency:copy-dependencies before the build and then copies the resulting directory into the container, and then building a "non-fat" JAR that only links to those dependencies instead of actually containing them.

Is this possible?

EDIT: I discovered that the Maven local repository of the container is located under /root/.m2. So I ended up writing a very simple script like this:

BuildDocker.sh

mvn verify clean --fail-never
mv ~/.m2 ~/git/myProjectRepo/.m2

sudo docker build -t myName/myProject:"$1" .

And edited Dockerfile like:

# Use the official Maven image (JDK 8) as a parent image
FROM maven:3.6.0-jdk-8-slim

# Copy my Maven local repository into the container, thus creating a new layer
COPY ./.m2 /root/.m2

# Set the working directory to /app
WORKDIR /app

# Copy the pom.xml
ADD pom.xml /app

# Resolve and Download all dependencies: this will be done only if the pom.xml has any changes
RUN mvn verify clean --fail-never

# Copy source code and configs 
COPY ./src /app/src

# create a ThinJAR
RUN mvn package


# Run the jar
...

After the build I verified that /root/.m2 contains all the directories I need, but as soon as I launch the JAR I get:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/Priority
    at myProject.ThreeMeans.calculate(ThreeMeans.java:17)
    at myProject.ClusteringStartup.main(ClusteringStartup.java:7)
Caused by: java.lang.ClassNotFoundException: org.apache.log4j.Priority
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 2 more

Maybe I shouldn't run it through java -jar?

L. Don
  • I have not used this personally, but maybe Jib can help: https://github.com/GoogleContainerTools/jib – Thilo Dec 09 '18 at 11:39
  • Take a look at https://spring.io/blog/2018/11/08/spring-boot-in-a-container It's for Spring Boot, but you could use the same approach for every maven project. Basically you have to create your Docker file with multiple layers, so during the build Docker can cache the layers that are not changed. – Evgeni Dimitrov Dec 09 '18 at 11:41
  • in your edited question you introduce a command `COPY ./.m2 /root/.m2` similar to @MyTwoCents's suggestion (which could thus be viewed as an alternative to the standard solution of doing `RUN mvn dependency:go-offline -B`, even if `COPY ./.m2 /root/.m2` is less portable as it requires having maven on your host), but I am unsure this could address your main question about the ~200MB *upload* related to the push of a *fat jar*… cf. my other [comment](https://stackoverflow.com/questions/53691781/how-to-cache-maven-dependencies-in-docker/#comment94269471_53694920) – ErikMD Dec 10 '18 at 15:47
  • Please look at my solution at https://stackoverflow.com/a/71066133/418599 . – Antonio Petricca Feb 10 '22 at 14:17

3 Answers

If I understand correctly what you'd like to achieve, the problem is to avoid creating a fat jar with all Maven dependencies at each Docker build (to alleviate the size of the Docker layers to be pushed after a rebuild).

If yes, you may be interested in the Spring Boot Thin Launcher, which is also applicable for non-Spring-Boot projects. Some comprehensive documentation is available in the README.md of the corresponding GitHub repo: https://github.com/dsyer/spring-boot-thin-launcher#readme

To sum up, it should suffice to add the following plugin declaration in your pom.xml:

<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
            <!--<version>${spring-boot.version}</version>-->
            <dependencies>
                <dependency>
                    <groupId>org.springframework.boot.experimental</groupId>
                    <artifactId>spring-boot-thin-layout</artifactId>
                    <version>1.0.19.RELEASE</version>
                </dependency>
            </dependencies>
        </plugin>
    </plugins>
</build>

Ideally, this solution should be combined with a standard Dockerfile setup to benefit from Docker's cache (see below for a typical example).

Leverage Docker's cache mechanism for a Java/Maven project

The archetype of a Dockerfile that avoids re-downloading all Maven dependencies at each build if only source code files (src/*) have been touched is given in the following reference:
https://whitfin.io/speeding-up-maven-docker-builds/

To be more precise, the proposed Dockerfile is as follows:

# our base build image
FROM maven:3.5-jdk-8 as maven

WORKDIR /app

# copy the Project Object Model file
COPY ./pom.xml ./pom.xml

# fetch all dependencies
RUN mvn dependency:go-offline -B

# copy your other files
COPY ./src ./src

# build for release
# NOTE: my-project-* should be replaced with the proper prefix
RUN mvn package && cp target/my-project-*.jar app.jar


# smaller, final base image
FROM openjdk:8u171-jre-alpine
# OPTIONAL: copy dependencies so the thin jar won't need to re-download them
# COPY --from=maven /root/.m2 /root/.m2

# set deployment directory
WORKDIR /app

# copy over the built artifact from the maven image
COPY --from=maven /app/app.jar ./app.jar

# set the startup command to run your binary
CMD ["java", "-jar", "/app/app.jar"]

Note that it relies on the so-called multi-stage build feature of Docker (presence of two FROM directives), implying the final image will be much smaller than the maven base image itself.
(If you are not interested in that feature during the development phase, you can remove the lines FROM openjdk:8u171-jre-alpine and COPY --from=maven /app/app.jar ./app.jar.)

In this approach, the Maven dependencies are fetched with RUN mvn dependency:go-offline -B before the line COPY ./src ./src (to benefit from Docker's cache).
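
The same ordering can, in principle, also give the dependency JARs a layer of their own, along the lines of the mvn dependency:copy-dependencies idea from the question. The following is only a sketch under several assumptions that the question does not confirm: the pom.xml no longer bundles the dependencies into the JAR (e.g. any shade/assembly configuration has been removed), the build produces target/myProject.jar, and the main class is myProject.ClusteringStartup (taken from the stack trace). The stage name build and the /app/lib directory are arbitrary choices.

```dockerfile
# build stage (sketch; the stage name "build" and /app/lib are arbitrary)
FROM maven:3.6.0-jdk-8-slim AS build

WORKDIR /app

# copy only the POM first, so the next two layers are rebuilt only when it changes
COPY ./pom.xml ./pom.xml

# fetch all dependencies
RUN mvn dependency:go-offline -B

# copy the runtime dependency JARs into a directory of their own
RUN mvn dependency:copy-dependencies -B -DincludeScope=runtime -DoutputDirectory=/app/lib

# copy the sources last, so editing them does not invalidate the layers above
COPY ./src ./src

# ASSUMPTION: this produces a plain (non-fat) target/myProject.jar
RUN mvn package -B


# smaller, final base image
FROM openjdk:8u171-jre-alpine

WORKDIR /app

# dependency layer: expected to change only when pom.xml changes
COPY --from=build /app/lib ./lib

# application layer: small, changes at every source edit
COPY --from=build /app/target/myProject.jar ./myProject.jar

# ASSUMPTION: main class name taken from the stack trace in the question
CMD ["java", "-cp", "myProject.jar:lib/*", "myProject.ClusteringStartup"]
```

With this layout the lib layer should stay unchanged as long as pom.xml does, so a push after a source-only change would mainly upload the small JAR layer. java -cp is used instead of java -jar because a plain JAR does not know where lib/ is unless its manifest is configured with a Class-Path entry.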

Note however that the standard dependency:go-offline goal is not "perfect", as a few dynamic dependencies/plugins may still trigger some re-downloading at the mvn package step. If this is an issue for you (e.g. if at some point you really want to work offline), you could take a look at that other SO answer, which suggests using a dedicated plugin that provides the de.qaware.maven:go-offline-maven-plugin:resolve-dependencies goal.
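
As a rough illustration, the first stage of the Dockerfile above could call that goal instead of dependency:go-offline. This is a sketch, not a tested setup: no plugin version is pinned here (Maven then resolves one by itself), whereas in practice you would probably declare the plugin with a fixed version in your pom.xml.

```dockerfile
# our base build image
FROM maven:3.5-jdk-8 as maven

WORKDIR /app

# copy the Project Object Model file
COPY ./pom.xml ./pom.xml

# fetch all dependencies with the dedicated plugin instead of dependency:go-offline
# ASSUMPTION: invoked by its full coordinates without a version, so Maven picks one
RUN mvn de.qaware.maven:go-offline-maven-plugin:resolve-dependencies -B

# copy your other files
COPY ./src ./src

# build for release
# NOTE: my-project-* should be replaced with the proper prefix
RUN mvn package && cp target/my-project-*.jar app.jar
```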

ErikMD
  • This is quite interesting, and I'll give it a read for sure, but before trying to switch to this solution I would try everything I can to use nothing but a Dockerfile. I wouldn't like to pay the price of another level of complexity at this moment. However this is surely a +1. – L. Don Dec 10 '18 at 13:39
  • @L.Don It depends on what you want to achieve: from your initial question it seems there are 2 orthogonal issues to address: (1) avoid a ~200MB *download* at each `docker build`, and (2) avoid a ~200MB *upload* (caused by the large size of the `.jar`) after `docker build … && docker push`. I mentioned both aspects in my answer (and to address point (2) it seems necessary to tweak the way your `.jar` is built, hence the need for one such Maven plugin). But to address point (1) in a reproducible way, everything can indeed be done at the `Dockerfile` level. I'll edit my answer to expand on this. – ErikMD Dec 10 '18 at 14:15

In general, a Docker build works in layers: each time you build, these layers are available in the cache and are reused if nothing has changed. Ideally it should have worked the same way here.

By default, Maven looks for dependencies in the .m2 folder located in the user's home directory (on Ubuntu, /home/username/).

If the dependency JARs are not available there, Maven downloads them into .m2 and uses them.

Now you can zip this .m2 folder after one successful build and move it into the home directory of the user inside the Docker container.

Do this before you run the build command.

Note: You might need to replace the existing .m2 folder in the Docker image.

So your Dockerfile would look something like this:

FROM maven:3.6.0-jdk-8-slim

WORKDIR /app

COPY .m2.zip /home/testuser/

ADD pom.xml /app

RUN mvn verify clean --fail-never

COPY ./src /app/src

RUN mvn package
...
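
Note that COPY does not unpack a zip file. One possible variant, given only as a sketch, swaps the zip for a tar archive, because ADD automatically unpacks a local tar archive. It assumes an archive named m2.tar.gz (a hypothetical name) created on the host beforehand, and it targets /root because the question's edit reports the container's local repository under /root/.m2.

```dockerfile
FROM maven:3.6.0-jdk-8-slim

WORKDIR /app

# ASSUMPTION: m2.tar.gz (hypothetical name) was created on the host, e.g. with
#   tar -czf m2.tar.gz -C "$HOME" .m2
# ADD unpacks a local tar archive, which gives /root/.m2 in the image
ADD m2.tar.gz /root/

ADD pom.xml /app

RUN mvn verify clean --fail-never

COPY ./src /app/src

RUN mvn package
```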
MyTwoCents
  • Thank you for the answer, I'll give it a try and I'll let you know! So basically your approach is not to give the JAR any knowledge of any kind of library binding, but to directly substitute the `.m2` folder in the container? – L. Don Dec 09 '18 at 14:27
  • I didn't try your suggestion regarding a `~/.m2.jar` file so I'm unsure it would work... but is it documented somewhere? I did not find such a mention in https://maven.apache.org/ – ErikMD Dec 09 '18 at 15:16
  • Sorry for the typo in the Dockerfile, it's .m2.zip. A .m2 folder is generated, in which all JARs are cached, when you run mvn build/mvn install for the first time. The folder structure goes like this: ~/.m2/repository/com/oracle/ojdbc7/12.1.0.1/ojdbc7-12.1.0.1.jar. More detail here: https://www.baeldung.com/maven-local-repository – MyTwoCents Dec 09 '18 at 15:27
  • Did you unzip the .m2 folder inside Docker at /home/testuser before running the build command? – MyTwoCents Dec 10 '18 at 13:41
  • I deleted the previous comment because I was wrong; now it seems to do what I was asking for, well, sort of. @MyTwoCents Now that I have the container `.m2` folder populated with all the dependencies, my application still can't find them, resulting in a `NoClassDefFoundError`. I'll edit my question with further information. Thanks again for your help. – L. Don Dec 10 '18 at 14:30

The documentation of the official Maven Docker images also points out different ways to achieve better caching of dependencies.

Basically, they recommend either mounting the local Maven repository as a volume and using it across Docker images, or using a special local repository (/usr/share/maven/ref/) whose contents will be copied on container startup.
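
As a rough illustration of the second option, the image documentation describes resolving the dependencies at build time with the bundled settings file /usr/share/maven/ref/settings-docker.xml, which places the local repository under /usr/share/maven/ref/. A minimal sketch, assuming the project layout from the question (written from memory of that documentation, so check it against the current README of the image):

```dockerfile
FROM maven:3.6.0-jdk-8-slim

WORKDIR /app

# copy only the POM, so this layer is rebuilt only when it changes
COPY pom.xml /app

# settings-docker.xml ships with the official image; it points the local
# repository at /usr/share/maven/ref/repository
RUN mvn -B -s /usr/share/maven/ref/settings-docker.xml dependency:resolve
```

Later mvn invocations in the same Dockerfile would need the same -s option to find these artifacts at build time, since the copy into the regular local repository only happens on container startup.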

Daniel