I am embedding a rather large (multi-GB) database file into an image with a standard Go build. This is a read-only database that we create in a separate process then run on our k8s cluster. The file cannot be on a mounted volume for business reasons that may or may not be valid, but it's currently a constraint I need to respect.
I would like to speed up the docker build by using cache when possible, and avoid a re-push of the database when it has not changed (but code has).
The current build copies the file after compiling the code, so any change to the code invalidates the layer with the DB. Every code change therefore forces a pull of the DB, a move into the build context, and a push of the DB layer, which makes for a very long build. So I want to add the DB to the image before the build step.
The existing build does a standard Go build with one bit I can't easily change:
ENV CGO_ENABLED=1
WORKDIR /src
COPY ./go.mod ./
COPY ./go.sum ./
RUN go mod download
COPY . ./
ARG version="dev"
RUN go install -mod=readonly -ldflags="-s -w" -tags netcgo .
COPY /tmp/my-big-database-file /data
I cannot seem to get the db file where I want in the resulting image.
When I use COPY I end up with the file in two places.
I want the DB file in the image as a layer before the code, because the DB does not change often but the code does. That way the layer with the DB is usually cached, and I can avoid pushing it if the DB file hasn't changed.
I can't figure out how to achieve this, given some constraints with the current build setup.
The COPY . ./ part is the issue: it copies that my-big-database-file from /data to /src/data along with all the other files, leaving me with 2 copies of the DB.
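To make the duplication concrete, here is a rough sketch of the reordered build I have in mind. It is not the real Dockerfile: the FROM line and image tag are placeholders (the actual base image is not shown here), and the DB path simply reuses the COPY line from the existing build.

```dockerfile
# Hypothetical base image; the real build's FROM line is not shown in this post.
FROM golang:1.22 AS build

# DB first, so this layer can stay cached (and is not re-pushed) while the file is unchanged.
# COPY sources are resolved relative to the build context.
COPY /tmp/my-big-database-file /data/my-big-database-file

ENV CGO_ENABLED=1
WORKDIR /src
COPY ./go.mod ./
COPY ./go.sum ./
RUN go mod download

# The DB file is still part of the build context, so this step copies it
# a second time, somewhere under /src. This is the unwanted duplicate.
COPY . ./
ARG version="dev"
RUN go install -mod=readonly -ldflags="-s -w" -tags netcgo .
```

With this ordering the DB layer sits before the code layers as intended, but the image carries the multi-GB file twice.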
Is there a way that I can exclude, remove or otherwise end up with a single copy of my-big-database-file living in the /data directory?
I have tried:
- .dockerignore, which does exactly what one would expect, which is to entirely exclude the file from the image. I need the file in the image; the problem is that I need only one instance (it's a huge file) and it needs to be in a specific location.
- Remove the COPY /tmp/my-big-database-file /data/my-big-database-file - I do end up with just one file in /tmp, not in the place the code expects.
- Put the DB in a folder excluded by .dockerignore, but then it's completely missed.
- A multi-stage build. Same result as above. Multi-stage with COPY --from in the go build stage results in 2 copies of the file.
- COPY --link, but I don't think that's what it solves.
- RUN rm /tmp/my-big-database-file, but of course that just adds another layer and doesn't accomplish anything.
It's not immediately practical for me to reorganize the source files (e.g. put them all under /src in git). There are a number of files at the root that I need for subsequent phases of the larger build. Similarly, there are many files and directories at the root level; selectively copying them would work, but it would create a fragile dependency.