
I'm trying to build a Docker image while avoiding unnecessary bulk, and I've run into a problem that I think should be common, but so far I haven't found a straightforward solution. (I'm building the image on an Ubuntu 18.04 system, starting from a FROM ubuntu layer.)

In particular, I have a very large .deb file (over 3 GB) that I need to install in the image. It's easy enough to COPY or ADD it and then RUN dpkg -i, but that duplicates several GB of space I don't need. Of course, just removing the file afterwards doesn't reduce the image size, since the copy still lives on in the earlier COPY layer.
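For concreteness, the naive Dockerfile fragment looks something like this (foo.deb standing in for the real package):

COPY foo.deb /tmp/foo.deb
RUN dpkg -i /tmp/foo.deb
RUN rm /tmp/foo.deb   # frees nothing: the file still exists in the COPY layer below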

I'd like to be able to mount a volume to access the .deb file rather than COPY it. That's easy to do when running a container, but apparently not possible when building one?

What I've come up with so far is to build the image up to the point where I would ADD the file, run it with a volume mounted so I can access the file from the container without COPYing it, dpkg -i it from inside, and then docker commit the container to get an image. Sure enough, I end up with an image that's over 3 GB smaller than my first try, but that seems like a hack, and it makes scripting the build more complicated.
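Roughly, the steps look like this (image name, container name, and host path are placeholders for whatever your setup uses):

docker build -t myimage:partial .    # Dockerfile truncated just before the ADD
docker run --name deb-install -v /host/pkgs:/pkgs:ro myimage:partial dpkg -i /pkgs/foo.deb
docker commit deb-install myimage:full
docker rm deb-install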

I'm thinking there must be a more appropriate way to achieve this, but so far my searching has not revealed an obvious answer. Am I missing something?

OldGeeksGuide

1 Answer


Relying on docker commit indeed amounts to a hack :) and its use is discouraged by some references, such as this blog article.

I only see one possible approach for the kind of use case you mention (install a one-time .deb package without leaving the package file behind in any image layer):

You could make the .deb you want to install available over the network to the Docker engine that builds your image, and replace the COPY + RUN directives with a single RUN, e.g., relying on curl. Chaining the download, install, and removal in one RUN is the key point: each Dockerfile instruction creates its own layer, so the .deb must be deleted in the same layer that fetched it:

RUN curl -OL https://example.com/foo.deb && dpkg -i foo.deb && rm -f foo.deb
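
If the .deb only exists on the build machine, one quick way to make it "remotely" available is an ad-hoc HTTP server on the host; a minimal sketch, assuming Python 3 is installed there. On the build host, in the directory containing the .deb:

python3 -m http.server 8000

and in the Dockerfile (172.17.0.1 is the docker0 bridge gateway on a typical Linux install; adjust the address and port to your setup):

RUN curl -OL http://172.17.0.1:8000/foo.deb && dpkg -i foo.deb && rm -f foo.deb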

If curl is not yet installed, you could run the usual APT commands beforehand:

RUN apt-get update -y -q \
  && DEBIAN_FRONTEND=noninteractive apt-get install -y -q --no-install-recommends \
    ca-certificates \
    curl \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/*
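
(Note that the apt-get clean and rm -rf /var/lib/apt/lists/* at the end follow the same single-layer logic as the curl one-liner above: because they run in the same RUN as the install, the APT caches and package lists never persist in any layer.)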

Maybe there is another possible solution (but I don't think Docker's multi-stage build feature would be of any help here, as all file permissions would be lost by doing, e.g., COPY --from=build / /).
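For reference, the multi-stage variant being ruled out would look roughly like this (foo.deb hypothetical): the final COPY --from=build / / flattens the build stage into a single layer without the .deb, but, per the caveat above, at the cost of file-metadata fidelity:

FROM ubuntu AS build
COPY foo.deb /tmp/foo.deb
RUN dpkg -i /tmp/foo.deb && rm -f /tmp/foo.deb

FROM scratch
COPY --from=build / /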

ErikMD
  • Thanks. That's helpful, but I'm still a little dumbfounded. I would think this is the kind of thing people run into all the time, and given how easy it is to mount a volume during `run`, it seems like an obvious feature to do the same during `build`. Do you have any idea why they don't just enable a `-v` option for build? – OldGeeksGuide Sep 06 '18 at 23:12
  • @OldGeeksGuide indeed, it is not possible to do this with Docker, because it would hinder the portability/reproducibility of the build: this is notably explained in https://stackoverflow.com/questions/26050899/how-to-mount-host-volumes-into-docker-containers-in-dockerfile-during-build and https://github.com/moby/moby/issues/3156 – ErikMD Sep 07 '18 at 17:36
  • However, as I mentioned in my answer, there exist some workarounds to reduce the impact on the image size without relying on `-v` + `docker commit` (cf. also https://vsupalov.com/cache-docker-build-dependencies-without-volume-mounting/), but TTBOMK only the "remote availability of your dependency + a single `RUN`" approach would do the job for your use case… – ErikMD Sep 07 '18 at 17:41
  • I genuinely don't understand the reproducibility argument. How does it affect reproducibility any more than COPY or ADD? I just want the same thing but with what amounts to a soft link rather than actually COPYing a temporary file and then removing it, leaving behind GBs of unused and unusable cruft in the docker image. Instead I end up running an http.server and connecting to localhost through a temporarily available URL. In my case, the 'straightforward' way leaves me with a 12 GB docker image. I might as well use a VM instead. – OldGeeksGuide Sep 07 '18 at 18:56
  • AFAICT, there is a large difference between using `COPY` to import a given file and using the `-v` option: `COPY` only imports files from the build context (the folder containing the Dockerfile), and it pre-computes a fingerprint of the exact content to include (a cryptographic hash is computed for each item in the build context), so Docker knows exactly how to reproduce a `COPY files /path` command: more specifically, it knows whether the step can be skipped because the files are exactly the same as in the cache. Neither aspect is addressed by the `-v` option. – ErikMD Sep 07 '18 at 19:11
  • But it seems they could easily be addressed given something like a "LINK" command, i.e. you still have the files in the build context, you just don't actually copy them. – OldGeeksGuide Sep 07 '18 at 19:13
  • Indeed, I don't say this is impossible or that the feature wouldn't be useful :) just that for the time being, the only two ways to achieve your use case in Docker seem to be `docker commit` on the one hand, and `RUN` + curl-or-the-like on the other... – ErikMD Sep 07 '18 at 19:16
  • Thank you, I appreciate your explanations! – OldGeeksGuide Sep 07 '18 at 20:46