Do you build that image via a Dockerfile? If so, take care with your RUN statements. Each RUN statement creates a new image layer, which remains in the image's history and counts toward the image's total size.
So, for instance, if one RUN statement downloads a huge archive file, the next one unpacks that archive, a following one processes the unpacked files, and a final one deletes the archive, the archive and its extracted files still remain in the image's history:
RUN curl <options> http://example.com/my/big/archive.tar.gz
RUN tar xvzf archive.tar.gz
RUN <do whatever you need to do with the unpacked files>
RUN rm archive.tar.gz
It is more efficient, in terms of image size, to combine multiple steps into a single RUN statement using the && operator. Like:
RUN curl <options> http://example.com/my/big/archive.tar.gz \
&& tar xvzf <options> \
&& <do whatever you need to do with the unpacked files> \
&& rm archive.tar.gz
That way you can clean up files and folders that you need during the build but not in the resulting image, and keep them out of the image's history as well. This is a quite common pattern for keeping image sizes small. You can inspect the layers of an image and their sizes with docker history <image>.
The trade-off, of course, is that you lose the fine-grained image history, so intermediate layers cannot be reused from the build cache.
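To make that pattern concrete, here is a sketch of a full Dockerfile following it. The base image, URL, paths, and build steps are hypothetical placeholders, not a definitive recipe:

```dockerfile
# Hypothetical example: base image, URL, and paths are placeholders.
FROM debian:stable-slim

# Download, unpack, install, and clean up in ONE layer, so neither the
# archive nor the extracted sources remain in the image's history.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates \
 && curl -fsSL -o /tmp/archive.tar.gz http://example.com/my/big/archive.tar.gz \
 && tar xzf /tmp/archive.tar.gz -C /tmp \
 && cp -r /tmp/archive/dist /opt/myapp \
 && rm -rf /tmp/archive.tar.gz /tmp/archive \
 && apt-get purge -y curl \
 && apt-get autoremove -y \
 && rm -rf /var/lib/apt/lists/*
```

Because everything happens in one RUN statement, the resulting layer contains only /opt/myapp; the downloaded archive and the temporarily installed curl never appear in any committed layer.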
Update:
Like RUN statements, ADD statements also create new image layers. Whatever you add to an image that way stays in its history and counts toward the total image size. You cannot temporarily ADD things and then remove them so that they no longer count toward the total size.
Try to ADD as little as possible to the image, especially when you work with large files. Are there other ways to fetch those files within a RUN statement, so that you can clean them up during the same RUN execution? E.g.:
RUN git clone <your repo> && <do stuff> && rm -rf <clone dir>
A good practice is to ADD only those things that are meant to stay in the image. Temporary things should be fetched and cleaned up within a single RUN statement instead, where possible.
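A minimal sketch of that practice, with hypothetical file names and repository URL: files meant to stay in the image are added directly, while temporary artifacts are fetched and removed within one RUN statement.

```dockerfile
# Hypothetical example: paths and repository URL are placeholders,
# and the base image is assumed to have git available.
FROM debian:stable-slim

# Files that are meant to stay in the final image: add them directly.
COPY app/ /opt/app/

# Temporary files: fetch, use, and remove them in a single RUN layer,
# so they never enter the image's history.
RUN git clone http://example.com/my/repo.git /tmp/repo \
 && cp /tmp/repo/config/defaults.conf /opt/app/ \
 && rm -rf /tmp/repo
```

Here the clone directory is gone before the layer is committed, so only /opt/app (including the copied config file) contributes to the image size.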