I am building a web application I'd like to deploy as a Docker container. The application depends on a set of assets stored in a separate Git repository. The reason for using a separate repository is that the history of that repository is much larger than the current checkout and we'd like to have a way to throw away that history without touching the history of the repository containing the source code.
In the example below, which contains only the relevant parts, I'm passing the assets repository's commit ID into the build process via a file:
FROM something:something
# [install Git and stuff]
COPY ["assets_git_id", "/root/"]
RUN git clone --bare git://lala/assets.git /root/assets.git \
 && mkdir -p /srv/app/assets \
 && git --git-dir=/root/assets.git --work-tree=/srv/app/assets checkout $(cat /root/assets_git_id) . \
 && rm -r /root/assets.git
# [set up the rest of the application]
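For reference, the id file is produced from the assets checkout before each build, roughly like this (the ../assets path and the myapp tag are just examples, not part of the actual setup):

git -C ../assets rev-parse HEAD > assets_git_id
docker build -t myapp .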
The problem here is that whenever that ID changes, the COPY step invalidates Docker's build cache and the subsequent RUN is re-executed, so the whole repository is cloned during the build and most of the data is thrown away again.
What is the canonical way to reduce the wasted resources in such a case? Ideally I'd like to have access to a directory from inside the container during the build whose contents are kept between multiple runs of the same build. The RUN script could then just update the repository and copy the relevant data from it instead of cloning the whole repository each time, along the lines of the sketch below.
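To make that concrete, here is a sketch of the RUN step I have in mind, assuming (hypothetically) a directory /cache whose contents survived between builds; the --mirror clone keeps all remote refs so a later fetch can update them in place:

RUN if [ -d /cache/assets.git ]; then \
        git --git-dir=/cache/assets.git fetch origin; \
    else \
        git clone --mirror git://lala/assets.git /cache/assets.git; \
    fi \
 && mkdir -p /srv/app/assets \
 && git --git-dir=/cache/assets.git --work-tree=/srv/app/assets checkout $(cat /root/assets_git_id) .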