
I want to optimize my Dockerfile, and I would like to keep the package cache on disk. But when I run docker build ., it always fetches every file from the network.

I would like to share my cache directory during the build (e.g. /var/cache/yum/x86_64/6), but that only works with docker run -v ....

Any suggestions? (In this example only one rpm is installed; in my real case I need to install hundreds of rpms.)

My draft Dockerfile:

FROM centos:6.4
RUN yum update -y
RUN yum install -y openssh-server
RUN sed -i -e 's:keepcache=0:keepcache=1:' /etc/yum.conf
VOLUME ["/var/cache/yum/x86_64/6"] 
EXPOSE 22

Later, I want to build a similar image:

FROM centos:6.4
RUN yum update -y
RUN yum install -y openssh-server vim

I don't want to fetch openssh-server from the internet again (it is slow). In my real case it is not one package but about 100 packages.

mc0e
Daniel YC Lin
  • Just to make sure that I understand the question correctly: if you run that Dockerfile twice, the second time will go very fast, right? If it doesn't, you probably have an issue with your Docker setup, because that should work just fine. However, if your issue is about "how do I build multiple Dockerfiles and have a common yum/apt cache", then it's a different story :-) – jpetazzo Feb 26 '14 at 20:41
  • The question as posed is CentOS specific, but for readers using Debian and derivatives, see https://docs.docker.com/examples/apt-cacher-ng/ – mc0e Dec 24 '14 at 14:47
  • You can create a common base image and use it with FROM mybaseimage, but I believe you are talking about a caching proxy (such as apt-cacher, or an aggressive proxy cache), which is another story; or you could download all packages beforehand and ADD them when building the Dockerfile. – Thomas Decaux Feb 02 '17 at 11:18

3 Answers


An update to the previous answers: current docker build accepts a --build-arg option that passes environment variables like http_proxy to the build without saving them in the resulting image.

Example:

# get squid
docker run --name squid -d --restart=always \
  --publish 3128:3128 \
  --volume /var/spool/squid3 \
  sameersbn/squid:3.3.8-11

# optionally in another terminal run tail on logs
docker exec -it squid tail -f /var/log/squid3/access.log

# get squid ip to use in docker build
SQUID_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' squid)

# build your instance
docker build --build-arg http_proxy=http://$SQUID_IP:3128 .
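One way to confirm the proxy is actually serving from cache is to count hit vs. miss result codes in squid's access log (the answer tails /var/log/squid3/access.log; the sketch below assumes squid's default native log format, where the result code is the fourth field, and uses a sample log file for illustration):

```shell
# Summarize cache hits vs. misses in a squid access log.
# Assumes squid's default native log format (result code in field 4).
summarize_squid_log() {
  awk '{ split($4, a, "/"); counts[a[1]]++ }
       END { for (c in counts) print c, counts[c] }' "$1" | sort
}

# Illustrative sample log: one real download, one served from cache.
cat > /tmp/access.log <<'EOF'
1545310000.123    150 172.17.0.2 TCP_MISS/200 48289352 GET http://mirror.centos.org/pkg.rpm - HIER_DIRECT/1.2.3.4 application/x-rpm
1545310050.456      2 172.17.0.3 TCP_HIT/200 48289352 GET http://mirror.centos.org/pkg.rpm - HIER_NONE/- application/x-rpm
EOF
summarize_squid_log /tmp/access.log
# prints:
# TCP_HIT 1
# TCP_MISS 1
```

If repeated builds keep showing only TCP_MISS for the same URLs, squid is not caching them (see the comments below about object size limits).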
Piotr Czapla
  • This solution sounds great, but it fails to cache the large tarball (http://master.qt.io/archive/qt/5.12/5.12.0/submodules/qtbase-everywhere-src-5.12.0.tar.xz, 48289352 bytes) that I want it to cache. Every time docker runs `curl`, the squid logs say TCP_MISS/200 and a real download happens. Is there a max file size in squid's configuration that I need to tweak maybe? – David Faure Dec 20 '18 at 13:38
    Found it: https://serverfault.com/questions/596890/squid3-not-caching-larger-files explains what to use as config file for squid (with the gotcha that `maximum_object_size` should be before `cache_dir`), and then one can make docker use that local config file by passing `--volume $PWD/squid.conf:/etc/squid3/squid.conf` in the `docker run` command. Strange default values from squid... – David Faure Dec 20 '18 at 17:54
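For reference, a minimal squid.conf fragment reflecting the gotcha described in the comment above (the values are illustrative, not squid defaults; the point is that maximum_object_size must appear before cache_dir):

```
# Raise the largest cacheable object size; must come BEFORE cache_dir.
maximum_object_size 512 MB
cache_dir ufs /var/spool/squid3 10000 16 256
```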

Just use an intermediate/base image:

Base Dockerfile, build it with docker build -t custom-base or something:

FROM centos:6.4
RUN yum update -y
RUN yum install -y openssh-server vim
RUN sed -i -e 's:keepcache=0:keepcache=1:' /etc/yum.conf

Application Dockerfile:

FROM custom-base
VOLUME ["/var/cache/yum/x86_64/6"] 
EXPOSE 22

You should use a caching proxy (e.g. Http Replicator, squid-deb-proxy, ...), or apt-cacher-ng for Ubuntu, to cache installation packages. I think you can install this software on the host machine.

EDIT:

Option 1 - caching http proxy - easier method with modified Dockerfile:

> cd ~/your-project
> git clone https://github.com/gertjanvanzwieten/replicator.git
> mkdir cache
> replicator/http-replicator -r ./cache -p 8080 --daemon ./cache/replicator.log  --static   

add to your Dockerfile (before first RUN line):

ENV http_proxy http://172.17.42.1:8080/
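Applied to the question's draft Dockerfile, that would look like the following (172.17.42.1 is assumed to be the docker0 bridge address on the host, as in the line above; adjust it for your setup):

```
FROM centos:6.4
ENV http_proxy http://172.17.42.1:8080/
RUN yum update -y
RUN yum install -y openssh-server
EXPOSE 22
```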

You should optionally clear the cache from time to time.
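A sketch of such a cleanup, deleting cached files untouched for more than 30 days (the directory and the age threshold are illustrative, not replicator defaults; the example uses a temporary directory as a stand-in for ./cache):

```shell
# Sketch: prune a package cache, dropping files older than 30 days.
CACHE_DIR=$(mktemp -d)                        # stand-in for ./cache
touch -d '2000-01-01' "$CACHE_DIR/old.rpm"    # simulate a stale entry
touch "$CACHE_DIR/fresh.rpm"                  # recently fetched entry
find "$CACHE_DIR" -type f -mtime +30 -delete  # delete stale entries
ls "$CACHE_DIR"                               # only fresh.rpm remains
```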

Option 2 - caching transparent proxy, no modification to Dockerfile:

> cd ~/your-project
> curl -o r.zip https://codeload.github.com/zahradil/replicator/zip/transparent-requests
> unzip r.zip
> rm r.zip
> mv replicator-transparent-requests replicator
> mkdir cache
> replicator/http-replicator -r ./cache -p 8080 --daemon ./cache/replicator.log --static

You need to start the replicator as a non-root user.

Set up the transparent redirect:

> iptables -t nat -A OUTPUT -p tcp -m owner ! --uid-owner <replicator-user> --dport 80 -j REDIRECT --to-port 8080

Disable redirect:

> iptables -t nat -D OUTPUT -p tcp -m owner ! --uid-owner <replicator-user> --dport 80 -j REDIRECT --to-port 8080
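Since the -D (delete) invocation must match the -A (add) rule argument-for-argument, a small helper that prints both commands for a given user can avoid typos (the helper name is hypothetical; it only prints the commands shown above rather than running them):

```shell
# Hypothetical helper: print matching add (-A) and delete (-D)
# REDIRECT rules for a given replicator user, so both invocations
# always use identical arguments.
replicator_rules() {
  user=$1
  for action in A D; do
    echo "iptables -t nat -$action OUTPUT -p tcp -m owner ! --uid-owner $user --dport 80 -j REDIRECT --to-port 8080"
  done
}

replicator_rules replicator
```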

This method is the most transparent and general, and your Dockerfile does not need to be modified. You should optionally clear the cache from time to time.

Jiri
  • Thanks, this is what I want. But because the released image will be used by others, setting up proxy settings inside the docker image does not seem like a perfect solution. – Daniel YC Lin Mar 02 '14 at 07:19