31

I have a Dockerfile that deploys Django code to a container:

FROM ubuntu:latest
MAINTAINER { myname }

#RUN echo "deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -sc) main universe" >> /etc/apt/sources.list

RUN apt-get update

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y tar git curl dialog wget net-tools nano build-essential
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y python python-dev python-distribute python-pip

RUN mkdir /opt/app
WORKDIR /opt/app

#Pull Code
RUN git clone git@bitbucket.org:{user}/{repo}.git

RUN pip install -r website/requirements.txt

#EXPOSE = ["8000"]
CMD python website/manage.py runserver 0.0.0.0:8000

I then build the image with docker build -t dockerhubaccount/demo:v1 ., which clones my code from Bitbucket into the image. I run it as docker run -p 8000:8000 -td dockerhubaccount/demo:v1 and things appear to work fine.

Now I want to update the code. Since the Dockerfile clones the code with git clone, I have this confusion:

  • How can I update my code when I have new commits, so that when I build the Docker image it ships with the new code? (Note: when I rerun the build it does not fetch the new code, because of the cache.)
  • What is the best workflow for this kind of approach?
Cheruiyot Felix

5 Answers

23

There are a couple of approaches you can use.

  1. You can use docker build --no-cache to avoid using the cache for the Git clone step.
  2. Have the startup command call git pull. So instead of running python manage.py directly, you'd have something like CMD cd /repo && git pull && python manage.py, or use a start script if things are more complex (see the sketch after the next paragraph).

I tend to prefer 2. You can also run a cron job to update the code in your container, but that's a little more work and goes somewhat against the Docker philosophy.
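
A minimal sketch of option 2, assuming the repository was cloned to /opt/app/website as in the question's Dockerfile (paths are illustrative):

# Pull the latest commits on every container start, then launch the dev server
CMD cd /opt/app/website && git pull && python manage.py runserver 0.0.0.0:8000

The pull happens at container start rather than at build time, so restarting the container is enough to pick up new commits (at the cost of needing repository access at run time).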

seanmcl
  • I will take 2 for the answer. I think it makes more sense, since running --no-cache will reinstall everything afresh, which is not a good approach. Thanks for your contribution. – Cheruiyot Felix Dec 17 '14 at 16:35
  • What if someone executes `docker exec -it con/tainer bash`? That would run `bash` instead of whatever is specified in `CMD`. Both options seem quite sub-optimal to me, and this must be a very common problem - I feel there should be a way to disable the cache in the middle of the `Dockerfile`. – avloss Oct 11 '16 at 05:52
12

I would recommend you check out the code on your host and COPY it into the image. That way it will be updated whenever you make a change. Also, during development you can bind-mount the source directory over the code directory in the container, meaning any changes are reflected immediately in the container.

A docker command for git repositories that checks for the last update would be very useful though!
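
A sketch of that workflow, assuming the code has already been cloned into a website directory next to the Dockerfile (names are taken from the question):

# In the Dockerfile: copy the already-checked-out code instead of cloning it
COPY website /opt/app/website
RUN pip install -r /opt/app/website/requirements.txt

During development, bind-mount the host checkout over the same path so edits show up immediately:

docker run -p 8000:8000 -v $(pwd)/website:/opt/app/website -td dockerhubaccount/demo:v1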

Adrian Mouat
8

Another solution.

The docker build command uses the cache as long as an instruction string is exactly the same as the one in the cached image. So, if you write

RUN echo '2014122400' >/dev/null && git pull ...

then on the next update, you change it as follows:

RUN echo '2014122501' >/dev/null && git pull ...

This prevents Docker from using the cache.
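
If you script your builds, you can bump that token automatically. A minimal sketch, assuming GNU sed and the timestamp-style token shown above:

# Rewrite the cache-busting token to the current timestamp, then rebuild
sed -i "s/RUN echo '[0-9]*'/RUN echo '$(date +%Y%m%d%H%M%S)'/" Dockerfile
docker build -t dockerhubaccount/demo:v1 .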

takaomag
  • I think this is the best answer. With this method, we can turn off the cache in only the proper place. – makerj Mar 23 '16 at 16:02
  • It should be noted that this will cause everything AFTER this line to be rebuilt. That is fine if your git pull is the last line, as in the OP's question, but could be bad if your git pull is earlier in a Dockerfile. – JHowIX Oct 27 '16 at 21:39
2

I would like to offer another possible solution. I need to warn, however, that it's definitely not the "Docker way" of doing things and relies on the existence of volumes (which could be a potential blocker in tools like Docker Swarm and Kubernetes).

The basic principle that we will be taking advantage of is the fact that the contents of container directories that are used as Docker volumes are actually stored in the file system of the host. Check out this part of the documentation.

In your case you would make /opt/app a Docker volume. You don't need to map the volume explicitly to a location on the host's file system since, as I will describe below, the mapping can be obtained dynamically.

So for starters leave your Dockerfile exactly as it is and switch your container creation command to something like:

docker run -p 8000:8080 -v /opt/app --name some-name -td felixcheruiyot/demo:v1

The command docker inspect -f '{{index .Volumes "/opt/app"}}' some-name will print the full file system path on the host where your code is stored (this is where I picked up the inspect trick).

Armed with that knowledge, all you have to do is replace that code and you're all set. So a very simple deploy script would be something like:

code_path=$(docker inspect -f '{{index .Volumes "/opt/app"}}' some-name)
rm -rfv "$code_path"/*
cd "$code_path"
git clone git@bitbucket.org:{user}/{repo}.git

The benefits you get with an approach like this are:

  • There are no potentially costly cacheless image rebuilds
  • There is no need to move application-specific runtime information into the run command. The Dockerfile is the only source needed for instrumenting the application.

UPDATE

You can achieve the same results I have mentioned above using docker cp (starting with Docker 1.8). This way the container need not have volumes, and you can replace code in the container as you would on the host file system.
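
A sketch of the docker cp variant, assuming a local checkout in ./website and the container layout from the question:

# Update the local checkout, then copy its contents into the running container
git -C ./website pull
docker cp ./website/. some-name:/opt/app/website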

Of course as I mentioned in the beginning of the answer, this is not the "docker way" of doing things, which advocates containers being immutable and reproducible.

geoand
  • Just a note... immutable Docker containers? How does this cope with a changing Python environment when an application update uses new dependencies? – Yoeri Apr 06 '16 at 12:27
  • @Yoeri The Docker way advocates using *new* containers whenever anything changes. That means that when the application gets updated in any way (be it application code, dependencies, configuration, etc.), a new container should be created. The goal is to have full reproducibility of the container. – geoand Apr 06 '16 at 12:34
  • So it would be better to use a separate container (or host) with the sources and environment, and share those volumes... I always find questions about changing source code, never about changing dependencies... – Yoeri Apr 06 '16 at 12:51
  • From a Docker-way perspective, a change is a change no matter what kind it is. Your ultimate goal is to always be able to recreate a container, which means that you can't change stuff inside the container. – geoand Apr 06 '16 at 13:27
0

If you use GitHub, you can use the GitHub API to avoid caching specific RUN commands.

You need to have jq installed to parse JSON: apt-get install -y jq

Example:

docker build --build-arg SHA=$(curl -s 'https://api.github.com/repos/Tencent/mars/commits' | jq -r '.[0].sha') -t imageName .

In the Dockerfile (the ARG instruction should be placed right before the RUN it invalidates):

ARG SHA=LATEST
RUN SHA=${SHA} \
    git clone https://github.com/Tencent/mars.git

Or if you don't want to install jq:

SHA=$(curl -s 'https://api.github.com/repos/Tencent/mars/commits' | grep sha | head -1)

If the repository has new commits, the git clone step will be executed again.
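
The same idea should carry over to the question's Bitbucket repository. A hedged sketch using Bitbucket's 2.0 REST API (the endpoint shape is an assumption, not taken from this answer):

# Fetch the latest commit hash from Bitbucket and use it as the cache-busting build arg
docker build --build-arg SHA=$(curl -s 'https://api.bitbucket.org/2.0/repositories/{user}/{repo}/commits' | jq -r '.values[0].hash') -t dockerhubaccount/demo:v1 .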

Michal Zmuda
  • You can also use `ADD` to have `docker build` download the current HEAD ref from the API every time. Then when the HEAD changes, the cache is invalidated. http://stackoverflow.com/questions/36996046/how-to-prevent-dockerfile-caching-git-clone/39278224#39278224 – anq Jan 07 '17 at 02:41