I am new to Docker but have had success Dockerizing some existing Python code using Docker Toolbox for Windows 10.

Currently I have this setup:

picture of working python code in Docker container

This is done with the Dockerfile:

FROM python:2.7.13
WORKDIR /root
COPY ./requirements.txt /root/requirements.txt
RUN pip install -r requirements.txt
COPY . /root
CMD ["python", "main.py"]

and all my code sits in the container with a bunch of CSV and .pkl files. The CSV and .pkl files change daily, so after some reading I think I can split them out into a volume, or maybe even a separate container, that I can modify and upload every day without changing the main Python script, as it is 1.4 GB in size and my upload speed is 40 kbps (at best).

Picture of the container setup that I would like

So I'm wondering how I would reference the other container/volume so that I can access the CSV and .pkl files from my main Python code. At the moment everything sits in the same directory, so there is no problem: I just use the .csv/.pkl file name and it works.

#open the local .csv file
data = pd.read_csv(csv_select)
#open the local .pkl file
pickled_list = pickle.load(open(can_cat+".pkl","rb"))

How would I change the above code to open a CSV/.pkl file from a separate container?

I have read heaps of Stack Overflow posts and the Docker documentation but can't seem to understand how to make it work. Any help would be appreciated.

Michael Dalton

1 Answer

Yeah, you're on the right track in thinking of using volumes. I would split it up into three bits:

  1. Your Python code running in one container
  2. A volume that is shared between your Python container and one or more other containers
  3. A "data copying" container that, on a daily basis, copies the latest data to the shared volume.

1. A shared volume

Creating volumes with Docker is easy. What is particularly good is that you can create a volume with a particular name:

docker volume create data-volume

So here we have created a named volume called data-volume. You can then mount this onto any container using a command like this:

docker run --rm -v data-volume:/data my-container-image

So here we're running a container from the my-container-image Docker image and mounting the data-volume volume at /data within that container.

Your Python code could then easily read the files it needs from that directory (e.g. /data), or you could change the mount point as required.
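As a minimal sketch of what that could look like on the Python side, the loading code from the question only needs to build its paths under the mount point. The DATA_DIR environment variable and the data_path/load_pickle helpers are hypothetical names introduced for illustration; they are not part of Docker or of the original code.

```python
import os
import pickle

# DATA_DIR is a hypothetical environment variable (an assumption for this sketch).
# It defaults to /data, the mount point used in the docker run command above.
DATA_DIR = os.environ.get("DATA_DIR", "/data")


def data_path(filename, data_dir=None):
    """Return the full path of a file inside the mounted volume."""
    return os.path.join(data_dir or DATA_DIR, filename)


def load_pickle(name, data_dir=None):
    """Load <name>.pkl from the mounted volume and close the file afterwards."""
    with open(data_path(name + ".pkl", data_dir), "rb") as f:
        return pickle.load(f)
```

With these in place, the two lines from the question would become something like data = pd.read_csv(data_path(csv_select)) and pickled_list = load_pickle(can_cat). Reading the directory from an environment variable means the mount point can be changed with docker run -e DATA_DIR=... without rebuilding the image.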

2. Copying changed data into the volume

The next step would be to create a simple app that can copy the latest changes into that volume. Again, let's say this app copies the latest data into /data on its own file system. Essentially we want an app that does:

cp $TODAYS_DATA.csv $TODAYS_DATA.pkl /data

We could run this app within a container and also ensure that the container has the data-volume mounted at /data, e.g.:

docker run --rm -v data-volume:/data my-data-copying-app

The image for this container could be really simple, something like:

FROM alpine:latest
COPY ./todaysdata /todaysdata

You could then run it using the following:

docker run --rm -v data-volume:/data my-data-copy-image /bin/sh -c "cp -r /todaysdata/* /data/"

So essentially you just run the container with a command to copy today's data into /data. Because /data is actually a volume, the latest data is then immediately shared with your Python app, which is exactly what you wanted.
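If the copy step ever needs more logic than a plain cp, the same idea can be sketched in Python. This assumes the copying image is built from a Python base image rather than alpine, and copy_todays_data is a hypothetical helper name; the /todaysdata and /data defaults simply mirror the paths used above.

```python
import os
import shutil


def copy_todays_data(src_dir="/todaysdata", dest_dir="/data"):
    """Copy every regular file in src_dir into dest_dir, overwriting old copies.

    Returns the sorted list of file names that were copied.
    Subdirectories are skipped (unlike cp -r, this is not recursive).
    """
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            shutil.copy2(src, os.path.join(dest_dir, name))
            copied.append(name)
    return copied
```

Calling copy_todays_data() as the container's command would then take the place of the /bin/sh -c "cp ..." invocation.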

Hope that helps.

Rob Lockwood-Blake
  • Thank you Rob! I'm going to try this out as soon as I get some time today and I'll get back to you with how it all went. Thanks for the detailed reply, I didn't think of copying it over into the container volume I was just focusing on how to access the data at another container location. – Michael Dalton Aug 24 '17 at 23:10
  • Hey Rob, I found a similar way of achieving the same thing you describe with the copy container: `docker run -v my-volume:/data --name helper busybox true`, then `docker cp . helper:/data`, then `docker rm helper`. https://stackoverflow.com/questions/37468788/best-way-to-transfer-data-to-named-volume-of-docker – Michael Dalton Aug 28 '17 at 13:03
  • @MichaelDalton Yep, an alternative to my solution is to use `docker cp`. You could for example create a script that copies your files for today onto the host running your container and then `docker cp` them into the container. Either way would work absolutely fine. – Rob Lockwood-Blake Aug 28 '17 at 20:56