
I am starting to get the hang of Docker and am trying to containerize some of the applications I use. Thanks to the tutorial I was able to create Docker images and containers, but now I am trying to think about the most efficient and practical way to do things.

To present my use case: I have a Python script (let's call it process.py) that takes as input a single .jpg image, does some operations on it, and then outputs the processed .jpg image.

Normally I would run it with:

python process.py -i path_of_the_input_image -o path_of_the_output_image

Then, the way I connect input and output with my Docker container is the following. First I create the Dockerfile:

FROM python:3.6.8
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
CMD python ./process.py -i ./input_output/input.jpg -o ./input_output/output.jpg

And then, after building the image, I run docker run, mapping a local folder to the input_output folder of the container:

docker run -v C:/local_folder/:/app/input_output my_docker_image

This seems to work, but it is not really practical, as I have to create a specific local folder and mount it into the container. So here are the questions I am asking myself:

Is there a more practical way of doing things? Can I directly send a single input file to, and directly receive a single output file from, a Docker container?

When I run the Docker image, what happens (if I understand correctly) is that it creates a Docker container that runs process.py once and then just sits there doing nothing. Even after process.py has finished, the container is still listed by the command "docker ps -a". Is this behaviour expected? Is there a way to automatically delete finished containers? Am I using docker run the right way?

Is there a more practical way of having a container run continuously, which I can then query to run process.py on demand with a given input?

Stringer Bell
  • You're looking for [`ENTRYPOINT`](https://docs.docker.com/engine/reference/builder/#entrypoint) – Lescurel Sep 10 '20 at 11:59
  • Would it be possible to elaborate a little bit? I thought ENTRYPOINT was actually very similar to CMD. How would you use it to feed input data to the script inside the Docker container? – Stringer Bell Sep 10 '20 at 13:23
  • I read your question a bit too fast. I think the answers pointing out that you maybe don't need Docker are correct. It would probably be preferable to have a different architecture for your script, with a server that accepts requests containing images and sends back answers containing images. Then you could really take advantage of Docker. – Lescurel Sep 10 '20 at 14:02

2 Answers


I have a Python script (let's call it process.py) that takes as input a single .jpg image, does some operations on it, and then outputs the processed .jpg image.

That's most efficiently done without Docker; just run the python command you already have. If your application has interesting Python library dependencies, you can install them in a virtual environment to avoid conflicts with the system Python installation.
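For example, a typical virtual-environment workflow looks something like this (paths are illustrative; on Windows the scripts live under .venv\Scripts instead of .venv/bin):

python -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/python process.py -i input.jpg -o output.jpg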

When I run the Docker image...

...the container runs its main command (the docker run command arguments and/or the Dockerfile CMD, possibly combined with an ENTRYPOINT from the same sources), and when that command exits, the container exits. It will still be listed in docker ps -a output, but as "Exited" (probably with status 0 for a successful completion). You can use docker run --rm to have the container automatically delete itself when it finishes.
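For example, reusing the command from the question:

docker run --rm -v C:/local_folder/:/app/input_output my_docker_image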

Is there a more practical way of having a container run continuously, which I can then query to run process.py on demand with a given input?

Wrap it in a network service, such as a Flask application. As long as that service is running, you can use a tool like curl to make an HTTP POST with the input JPEG file as the body and get the output JPEG file back as the response. Avoid combining local files and Docker whenever that's an option (prefer network I/O for process inputs and outputs; prefer a database to local-file storage).
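A minimal sketch of that idea, assuming process.py can be refactored to expose a function process_image(input_path, output_path) (that function name is hypothetical; adapt it to however your code is actually structured):

import tempfile

from flask import Flask, Response, request

from process import process_image  # hypothetical entry point into your existing code

app = Flask(__name__)

@app.route("/process", methods=["POST"])
def process_endpoint():
    # Write the uploaded JPEG body to a temp file, run the existing
    # processing code on it, then read back and return the result.
    with tempfile.NamedTemporaryFile(suffix=".jpg") as inp, \
         tempfile.NamedTemporaryFile(suffix=".jpg") as outp:
        inp.write(request.get_data())
        inp.flush()
        process_image(inp.name, outp.name)
        outp.seek(0)
        result = outp.read()
    return Response(result, mimetype="image/jpeg")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

If the container publishes the port (docker run -p 5000:5000 ...), a client could then call it with something like:

curl -X POST --data-binary @input.jpg http://localhost:5000/process -o output.jpg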

David Maze
  • Yes indeed, the easiest way is to run the application directly. The idea behind using Docker was to share the app as an image/container so it can be deployed easily. I feel like Docker containers are more versatile (in particular, they support not only Python but any application) and more standard than installing a conda virtual environment, but I might be mistaken and I am happy to get feedback on the pros/cons of different solutions! – Stringer Bell Sep 10 '20 at 12:41

Why are volume mounts not practical?

I would argue that Dockerising your application is itself not that practical, but you've chosen to do so for, presumably, very good reasons. Volume mounts are simply an extension of this. If you want to get data into or out of your container, the 'normal' way to do it is with volume mounts, as you have done. Sure, you could use docker cp to copy the files manually, but that's not really practical either.

As far as the process exiting goes: normally, once the main process exits, the container exits. docker ps -a shows stopped containers as well as running ones. You should see that it says Exited n minutes (hours, days, etc.) ago. This means that your container has run and exited correctly. You can remove it with docker rm <containerid>.

docker ps (no -a) will only show the running ones, btw.

If you use the --rm flag in your Docker run command, it will be removed when it exits, so you won't see it in the ps -a output. Stopped containers can be started again, but that's rather unusual.
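If stopped containers do pile up, docker container prune removes all of them in one go:

docker container prune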

Another solution might be to change your script to wait for incoming files and process them as they are received. Then you can leave the container running, and it will just process them as needed. If doing this, make sure that your idle loop has a sleep or something in it to ensure that you don't consume too many resources.
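A minimal sketch of such a polling loop, assuming (hypothetically) that process.py exposes a process_image(input_path, output_path) function and that the container mounts /app/input and /app/output as volumes:

import os
import time

from process import process_image  # hypothetical entry point into your existing code

IN_DIR = "/app/input"
OUT_DIR = "/app/output"

while True:
    for name in os.listdir(IN_DIR):
        if not name.lower().endswith(".jpg"):
            continue
        src = os.path.join(IN_DIR, name)
        dst = os.path.join(OUT_DIR, name)
        process_image(src, dst)
        os.remove(src)  # remove the input so it is not processed twice
    time.sleep(1)  # idle cheaply between scans instead of busy-waiting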

SiHa
  • The reason behind Dockerising the app is that I wanted an easy way to share it and possibly deploy it to other machines (or the cloud) without having to care about libraries and compatibility issues. I am more than happy to get feedback on other ways to do that if you think something is more appropriate, because I have little to no experience on that topic. Thanks for your answer; the only question I would have is whether there is any reason to remove stopped containers with docker rm, or can I just leave them there? – Stringer Bell Sep 10 '20 at 12:39
  • You should remove the old containers, as they will accumulate over time and fill the disk. I wasn't criticising your choice, btw, merely pointing out that using volumes is really an integral part of the choice you've made. – SiHa Sep 10 '20 at 13:02
  • Understood, very clear, thanks for your answer. Last question: you suggested to "change the script to wait for incoming files". Would you have a recommendation on a nice and easy way to do that with Docker/Python? – Stringer Bell Sep 10 '20 at 13:12
  • I'm sure you can figure out how to write a while loop. If you get stuck with this aspect of the problem, then you can always post another question, being sure to include your code and how it doesn't work, of course :) – SiHa Sep 10 '20 at 13:15