TL;DR: How do I do file I/O and argument passing with Docker? Or should I give up on trying to use containers like scripts?
I am trying to learn Docker and I am having a hard time with some minimal examples of common I/O and argument-passing situations. I have looked through a lot of StackOverflow content (such as here) as well as the Docker documentation, but it seems like this is so simple-minded that no one has bothered to answer it. The closest is here, but the answers are not helpful and mostly seem to say "don't do this with Docker". Yet people talk about containers as if they can do this kind of thing as standalone applications.
Briefly, it seems like in Docker all I/O paths need to be hard-coded, but I want to be able to have these paths be flexible because I want to use the container as flexibly as I could a script.
In some cases people have solved this by leaving the container idling and then passing arguments into it (e.g. here or here) but that seems rather convoluted for a simple purpose.
I am not looking for a way to do this using venvs, conda, or anything similar; I would like to see if it is possible using Docker.
Say I have a simple Python script called `test.py`:
```python
#!/usr/bin/env python3
import argparse


def parse_args():
    '''Parse CLI arguments

    Returns:
        dict: CLI arguments
    '''
    parser = argparse.ArgumentParser(description='Parse arguments for test')
    parser.add_argument('--out_file', '-o', required=True, type=str, help='output file')
    parser.add_argument('--in_file', '-i', required=True, type=str, help='input file')
    args = parser.parse_args()
    return vars(args)


args = parse_args()
with open(args["in_file"]) as input_handle:
    print(input_handle.readline())
with open(args["out_file"], "w") as output_handle:
    output_handle.write("i wrote to a file")
```
Natively, I can run it on an input file:
```console
% cat ../input.txt
i am an input file
% python test.py -i ../input.txt -o output.txt
i am an input file
% cat output.txt
i wrote to a file%
```
Let's say that for whatever reason this script needs to be dockerized while preserving the way arguments/files are passed so that people can run it without docker. I can write a very simple-minded Dockerfile:
```dockerfile
FROM continuumio/miniconda3
COPY . .
ENTRYPOINT ["python", "test.py"]
```
This accepts the arguments, but the script can't access the input file, and even if it did finish, I couldn't access the output:
```console
% docker build .
Sending build context to Docker daemon 5.632kB
Step 1/3 : FROM continuumio/miniconda3
 ---> 52daacd3dd5d
Step 2/3 : COPY . .
 ---> 2e8f439e6766
Step 3/3 : ENTRYPOINT ["python", "test.py"]
 ---> Running in 788c40568687
Removing intermediate container 788c40568687
 ---> 15e93a7e47ed
Successfully built 15e93a7e47ed
% docker run 15e93a7e47ed -i ../input.txt -o output.txt
Traceback (most recent call last):
  File "test.py", line 19, in <module>
    with open(args["in_file"]) as input_handle:
FileNotFoundError: [Errno 2] No such file or directory: '../input.txt'
```
I can then try mounting the input file's directory as a volume at `/inputs/`, which gets me most of the way there (though it's irritating to pass two arguments for one file), but this doesn't seem to work:
```console
% docker run --volume /path/to/input_dir/:/inputs 15e93a7e47ed -i input.txt -o output.txt
Traceback (most recent call last):
  File "test.py", line 19, in <module>
    with open(args["in_file"]) as input_handle:
FileNotFoundError: [Errno 2] No such file or directory: 'input.txt'
```
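The relative `input.txt` most likely fails because it is resolved against the working directory of the process inside the container, not against the mount point. A small illustration using `posixpath` (POSIX path rules, as inside a Linux container), assuming the image's working directory is `/` since this Dockerfile sets no `WORKDIR`; `resolve_in_container` is a hypothetical helper, and the exact directory matters less than the fact that it isn't `/inputs`:

```python
import posixpath


def resolve_in_container(path: str, workdir: str) -> str:
    """Return the absolute path a process in the container would try to open.

    Relative paths are joined onto the working directory; posixpath.join
    discards workdir when path is already absolute.
    """
    return posixpath.normpath(posixpath.join(workdir, path))


# Assumption: the image's working directory is "/" (no WORKDIR set).
WORKDIR = "/"

print(resolve_in_container("input.txt", WORKDIR))          # /input.txt
print(resolve_in_container("../input.txt", WORKDIR))       # /input.txt
print(resolve_in_container("/inputs/input.txt", WORKDIR))  # /inputs/input.txt
```

So with the volume mounted at `/inputs`, the script would need `-i /inputs/input.txt`, and the output path would likewise need to point into a mounted directory, otherwise the file is written only to the container's own filesystem.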
I am clearly not understanding something about how volumes are mounted here (setting `WORKDIR` would probably do a lot of this work), but even if I can mount the volume, it is not at all clear how to get the outputs onto the mounted volume so they can be accessed from outside the container. There are some manual solutions using `docker cp`, but the whole point is for this to be somewhat automated.
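Since the container only sees what is mounted into it, one way to keep a script-like interface is a small wrapper on the host that translates host paths into container paths. A sketch, assuming a Linux or macOS host (Windows drive letters contain a colon, which conflicts with the `--volume` syntax); `build_docker_command`, `run_in_docker`, and the `/inputs` and `/outputs` mount points are hypothetical names, not part of Docker:

```python
import os
import subprocess
from typing import List


def build_docker_command(image: str, in_file: str, out_file: str) -> List[str]:
    """Build a `docker run` command that bind-mounts the files' directories.

    The input's directory is mounted read-only at /inputs and the output's
    directory at /outputs (both arbitrary choices), and the -i/-o arguments
    are rewritten to point at those in-container locations.
    """
    in_dir, in_name = os.path.split(os.path.abspath(in_file))
    out_dir, out_name = os.path.split(os.path.abspath(out_file))
    return [
        "docker", "run", "--rm",
        "--volume", f"{in_dir}:/inputs:ro",  # input directory, read-only
        "--volume", f"{out_dir}:/outputs",   # writable, so output lands on the host
        image,
        "-i", f"/inputs/{in_name}",
        "-o", f"/outputs/{out_name}",
    ]


def run_in_docker(image: str, in_file: str, out_file: str) -> None:
    """Run the containerized script; raises CalledProcessError on failure."""
    subprocess.run(build_docker_command(image, in_file, out_file), check=True)
```

For example, `run_in_docker("15e93a7e47ed", "../input.txt", "output.txt")` would mount both directories and leave `output.txt` on the host. On Linux the output file will typically be owned by root, since the container runs as root unless the image or `docker run --user` says otherwise. The design choice here is that the wrapper owns the path translation, so `test.py` itself stays unchanged.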
It seems that string manipulation of `ENTRYPOINT` or `CMD` within the Dockerfile is not possible, so approaches like this are not feasible:
```dockerfile
ENTRYPOINT ["python", "test.py", "-i data/{i_arg}", "-o data/{o_arg}"]
```
where I could write to some variable filename on a mounted volume `/data/` that is substituted in at run time.
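Exec-form `ENTRYPOINT` and `CMD` are not run through a shell, so nothing like `{i_arg}` or `$I_ARG` is substituted in them. A shell-form `ENTRYPOINT` would expand environment variables at run time, but it also ignores any extra `docker run` arguments. Another option is to leave the arguments alone and let the script read a base directory at run time. A sketch of a modified `parse_args`, assuming a hypothetical `DATA_DIR` environment variable (not something Docker or the original script defines): relative paths are joined onto it when it is set, so the script behaves exactly as before when run natively.

```python
import argparse
import os
from typing import List, Optional


def resolve(path: str, base_dir: Optional[str]) -> str:
    """Join a relative path onto base_dir; absolute paths pass through unchanged."""
    if base_dir and not os.path.isabs(path):
        return os.path.join(base_dir, path)
    return path


def parse_args(argv: Optional[List[str]] = None) -> dict:
    parser = argparse.ArgumentParser(description='Parse arguments for test')
    parser.add_argument('--out_file', '-o', required=True, type=str, help='output file')
    parser.add_argument('--in_file', '-i', required=True, type=str, help='input file')
    args = vars(parser.parse_args(argv))
    # Hypothetical DATA_DIR: unset when run natively, so paths stay as given
    base_dir = os.environ.get("DATA_DIR")
    return {key: resolve(value, base_dir) for key, value in args.items()}
```

The Dockerfile would then add `ENV DATA_DIR=/data`, and a run could look like `docker run --volume "$PWD":/data <image> -i input.txt -o output.txt`. A variant needing no code change is to copy the script to a separate directory such as `/app`, set `WORKDIR /data`, and use `ENTRYPOINT ["python", "/app/test.py"]`, since relative paths are resolved against the working directory anyway.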