
tl;dr: how do I do file I/O + argument-passing with docker? Or should I give up on trying to use containers like scripts?

I am trying to learn docker and I am having a hard time with some minimal examples of common I/O and argument-passing situations. I have looked through a lot of StackOverflow content such as here as well as Docker documentation, but it seems like this is so simple-minded that no one bothered to answer it. The closest is here, but the answers are not helpful and mostly seem to say "don't do this with Docker". But people seem to talk about containers as if they can do this kind of thing in standalone applications.

Briefly, it seems like in Docker all I/O paths need to be hard-coded, but I want to be able to have these paths be flexible because I want to use the container as flexibly as I could a script.

In some cases people have solved this by leaving the container idling and then passing arguments into it (e.g. here or here) but that seems rather convoluted for a simple purpose.

I am not looking for a way to do this using venvs/conda whatever, I would like to see if it is possible using Docker.

Say I have a simple Python script called test.py:

 #!/usr/bin/env python3
 import argparse
 
 def parse_args():
     '''Parse CLI arguments
 
     Returns:
         dict: CLI arguments
     '''
     parser = argparse.ArgumentParser(description='Parse arguments for test')
     parser.add_argument('--out_file', '-o', required=True, type=str, help='output file')
     parser.add_argument('--in_file', '-i', required=True, type=str, help='input file')
     args = parser.parse_args()
     return vars(args)
 
 
 args = parse_args()
 
 with open(args["in_file"]) as input_handle:
     print(input_handle.readline())
 
 with open(args["out_file"], "w") as output_handle:
     output_handle.write("i wrote to a file")

Which natively in Python I can run on some input files:

% cat ../input.txt
i am an input file

% python test.py -i ../input.txt -o output.txt
i am an input file

% cat output.txt 
i wrote to a file%             

Let's say that for whatever reason this script needs to be dockerized while preserving the way arguments/files are passed so that people can run it without docker. I can write a very simple-minded Dockerfile:

FROM continuumio/miniconda3

COPY . .

ENTRYPOINT ["python", "test.py"]

and this will accept the arguments, but it can't access the input file, and even if it could finish, I couldn't access the output:

% docker build .
Sending build context to Docker daemon  5.632kB
Step 1/3 : FROM continuumio/miniconda3
 ---> 52daacd3dd5d
Step 2/3 : COPY . .
 ---> 2e8f439e6766
Step 3/3 : ENTRYPOINT ["python", "test.py"]
 ---> Running in 788c40568687
Removing intermediate container 788c40568687
 ---> 15e93a7e47ed
Successfully built 15e93a7e47ed 

% docker run 15e93a7e47ed -i ../input.txt -o output.txt
Traceback (most recent call last):
  File "test.py", line 19, in <module>
    with open(args["in_file"]) as input_handle:
FileNotFoundError: [Errno 2] No such file or directory: '../input.txt'

I can then attempt to mount the input file's directory as an /inputs volume, which should get me most of the way there (though it's irritating to pass two arguments for one file), but this doesn't seem to work:

docker run --volume /path/to/input_dir/:/inputs 15e93a7e47ed -i input.txt -o output.txt
Traceback (most recent call last):
  File "test.py", line 19, in <module>
    with open(args["in_file"]) as input_handle:
FileNotFoundError: [Errno 2] No such file or directory: 'input.txt'

I am clearly not understanding something about how volumes are mounted here (probably setting WORKDIR would do a lot of this work), but even if I can mount the volume, it is not at all clear how to get the outputs onto the mounted volume so they can be accessed from outside the container. There are some manual solutions to this using docker cp but the whole point is to be somewhat automated.
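For what it's worth, a sketch of that WORKDIR idea (untested here; the /data path is my own choice, nothing Docker mandates) would be to add `WORKDIR /data` to the Dockerfile, reference the script by an absolute path such as `ENTRYPOINT ["python", "/test.py"]` (the bind mount would otherwise shadow whatever was copied into /data), rebuild, and mount one host directory over /data for both input and output:

```shell
# sketch: assumes the image was rebuilt with WORKDIR /data and an
# absolute-path ENTRYPOINT; bare filenames then resolve inside the mount
docker run --rm \
  -v "$PWD:/data" \
  15e93a7e47ed \
  -i input.txt -o output.txt
# input.txt is read from $PWD, and output.txt lands back in $PWD
```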

String manipulation within the Dockerfile's ENTRYPOINT or CMD does not appear to be possible, so approaches like this are not feasible:

ENTRYPOINT ["python", "test.py", "-i data/{i_arg}", "-o data/{o_arg}"]

where I could just write to some variable filename on a mounted volume /data/, substituted in at run-time.

Maximilian Press

1 Answer

If you really want to run this script in Docker, a minimal set of options that are pretty much always required is:

sudo                        \ # since you can bind-mount an arbitrary host directory
docker run                  \
  --rm                      \ # clean up the container when done
  -it                       \ # some things depend on having a tty as stdout
  -u $(id -u):$(id -g)      \ # use host uid/gid
  -v "$PWD:$PWD"            \ # mount current directory into container
  -w "$PWD"                 \ # set working directory in container
  image-name                \
  -i input.txt -o output.txt  # note: ../input.txt would not work here

As the last comment notes, this makes the current directory accessible to the container on the same path, but if the file you want to access is in the parent directory, the container can't reach it.
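One workaround for that specific case (a sketch, untested) is to mount the parent directory at the same path but keep the working directory at $PWD, so a relative path like ../input.txt resolves:

```shell
# mount the parent of the current directory at the same path inside
# the container, while the working directory stays at $PWD
docker run --rm \
  -u "$(id -u):$(id -g)" \
  -v "$(dirname "$PWD"):$(dirname "$PWD")" \
  -w "$PWD" \
  image-name \
  -i ../input.txt -o output.txt
# output.txt is written to $PWD, which lies inside the mounted tree
```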

Fundamentally, a Docker container is intended to be fairly isolated from the host system. A container can't normally access host files, or host devices, or see the host uid-to-name mappings. That isolation leads to many of the things you note: since a container is already isolated, you don't need a virtual environment for additional isolation; since a container is isolated, /input is a much easier directory name to remember than /home/docker/src/my-project/data/input.

Since a container is isolated from the host, any host files that need to be accessed – either inputs or outputs – need to be bind-mounted into the container. In my example I bind-mount the current directory. In your example where you have separate /input and /output container directories, both need to be bind-mounted into the container.
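As a concrete sketch of that two-mount layout (the host directory names here are my own assumptions; Docker doesn't prescribe /inputs or /outputs):

```shell
# read-only mount for inputs, writable mount for outputs
docker run --rm \
  -v "$PWD/in:/inputs:ro" \
  -v "$PWD/out:/outputs" \
  image-name \
  -i /inputs/input.txt -o /outputs/output.txt
```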

There's not really a way to make this easier and still use Docker; running processes on host data isn't what it's designed for. All of your examples are in Python, and Linux and macOS systems generally come with Python pre-installed, so you might find it much more straightforward to run the script directly, possibly in a virtual environment.

python3 -m venv venv       # once only
./venv/bin/pip install .   # once only
./venv/bin/the_script -i ../input.txt -o output.txt
David Maze
  • Thanks for the answer. Reading this, it sounds like the answer is still "don't use Docker". I used an intentionally minimal example to try to illustrate the fundamental issue here, possibly that obscured the fact that I ultimately would like to do something more than just run stuff in a venv. I understand the isolation of Docker, I suppose that this kind of very simple composable use-case is just something I assumed was anticipated. I guess that's why people use kubernetes... – Maximilian Press Jan 13 '21 at 18:28
  • An "operate on local files" use case is basically impossible in Kubernetes. Containerized applications tend to work a little bit better when there isn't local state or filesystems to worry about; if you can put an HTTP facade around your application and POST file content to it, it will be a much easier Docker setup. – David Maze Jan 13 '21 at 21:05