
I want to preface this by saying that this is not a general question about the difference between CMD and ENTRYPOINT + CMD. I thought I understood the general difference and how to use them, but I ran into what may be a more nuanced issue with ENTRYPOINT + CMD.

I was trying to write a simple image (call it image2) that builds on another image (call it image1), which basically contains my environment. The environment is pretty static, but I might want to make small changes to the container that runs the code. The image I was having issues with looks like this:

FROM image1

ENTRYPOINT [ "/opt/conda/bin/python" ]
CMD [ "/tmp/script.py" ]

I wrote it this way to restrict the container to a single purpose: running a Python script. However, it threw an error when I ran it from outside the container. The script would start and run for a bit, but when it reached some PySpark code it failed with this:

java.io.IOException: Cannot run program "python3": error=2, No such file or directory

PySpark was suddenly looking for python3, and I'm not sure why.

However, if I change the Dockerfile to the following:

FROM image1

CMD /opt/conda/bin/python /tmp/script.py

Then it runs fine without error. So I'm wondering if someone can explain why I'm able to run my script with CMD alone but not with ENTRYPOINT + CMD.

M Z
Ken Myers
  • Not an answer to your question, but I don't feel like `ENTRYPOINT ["python"]` really makes sense. The `CMD` can still be any program on the system, but only if it's implemented in Python, and you still need to repeat the script name if you're overriding the command somewhere. I might [make the script executable](https://docs.python.org/3/tutorial/appendix.html#executable-python-scripts) so you don't need to explicitly say `python` at all, at least in the startup sequence. – David Maze Apr 29 '23 at 10:44
  • Can you elaborate on "and you still need to repeat the script name if you're overriding the command somewhere"? I don't think I fully understand this. The idea was I wanted to limit the container to running python scripts. But I'm not very committed to this idea and might go with what you're saying – Ken Myers Apr 29 '23 at 18:42
  • I think I got it working, but I have to specify the full path, like `docker run ... /tmp/script.py`. If I try to pass just `script.py`, it errors out with `/usr/local/bin/_entrypoint.sh: line 24: exec: script.py: not found`. Do you know if there's a way to pass just the file name, which would be located in /tmp/? – Ken Myers Apr 29 '23 at 19:10
  • I figured it out by adding `/tmp/` to the PATH – Ken Myers Apr 29 '23 at 20:03

1 Answer


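Your Dockerfile is fine. That said, there is a difference between the shell form, CMD arg1 arg2, and the exec form, CMD ["arg1", "arg2"] (with brackets): the shell form runs the command through /bin/sh -c, while the exec form runs it directly without a shell. That would at least explain some difference.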
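
To make the two forms of CMD concrete, here is a minimal sketch using the paths from the question. It only illustrates how Docker treats each form; it is not a confirmed diagnosis of the error.

```dockerfile
FROM image1

# Shell form (the variant that worked in the question). Docker wraps it in a shell,
# so it is equivalent to: CMD ["/bin/sh", "-c", "/opt/conda/bin/python /tmp/script.py"]
CMD /opt/conda/bin/python /tmp/script.py

# Exec form (commented out, since only the last CMD in a Dockerfile takes effect).
# No shell is involved; the interpreter is started directly with the script as its argument.
# CMD ["/opt/conda/bin/python", "/tmp/script.py"]
```

Note also that a CMD is passed as arguments to whatever ENTRYPOINT is in effect, while a new ENTRYPOINT replaces the inherited one. If image1 defines its own entrypoint script (the `/usr/local/bin/_entrypoint.sh` in the comments suggests it might), that could be another source of the different behavior, though this is an assumption about image1.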

> when it would get to some Pyspark code

You can set ENV PYSPARK_PYTHON=/opt/conda/bin/python to change the interpreter Spark uses.
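
As a sketch, that suggestion applied to the Dockerfile from the question could look like the following. The image name and paths are taken from the question; whether this alone resolves the error depends on how image1 is set up.

```dockerfile
FROM image1

# Tell PySpark which interpreter to launch for its Python workers,
# instead of letting it look up "python3" on PATH.
ENV PYSPARK_PYTHON=/opt/conda/bin/python

ENTRYPOINT ["/opt/conda/bin/python"]
CMD ["/tmp/script.py"]
```

The same variable can also be supplied at run time with `docker run -e PYSPARK_PYTHON=/opt/conda/bin/python ...` if you would rather not bake it into the image.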

OneCricketeer