To preface: this is not a general question about the difference between CMD and ENTRYPOINT + CMD. I thought I understood the general difference and how to use them, but I have run into what may be a more nuanced issue with ENTRYPOINT + CMD.
I was trying to write a simple image (call it image2) that builds on another image (call it image1), which basically contains my environment. The idea is that the environment is fairly static, but I might want to make small changes to the container that runs the code. The Dockerfile I was having issues with looks like this:
FROM image1
ENTRYPOINT [ "/opt/conda/bin/python" ]
CMD [ "/tmp/script.py" ]
I wrote it this way to restrict the container to a single purpose: running a Python script. However, this threw an error when I ran it from outside the container. The script would start and run for a bit, but once it reached some PySpark code it failed with:
java.io.IOException: Cannot run program "python3": error=2, No such file or directory
PySpark was suddenly trying to launch python3, and I'm not sure why it started looking for that.
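One way to narrow this down is to print what the Python process actually sees under each Dockerfile. The sketch below is only a diagnostic, not a fix. It assumes the relevant settings are the PATH, PYSPARK_PYTHON, and PYSPARK_DRIVER_PYTHON environment variables (PySpark reads the latter two to choose its interpreters and falls back to a default name when they are unset). The helper name describe_python_env is made up for illustration.

```python
import os
import shutil
import sys


def describe_python_env(environ=None):
    """Collect the settings that decide which Python PySpark can launch.

    `environ` defaults to the real process environment; passing a dict
    makes the helper easy to try out without touching os.environ.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("PATH", "")
    return {
        # Interpreter running this script (the driver side).
        "driver_python": sys.executable,
        # Variables PySpark consults when choosing interpreters.
        "PYSPARK_PYTHON": environ.get("PYSPARK_PYTHON"),
        "PYSPARK_DRIVER_PYTHON": environ.get("PYSPARK_DRIVER_PYTHON"),
        "PATH": path,
        # None here means a bare "python3" cannot be resolved on this PATH.
        "python3_on_path": shutil.which("python3", path=path),
    }


if __name__ == "__main__":
    for key, value in describe_python_env().items():
        print(f"{key}: {value}")
```

Running this at the top of /tmp/script.py with both Dockerfiles and comparing the output should show whether the two variants start with different environments.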
However, if I change the Dockerfile to the following:
FROM image1
CMD /opt/conda/bin/python /tmp/script.py
Then it runs fine without error. Can someone explain why I'm able to run my script with CMD alone but not with ENTRYPOINT + CMD?
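For comparison, here is a rough Python model of how Docker assembles the final command line from an exec-form ENTRYPOINT and a CMD. It is a sketch of the documented rules, not Docker's implementation. The base entrypoint path /hypothetical/base-entrypoint.sh is invented: whether image1 defines an ENTRYPOINT at all is an assumption that `docker inspect image1` would confirm or rule out.

```python
def effective_argv(entrypoint, cmd):
    """Model the argv Docker runs for an exec-form ENTRYPOINT plus a CMD.

    entrypoint: list of strings (exec form) or None if unset.
    cmd: list of strings (exec form), a string (shell form), or None.
    A shell-form CMD is wrapped in /bin/sh -c before being appended.
    """
    argv = list(entrypoint) if entrypoint else []
    if isinstance(cmd, str):
        argv += ["/bin/sh", "-c", cmd]
    elif cmd:
        argv += list(cmd)
    return argv


# Hypothetical: pretend image1 ships an entrypoint script that prepares
# the environment. The real image may not define one.
BASE_ENTRYPOINT = ["/hypothetical/base-entrypoint.sh"]

# First Dockerfile: ENTRYPOINT in image2 replaces whatever image1 set.
first = effective_argv(["/opt/conda/bin/python"], ["/tmp/script.py"])

# Second Dockerfile: only CMD is set, so image1's ENTRYPOINT stays active.
second = effective_argv(BASE_ENTRYPOINT, "/opt/conda/bin/python /tmp/script.py")

print(first)
print(second)
```

Under that assumption, the first Dockerfile bypasses any setup done by image1's entrypoint, while the second still runs through it (and through /bin/sh). If image1 has no ENTRYPOINT, the only difference left is the /bin/sh -c wrapper.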