As you said, and as indicated in the AWS documentation, SageMaker will run your container with the following command:
docker run image serve
By issuing this command, SageMaker overrides any CMD that you provide in your container's Dockerfile, so you cannot use CMD to pass dynamic arguments to your program.
We might think of using the Dockerfile ENTRYPOINT to consume some environment variables, but the AWS documentation states that it is preferable to use the exec form of the ENTRYPOINT. Something like:
ENTRYPOINT ["/usr/bin/Rscript", "/opt/ml/mars.R", "--no-save"]
I think that, by analogy with model training, they need this kind of container execution so that the container can receive termination signals:
The exec form of the ENTRYPOINT instruction starts the executable directly, not as a child of /bin/sh. This enables it to receive signals like SIGTERM and SIGKILL from SageMaker APIs.
To allow variable expansion, we need to use the shell form of ENTRYPOINT, or its sh -c equivalent with the whole command written as a single string (with sh -c, any extra array elements become positional parameters and are not part of the command). Imagine:
ENTRYPOINT ["sh", "-c", "/usr/bin/Rscript /opt/ml/mars.R --no-save \"$ENV_VAR1\""]
If you try to do the same with the exec form, the variables will be treated as literal strings and will not be substituted with their actual values.
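To see the difference locally (outside Docker), here is a small shell sketch; ENV_VAR1 is the hypothetical variable from the example above, and the echo commands only print what would be executed:

```shell
# Stand-in for a variable set on the container via the SageMaker request:
export ENV_VAR1=some-value

# Single-string form: $ENV_VAR1 is expanded by the shell at run time.
sh -c 'echo Rscript /opt/ml/mars.R --no-save "$ENV_VAR1"'
# prints: Rscript /opt/ml/mars.R --no-save some-value

# Multi-element form (mirrors the broken array example): the extra arguments
# only become the positional parameters $0, $1, ... of the -c script,
# so they are never executed as part of the command.
sh -c 'echo "$0 got positional args: $1 $2"' Rscript /opt/ml/mars.R --no-save
# prints: Rscript got positional args: /opt/ml/mars.R --no-save
```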
Please see the accepted answer to this Stack Overflow question for a great explanation of this subject.
But one thing you can do is read the values of these variables in your R code, similarly to how you process commandArgs:
ENV_VAR1 <- Sys.getenv("ENV_VAR1")
To pass environment variables to the container, as indicated in the AWS documentation, you can set them in the CreateModel and CreateTransformJob requests.
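As a sketch, the relevant part of a CreateModel request could look like this (the model name, image URI, role ARN, and variable value are placeholders, not values from your setup):

```json
{
  "ModelName": "my-model",
  "PrimaryContainer": {
    "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
    "Environment": {
      "ENV_VAR1": "VAL1"
    }
  },
  "ExecutionRoleArn": "arn:aws:iam::123456789012:role/MySageMakerRole"
}
```

The Environment map is what ends up as environment variables inside the running container.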
You will probably need to include ENV definitions in your Dockerfile for every environment variable your container requires, and provide default values for them with ARG:
ARG ENV_VAR1_DEFAULT_VALUE=VAL1
ENV ENV_VAR1=$ENV_VAR1_DEFAULT_VALUE
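Putting the pieces together, a minimal Dockerfile sketch could look like the following; the base image, COPY line, and default value are assumptions for illustration, while the paths match the examples above:

```dockerfile
# Hypothetical base image; use whatever image your R script actually needs.
FROM r-base:latest

# Build-time default, overridable with: docker build --build-arg ENV_VAR1_DEFAULT_VALUE=...
ARG ENV_VAR1_DEFAULT_VALUE=VAL1
# Run-time variable; an Environment entry in CreateModel/CreateTransformJob overrides it.
ENV ENV_VAR1=$ENV_VAR1_DEFAULT_VALUE

COPY mars.R /opt/ml/mars.R

# Single-string sh -c form so $ENV_VAR1 is expanded at run time. SageMaker still
# appends "serve" to the command; here it arrives as $0 of the -c script.
ENTRYPOINT ["sh", "-c", "/usr/bin/Rscript /opt/ml/mars.R --no-save \"$ENV_VAR1\""]
```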