We are using the AWS SageMaker "bring your own container" feature with an inference model written in R. As I understand it, a batch transform job runs the container as follows:

docker run image serve

In the container, we also have logic to determine which function to invoke:

# trailingOnly = TRUE drops the Rscript path and options, keeping only user arguments
args <- commandArgs(trailingOnly = TRUE)
if ("train" %in% args) {
    train()
}
if ("serve" %in% args) {
    serve()
}

Is there a way to override the default container invocation so that we can pass additional parameters?

datahack
  • Why don't you pass the additional parameters as hyperparameters? – Olivier Cruchant Sep 07 '20 at 07:43
  • Could you please elaborate on your response in more detail? – datahack Sep 07 '20 at 07:52
  • In the Docker container your code has access to a SageMaker-created file named hyperparameters.json (https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-running-container.html). It contains the hyperparameter values you give to the SDK when launching a training job, so you could use that file to pass parameters needed at training time. – Olivier Cruchant Sep 07 '20 at 08:56
  • You need to use ENTRYPOINT, if I understand your post correctly. – LinPy Sep 07 '20 at 08:56
  • To provide more information about our case: we create the batch transform job from a Lambda function using CreateTransformJobRequest, where we specify the model for inference. Currently we have several models pointing to several different images on ECR, and we just provide the model name when creating the batch transform job. The idea is to have one SageMaker model pointing to one image that contains all the inference models, and then somehow choose at runtime which one to trigger. The initial idea is to check whether we can pass an additional parameter. @OlivierCruchant I will look into hyperparameters.json. – datahack Sep 07 '20 at 09:03
  • @OlivierCruchant this is for training jobs, but we are running inference. – datahack Sep 07 '20 at 09:06

1 Answer

As you said, and as indicated in the AWS documentation, SageMaker runs your container with the following command:

docker run image serve

By issuing this command, SageMaker overrides any CMD that you provide in your container's Dockerfile, so you cannot use CMD to pass dynamic arguments to your program.

We could consider using the Dockerfile ENTRYPOINT to consume some environment variables, but the AWS documentation recommends using the exec form of ENTRYPOINT. Something like:

ENTRYPOINT ["/usr/bin/Rscript", "/opt/ml/mars.R", "--no-save"]

I think that, by analogy with model training, they need this kind of container execution so that the container can receive termination signals:

The exec form of the ENTRYPOINT instruction starts the executable directly, not as a child of /bin/sh. This enables it to receive signals like SIGTERM and SIGKILL from SageMaker APIs.

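To allow variable expansion, the ENTRYPOINT has to run through a shell, either with the shell form or by invoking sh -c explicitly with the whole command as a single string. For example:

ENTRYPOINT ["sh", "-c", "/usr/bin/Rscript /opt/ml/mars.R --no-save \"$ENV_VAR1\""]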

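If you try to do the same with the plain exec form, the variables will be treated as literals and will not be substituted with their actual values. Note that the shell form has a cost of its own: Docker ignores CMD and docker run arguments when ENTRYPOINT is in shell form, so the serve argument would not reach your script.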

See the accepted answer to this Stack Overflow question for a great explanation of this subject.
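The difference between the two forms can be reproduced outside Docker. The sketch below is only an analogy in Python, not Docker itself: it assumes a POSIX system with sh on the PATH, and the variable value model_a is a made-up example. Passing an argument list directly to a program corresponds to the exec form, and going through sh -c corresponds to the shell form.

```python
import os
import subprocess
import sys

# Hypothetical value, standing in for whatever SageMaker would inject.
env = dict(os.environ, ENV_VAR1="model_a")

# Exec-style: the argument list goes straight to the program, no shell is
# involved, so "$ENV_VAR1" arrives as literal text.
exec_style = subprocess.run(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", "$ENV_VAR1"],
    env=env, capture_output=True, text=True, check=True,
).stdout

# Shell-style: sh parses the command string and expands the variable.
shell_style = subprocess.run(
    ["sh", "-c", 'printf "%s" "$ENV_VAR1"'],
    env=env, capture_output=True, text=True, check=True,
).stdout

print(repr(exec_style))   # '$ENV_VAR1'
print(repr(shell_style))  # 'model_a'
```

In both cases the child process still has ENV_VAR1 in its environment, which is why reading it from inside the program works regardless of the ENTRYPOINT form.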

What you can do instead is read the value of these variables in your R code, much as you do when processing commandArgs:

ENV_VAR1 <- Sys.getenv("ENV_VAR1")

To pass environment variables to the container, as indicated in the AWS documentation, you can set them in the CreateModel and CreateTransformJob requests.
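As a rough sketch of the caller side, assuming the request is built in Python (for example in a Lambda function): the job name, model name, bucket URIs, instance type, and the helper build_transform_job_request are all hypothetical placeholders. The Environment field is the part that reaches the container as environment variables.

```python
import json


def build_transform_job_request(job_name, model_name, input_s3_uri,
                                output_s3_uri, environment):
    """Build keyword arguments for SageMaker's CreateTransformJob call."""
    # Environment must map strings to strings, so coerce defensively.
    env = {str(k): str(v) for k, v in environment.items()}
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "Environment": env,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            }
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        # Placeholder instance settings; choose what fits your workload.
        "TransformResources": {"InstanceType": "ml.m5.large", "InstanceCount": 1},
    }


request = build_transform_job_request(
    job_name="example-transform-job",
    model_name="example-model",
    input_s3_uri="s3://example-bucket/input/",
    output_s3_uri="s3://example-bucket/output/",
    environment={"ENV_VAR1": "model_a"},
)
print(json.dumps(request["Environment"]))

# To submit the job (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_transform_job(**request)
```

With a single image holding several models, the same SageMaker model can then be reused across jobs, and only the Environment value changes per request.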

You will probably need to add an ENV definition to your Dockerfile for every environment variable your container requires, and give each one a default value with ARG:

ARG ENV_VAR1_DEFAULT_VALUE=VAL1
ENV ENV_VAR1=$ENV_VAR1_DEFAULT_VALUE
jccampanero
  • In the end I did this using environment variables passed to the container through the Environment property of CreateTransformJobRequest. In the R script I read them with env_var1 <- Sys.getenv("ENV_VAR1"). This approach worked fine. – datahack Sep 10 '20 at 15:01
  • I am very happy to know that the answer was helpful. – jccampanero Sep 10 '20 at 15:42
  • @jccampanero Do you know of any way to add the `--privileged` flag to the docker run command? – Austin Nov 15 '20 at 19:34
  • Hi Austin. Honestly, no, and I'm afraid you can't elevate the container in SageMaker, for two reasons. First, they run your container and you have no control over the invocation; that is the point of this specific question. More importantly, they provide a managed environment for your container: if they let you run it in privileged mode, then unless they remapped your container's root user (I don't think they do that) you would have full control over the host environment the container runs in, with all the security implications that could have. – jccampanero Nov 15 '20 at 22:11