How to increase AWS Sagemaker invocation time out while waiting for a response

Question

I deployed a large 3D model to AWS SageMaker. Inference takes 2 minutes or more. I get the following error when calling the predictor from Python:

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again."

In CloudWatch I also see some PING timeouts while the container is processing:

2020-10-07T16:02:39.718+02:00 2020/10/07 14:02:39 [error] 106#106: *251 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.32.0.2, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock/ping", host: "model.aws.local:8080"

How do I increase the invocation timeout?

Or is there a way to make async invocations to a SageMaker endpoint?

score 7 · Accepted Answer · answered Oct 10 '20 at 12:08

It’s currently not possible to increase the timeout; this is an open issue on GitHub. Looking through the issue and similar questions on SO, it seems you may be able to use batch transforms in conjunction with inference.

References

https://stackoverflow.com/a/55642675/806876

Sagemaker Python SDK timeout issue: https://github.com/aws/sagemaker-python-sdk/issues/1119

pygeek

For those coming to this answer and looking at batch transforms: batch transform invocations [must complete in 10 minutes](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-batch-code.html#your-algorithms-batch-code-how-containers-should-respond-to-inferences). – GrantD71 Jan 24 '22 at 23:33

score 0 · Answer 2 · answered Dec 03 '20 at 13:43

This timeout is actually specified on the server side, at the endpoint to be specific. You can try bringing your own container (BYOC), which gives you full control of everything on the endpoint side, including the timeout.

You can also refer to the endpoint part of this repo, which is from one of my colleagues: https://github.com/jackie930/yolov4-SageMaker

The timeout you should change is in `serve.py`: `model_server_timeout = os.environ.get('MODEL_SERVER_TIMEOUT', 60)`
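As a rough illustration of how that setting is typically consumed, here is a minimal sketch, assuming a gunicorn-based `serve.py` like the one in the linked repo. The helper names `read_model_server_timeout` and `build_gunicorn_command`, and the `wsgi:app` module path, are hypothetical; the socket path is taken from the log line in the question. Note that `os.environ.get` returns a string when the variable is set, so the value is converted explicitly. This only changes the timeout inside the container, not any limit enforced outside it.

```python
import os

DEFAULT_TIMEOUT_SECONDS = 60  # same default as in the line quoted above


def read_model_server_timeout(environ=None):
    """Return the model server timeout in seconds as an int.

    The environment variable arrives as a string, so convert it;
    a non-numeric value raises ValueError.
    """
    environ = os.environ if environ is None else environ
    return int(environ.get("MODEL_SERVER_TIMEOUT", DEFAULT_TIMEOUT_SECONDS))


def build_gunicorn_command(timeout, workers=1):
    """Build the gunicorn argument list (hypothetical helper).

    'wsgi:app' is an assumed module:callable for the Flask app.
    """
    return [
        "gunicorn",
        "--timeout", str(timeout),        # worker timeout in seconds
        "-k", "sync",                     # synchronous worker class
        "-b", "unix:/tmp/gunicorn.sock",  # socket nginx proxies to
        "-w", str(workers),               # number of worker processes
        "wsgi:app",
    ]


if __name__ == "__main__":
    print(build_gunicorn_command(read_model_server_timeout()))
```

With `MODEL_SERVER_TIMEOUT=300` set in the container environment, the command built here would pass `--timeout 300` to gunicorn.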

Jim

I already modified the default SageMaker container and changed two timeouts. However, this timeout seems to come from outside the container. – Stiefel Dec 04 '20 at 14:19

Yes. On the client side, the SageMaker runtime has a 60-second timeout as well, and it cannot be changed. My solution is to run the job in a **separate process** inside the endpoint and respond to the invocation before the job completes. The result then has to be sent back to the client when the job completes. – Jim Dec 10 '20 at 14:06
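
The "respond before the job completes" pattern described in the comment above could look roughly like the sketch below. It is not the commenter's code: `JobRunner` and its methods are hypothetical names, and for brevity it uses a thread pool from the standard library rather than a separate process. The invocation handler would call `submit` and return the job id right away; how the result reaches the client afterwards (polling, uploading to storage, a callback) is left open here.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobRunner:
    """Runs long jobs in the background and tracks them by id (sketch)."""

    def __init__(self, max_workers=1):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}  # job id -> Future

    def submit(self, fn, *args):
        """Start fn(*args) in the background and return a job id at once."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(fn, *args)
        return job_id

    def status(self, job_id):
        """Report the job state without blocking."""
        future = self._jobs[job_id]
        if not future.done():
            return {"state": "running"}
        error = future.exception()
        if error is not None:
            return {"state": "failed", "error": str(error)}
        return {"state": "done", "result": future.result()}

    def wait(self, job_id, timeout=None):
        """Block until the job finishes and return its result."""
        return self._jobs[job_id].result(timeout=timeout)
```

In this arrangement the invocation itself returns well within the 60-second limit, and the client makes a later request with the job id to check `status`.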
