2

I've deployed a deep learning model to a SageMaker endpoint and can send a request and get an answer using sagemaker_client.invoke_endpoint. But each invoke_endpoint call accepts a single body. How can I send multiple bodies to get multiple results from a single request?

I've tried setting body='{"instances": [myData1, myData2]}', but it is interpreted as a single string.

def sagemaker_handler(doc):
    # Encode the request payload and call the endpoint
    data = doc.encode("utf-8")
    response = sagemaker_client.invoke_endpoint(EndpointName='myEndpoint',
                                                ContentType='application/json',
                                                Accept='application/json',
                                                Body=data)
    return response
gamerrishad

2 Answers

2

It is not possible to pass multiple requests in a single invoke_endpoint call at this time. invoke_endpoint takes one request in the body and returns one prediction: https://docs.aws.amazon.com/sagemaker/latest/dg/API_runtime_InvokeEndpoint.html

SageMaker supports batch processing, which can be used for multiple requests, but this does not go through an endpoint: https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-batch-transform.html

Amazon has documentation for passing multiple requests and their formats, but this applies to batch transform only: https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html
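If offline scoring is acceptable, a batch transform job can process a whole dataset in one go. A minimal sketch of the CreateTransformJob request, assuming hypothetical model, bucket, and job names (adjust instance type and content type to your model):

```python
def build_transform_request(model_name, input_s3, output_s3, job_name):
    """Assemble a CreateTransformJob request for offline, multi-record scoring.
    All names passed in here are hypothetical placeholders."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3}
            },
            "ContentType": "application/json",
            "SplitType": "Line",  # one JSON record per line in the input file
        },
        "TransformOutput": {"S3OutputPath": output_s3},
        "TransformResources": {"InstanceType": "ml.m5.large", "InstanceCount": 1},
    }

request = build_transform_request(
    "my-model", "s3://my-bucket/input/", "s3://my-bucket/output/", "my-batch-job"
)
# To actually launch it (requires boto3 and AWS credentials):
# import boto3
# boto3.client("sagemaker").create_transform_job(**request)
```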

0

According to the docs, invoke_endpoint() supports multiple instances in the body:

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime.html#SageMakerRuntime.Client.invoke_endpoint https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-inference.html

I've used this with the built-in algos again and again. You can look at this notebook for an example: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/blazingtext_text_classification_dbpedia/blazingtext_text_classification_dbpedia.ipynb
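One likely cause of the "single string" issue in the question is building the JSON body by hand. A sketch of serializing several instances with json.dumps instead (the endpoint name and sample data are hypothetical, and whether batched input works still depends on the serving container):

```python
import json

# Hypothetical sample records -- the real shape depends on your model's input signature.
my_data_1 = [0.1, 0.2, 0.3]
my_data_2 = [0.4, 0.5, 0.6]

# Serialize with json.dumps rather than hand-writing the string, so the list of
# instances is sent as actual JSON, not as a Python string literal.
payload = json.dumps({"instances": [my_data_1, my_data_2]})

def invoke_batch(sagemaker_client, endpoint_name, body):
    """Send several instances in a single InvokeEndpoint call
    (only works if the serving container accepts batched input)."""
    response = sagemaker_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=body.encode("utf-8"),
    )
    return json.loads(response["Body"].read())
```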

I'm wondering whether the Deep Learning containers behave differently. I'll try to find out. Could you tell me a little more about your use case and why you'd like to predict multiple instances at once? Happy to read your feedback and pass it on to the SageMaker team.

Alternatively, if you don't actually need an HTTPS endpoint (i.e. no need for real-time prediction), then batch transform may solve your problem:

https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-batch-transform.html

Julien Simon
  • The endpoint needs to accept real-time batch requests to provide predictions. Sending requests one by one is extremely time-consuming. `[link] https://github.com/tensorflow/serving/tree/master/tensorflow_serving/servables/tensorflow/testdata/saved_model_counter/00000123` This `*.pb` model accepts a list of input data. I've checked so many sources on how they created the `serving_input_fn` for this one, but found nothing. – gamerrishad Jan 20 '19 at 15:10