
I have set up a SageMaker inference endpoint for processing images. I am sending a JSON request like this to the endpoint:

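import json

import boto3

sagemaker_runtime_client = boto3.client("sagemaker-runtime")

# productType, encoded_image_bytes_as_string and endpoint_name are defined elsewhere
data = {
    'inferenceType': 'SINGLE_INSTANCE',
    'productType': productType,
    'images': [encoded_image_bytes_as_string],  # base64-encoded image strings
    'content_type': "application/json",
}
payload = json.dumps(data)

response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=payload)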

where images is an array of base64-encoded images. The endpoint works fine, except that when I send large images I exceed SageMaker's inference payload size limit:

Maximum payload size for endpoint invocation 6 MB

What other data formats can I use that are smaller than JSON? Is it possible to use something like gzip to compress my payload before sending it? I know that SageMaker asynchronous endpoints and batch transform allow larger payloads, but I require real-time inference. Thanks!

219CID

2 Answers


You're currently sending the image bytes inefficiently as Base64, which is about 1.33x larger than the raw bytes (every 3 bytes become 4 characters). If you send raw bytes instead of JSON, the largest image you can send grows from roughly (6/1.33) MB, about 4.5 MB, to 6 MB. You could also contact AWS Support and ask them to increase the maximum payload size.
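
As a rough sketch of the size difference, the snippet below measures the JSON body from the question against the raw image bytes and builds the arguments for a raw-bytes call. The helper names, the "application/x-image" content type, and the use of CustomAttributes to carry the metadata are assumptions for illustration; the model container would have to be changed to accept a binary body and parse that header.

```python
import base64
import json


def json_payload_size(image_bytes: bytes, product_type: str) -> int:
    """Size in bytes of the question's JSON body for a single image."""
    data = {
        "inferenceType": "SINGLE_INSTANCE",
        "productType": product_type,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "content_type": "application/json",
    }
    return len(json.dumps(data).encode("utf-8"))


def raw_invoke_kwargs(endpoint_name: str, image_bytes: bytes, product_type: str) -> dict:
    """Build invoke_endpoint arguments that send the image as raw bytes.

    Assumption: the container reads the metadata from CustomAttributes
    (limited to 1024 characters) instead of from the JSON body.
    """
    metadata = {"inferenceType": "SINGLE_INSTANCE", "productType": product_type}
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/x-image",  # assumed; any type the container accepts
        "CustomAttributes": json.dumps(metadata),
        "Body": image_bytes,
    }


if __name__ == "__main__":
    image = bytes(3_000_000)  # placeholder standing in for real image bytes
    print("raw bytes: ", len(image))
    print("JSON body: ", json_payload_size(image, "example"))
```

The resulting dictionary would be passed as sagemaker_runtime_client.invoke_endpoint(**kwargs). Note that only one image fits in the body this way, so multi-image requests would need a different framing.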
If you need more than that, you'll need to write the file to some storage (like S3 or EFS) and send a reference to the image to the endpoint, which then reads the image back from that storage. Overall, that is quite hard to pull off reliably, end to end, in under 500 ms.
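
A minimal sketch of the storage-reference approach, assuming S3 is the storage and that the model container is modified to download the object itself. The function name, the key layout, and the image_refs field are hypothetical; the clients are passed in so the function can be exercised without AWS access.

```python
import json
import uuid


def invoke_with_s3_reference(s3_client, runtime_client, endpoint_name,
                             bucket, image_bytes, product_type):
    """Upload the image to S3, then send only its location to the endpoint."""
    key = f"inference-inputs/{uuid.uuid4()}.bin"  # hypothetical key layout
    s3_client.put_object(Bucket=bucket, Key=key, Body=image_bytes)

    payload = json.dumps({
        "inferenceType": "SINGLE_INSTANCE",
        "productType": product_type,
        # Hypothetical field: the container must fetch this object itself.
        "image_refs": [f"s3://{bucket}/{key}"],
    })
    return runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=payload,
    )
```

With real clients this would be called with boto3.client("s3") and boto3.client("sagemaker-runtime"). The upload and the download inside the container both add latency, which is why the 500 ms budget is hard to meet.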

Gili Nachum
  • I didn't realize how much more efficient raw bytes were, thank you. I will try raw bytes. Do you have evidence that AWS allows increases in the maximum payload size? I do not see it listed as an adjustable service quota in the portal. – 219CID Dec 20 '22 at 23:11
  • Is using raw bytes the absolute smallest I can go? – 219CID Dec 20 '22 at 23:12
  • Raw bytes: that's the minimum, yes, unless you're not using a compressed image format. Limit increase: yes. Open a support ticket and, in free text, explain what you're doing and your latency requirements, and ask to increase the maximum payload size. – Gili Nachum Dec 22 '22 at 10:01

Asynchronous Inference is technically a real-time hosting option in SageMaker. Depending on the latency requirements of your invocations, I would recommend exploring Asynchronous Inference, as it is designed for large payloads. I would suggest running some load tests with asynchronous endpoints.
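
For a load test, an asynchronous invocation could look roughly like the sketch below. It assumes the request body has already been uploaded to S3 and that the endpoint was created with an asynchronous inference configuration; the function name and the S3 URI passed to it are placeholders.

```python
def submit_async(runtime_client, endpoint_name, input_s3_uri):
    """Queue one asynchronous inference request and return its output location."""
    response = runtime_client.invoke_endpoint_async(
        EndpointName=endpoint_name,
        ContentType="application/json",
        InputLocation=input_s3_uri,  # the payload is read from S3, not sent inline
    )
    # The result is written to this S3 location once processing finishes.
    return response["OutputLocation"]
```

The call returns as soon as the request is queued, so end-to-end latency has to be measured by polling the output location or by subscribing to the endpoint's notifications.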

Kirit Thadaka
  • "This option is ideal for requests with large payload sizes (up to 1GB), long processing times (up to 15 minutes), and near real-time latency requirements." I can't wait up to 15 minutes; the inference needs to take less than 500 ms. – 219CID Dec 19 '22 at 18:25