I have set up a SageMaker inference endpoint for processing images. I am sending a JSON request like this to the endpoint:
data = {
    'inferenceType': 'SINGLE_INSTANCE',
    'productType': productType,
    'images': [encoded_image_bytes_as_string],
    'content_type': "application/json",
}
payload = json.dumps(data)
response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=payload)
where images is an array of base64-encoded images. The endpoint works fine, except that when I send large images I exceed SageMaker's inference payload size limit:
Maximum payload size for endpoint invocation 6 MB
What other data formats could I use that are smaller than JSON? Is there any possibility of using something like gzip to compress my payload before sending it? I know that SageMaker asynchronous endpoints and batch transform allow larger payloads, but I require real-time inference. Thanks!
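For reference, this is roughly the kind of compression I had in mind (a sketch only: the `example-product` value and the random stand-in image bytes are placeholders, and I assume my container's inference code would be modified to gunzip the body before parsing the JSON):

```python
import base64
import gzip
import json
import os

# Stand-in for real image bytes (random data, for illustration only).
image_bytes = os.urandom(500_000)

data = {
    'inferenceType': 'SINGLE_INSTANCE',
    'productType': 'example-product',  # hypothetical value
    'images': [base64.b64encode(image_bytes).decode('utf-8')],
}
payload = json.dumps(data).encode('utf-8')
compressed = gzip.compress(payload)
print(len(compressed), len(payload))  # compressed body is smaller

# The invoke call would then send the compressed bytes; my container
# would have to detect and decompress them before json.loads:
# response = sagemaker_runtime_client.invoke_endpoint(
#     EndpointName=endpoint_name,
#     ContentType='application/json',
#     Body=compressed)
```

I also wondered whether skipping base64 entirely would help, since base64 alone inflates the image bytes by about 33% before any JSON overhead, e.g. by sending the raw image bytes as the request Body with an image content type. Would either of these approaches work with real-time endpoints?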