
I am trying to get inferences from a model that exposes a gRPC endpoint, in a batch setting with a large batch size of about 10GB. I have instantiated a request object to which I need to pass this batch data:

request.inputs['images'].CopyFrom(make_tensor_proto(batch, shape=batch.shape)) 

The make_tensor_proto(batch, shape=batch.shape) call fails with the error below:

ValueError: Cannot create a tensor proto whose content is larger than 2GB
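For context, a minimal, self-contained version of the client code (the endpoint address, model name, and batch shape are placeholders):

    import grpc
    import numpy as np
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    # Placeholder endpoint and model name.
    channel = grpc.insecure_channel("localhost:8500")
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    # ~10GB of image data: 16000 x 224 x 224 x 3 float32 is roughly 9.6GB.
    batch = np.zeros((16000, 224, 224, 3), dtype=np.float32)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = "my_model"
    request.model_spec.signature_name = "serving_default"

    # Raises: ValueError: Cannot create a tensor proto whose content is larger than 2GB
    request.inputs['images'].CopyFrom(tf.make_tensor_proto(batch, shape=batch.shape))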

I understand that protobuf has this limitation by design, as discussed in, e.g., 1 and 2.

Most of the examples I have found address a model training context. Are there any ideas on how this limitation can be overcome for inference purposes, given that gRPC requires protobuf-encoded data?

The server has enough memory to process the large batch, but the communication protocol is the bottleneck.
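Note that raising gRPC's own per-message caps does not help here; as far as I can tell, the ValueError is raised while building the TensorProto, before gRPC even sends anything. A sketch using the standard grpc-python channel options:

    import grpc

    # Lifting gRPC's per-message limits (default ~4MB) does not lift
    # protobuf's 2GB hard limit; make_tensor_proto still fails first.
    channel = grpc.insecure_channel(
        "localhost:8500",
        options=[
            ("grpc.max_send_message_length", -1),
            ("grpc.max_receive_message_length", -1),
        ],
    )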

Any idea how this can be achieved without breaking the data into smaller chunks on the client side?
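For completeness, the client-side chunking fallback I would like to avoid looks roughly like this (reusing the stub and batch from the snippet above; the chunk size is a placeholder):

    # Split the batch so each request stays well under the 2GB limit.
    CHUNK = 1000  # placeholder: ~0.6GB per request for the shape above

    results = []
    for start in range(0, batch.shape[0], CHUNK):
        sub = batch[start:start + CHUNK]
        req = predict_pb2.PredictRequest()
        req.model_spec.name = "my_model"
        req.model_spec.signature_name = "serving_default"
        req.inputs['images'].CopyFrom(tf.make_tensor_proto(sub, shape=sub.shape))
        results.append(stub.Predict(req, timeout=60.0))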


0 Answers