I'm using the following code on my production server, which I want to scale to more than 500 TPS. When I flood the server with requests, in at least 1 out of every 1000 requests the channel.close() call takes 10–10.5 seconds. The code runs inside a Flask server, and currently I create a new channel for every request and close it afterwards. Please help me with this.
import grpc
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# A new channel and stub are created on every request.
channel = grpc.insecure_channel(serving_address)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = model_name
request.model_spec.signature_name = 'serving_default'
# dtype=1 is DT_FLOAT; make_tensor_proto_engine is our own helper.
request.inputs['model_2_input'].CopyFrom(
    make_tensor_proto_engine(img_array, dtype=1, shape=[1, 224, 224, 3]))

result = stub.Predict(request, timeout=6.0)  # 6-second deadline
channel.close()  # this call intermittently takes ~10 seconds
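For context, the per-request create/close pattern could be replaced by one channel shared across all requests (gRPC channels and stubs are thread-safe, so Flask worker threads can share them). A minimal sketch of that idea, assuming grpcio only; the address is a placeholder, and in the real app the stub would be built once from this channel with prediction_service_pb2_grpc.PredictionServiceStub:

```python
import grpc

_channel = None

def get_channel(address="localhost:8500"):
    """Return a process-wide shared channel, creating it lazily on first use.

    "localhost:8500" is a placeholder for serving_address in the question.
    The channel is never closed per request; it lives for the process lifetime.
    """
    global _channel
    if _channel is None:
        _channel = grpc.insecure_channel(address)
    return _channel
```

The stub built from get_channel() would likewise be created once and reused, so each request only constructs the PredictRequest and calls Predict.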