I have deployed a T5 TensorRT model on NVIDIA Triton Inference Server, and below is the config.pbtxt file, but I am facing a problem while inferencing the model using the Triton client.
As per the config.pbtxt file, there should be 4 inputs to the TensorRT model, including the decoder ids. But how can we send the decoder ids as an input to the model? I think the decoder ids are supposed to be generated from the model's output. A rough sketch of how I am currently calling the model from the client is included after the config.
name: "tensorrt_model"
platform: "tensorrt_plan"
max_batch_size: 0
input [
{
name: "input_ids"
data_type: TYPE_INT32
dims: [ -1, -1 ]
},
{
name: "attention_mask"
data_type: TYPE_INT32
dims: [-1, -1 ]
},
{
name: "decoder_input_ids"
data_type: TYPE_INT32
dims: [ -1, -1]
},
{
name: "decoder_attention_mask"
data_type: TYPE_INT32
dims: [ -1, -1 ]
}
]
output [
{
name: "last_hidden_state"
data_type: TYPE_FP32
dims: [ -1, -1, 768 ]
},
{
name: "input.151"
data_type: TYPE_FP32
dims: [ -1, -1, -1 ]
}
]
instance_group [
{
count: 1
kind: KIND_GPU
}
]
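
For reference, this is roughly how I am calling the model from the Triton Python client right now. It is only a rough sketch: the t5-base tokenizer, the localhost:8000 HTTP endpoint, seeding decoder_input_ids with the pad token as the decoder start token, and treating "input.151" as the LM-head logits are all my assumptions; only the model/input/output names come from the config above.

```python
import numpy as np
import tritonclient.http as httpclient
from transformers import T5Tokenizer

# Assumptions: t5-base tokenizer, Triton HTTP endpoint on localhost:8000,
# and that T5's decoder start token is the pad token (id 0).
tokenizer = T5Tokenizer.from_pretrained("t5-base")
client = httpclient.InferenceServerClient(url="localhost:8000")

text = "translate English to German: The house is wonderful."
enc = tokenizer(text, return_tensors="np")
input_ids = enc["input_ids"].astype(np.int32)
attention_mask = enc["attention_mask"].astype(np.int32)

# Seed the decoder with a single start token; this is the part I am unsure about.
decoder_input_ids = np.array([[tokenizer.pad_token_id]], dtype=np.int32)
decoder_attention_mask = np.ones_like(decoder_input_ids)

# Build the four inputs declared in config.pbtxt (all TYPE_INT32).
inputs = []
for name, data in [
    ("input_ids", input_ids),
    ("attention_mask", attention_mask),
    ("decoder_input_ids", decoder_input_ids),
    ("decoder_attention_mask", decoder_attention_mask),
]:
    inp = httpclient.InferInput(name, list(data.shape), "INT32")
    inp.set_data_from_numpy(data)
    inputs.append(inp)

outputs = [
    httpclient.InferRequestedOutput("last_hidden_state"),
    httpclient.InferRequestedOutput("input.151"),
]

result = client.infer("tensorrt_model", inputs=inputs, outputs=outputs)
logits = result.as_numpy("input.151")  # assuming this output holds the logits
print(logits.shape)
```

Is a single call like this enough, or do I have to run a loop on the client side that appends each newly predicted token to decoder_input_ids and calls the model again?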