Questions tagged [tritonserver]
39 questions
7
votes
1 answer
How to use Triton server "ensemble model" with 1:N input/output to create patches from large image?
I am trying to feed a very large image into Triton server. I need to divide the input image into patches and feed the patches one by one into a TensorFlow model. The image has a variable size, so the number of patches N is variable for each call.
I…
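A minimal sketch of the 1:N splitting step itself, assuming an HWC numpy image and a fixed patch size; the extract_patches helper and the 256-pixel patch size are illustrative, not part of the question. Whether an ensemble step can emit a variable number of patches per request is exactly what the question is about; this only shows the splitting.

import numpy as np

def extract_patches(image: np.ndarray, patch: int = 256) -> np.ndarray:
    h, w, c = image.shape
    # Pad so both spatial dimensions are divisible by the patch size.
    pad_h, pad_w = -h % patch, -w % patch
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))
    H, W = padded.shape[:2]
    patches = (padded
               .reshape(H // patch, patch, W // patch, patch, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(-1, patch, patch, c))
    return patches  # shape (N, patch, patch, c); N varies with the input size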

Stiefel
- 2,677
- 3
- 31
- 42
5
votes
2 answers
Is there a way to get the config.pbtxt file from the Triton inference server
Recently, I came across the Triton server flag "--strict-model-config=false", used when starting the inference server. It lets Triton generate its own config file while loading the model from the model…
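For context, a short sketch of reading back the configuration Triton generated, using the Python HTTP client; the model name and server URL are placeholders:

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
# Returns the (possibly auto-generated) model configuration as a dict.
config = client.get_model_config("my_model")
print(config)

The same information is also exposed over HTTP at v2/models/<model_name>/config.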

Rajesh Somasundaram
- 448
- 1
- 4
- 13
3
votes
2 answers
NVIDIA Triton vs TorchServe for SageMaker Inference
NVIDIA Triton vs TorchServe for SageMaker inference? When to recommend each?
Both are modern, production-grade inference servers. TorchServe is the DLC default inference server for PyTorch models. Triton is also supported for PyTorch inference on…

juvchan
- 6,113
- 2
- 22
- 35
3
votes
0 answers
Cog vs Triton Inference Server
I'm considering Cog and Triton Inference Server for inference in production.
Does someone know what is the difference in capabilities as well as in run times between the two, especially on AWS?

Dolev Shapira
- 133
- 8
2
votes
2 answers
Using a String parameter for NVIDIA Triton
I'm trying to deploy a simple model on the Triton Inference Server. It is loaded well but I'm having trouble formatting the input to do a proper inference request.
My model has a config.pbtxt set up like this
max_batch_size: 1
input: [
{
…
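A hedged sketch of one way to send a string input with the Python HTTP client, assuming an input named INPUT0 with datatype TYPE_STRING and max_batch_size: 1; the input and model names are placeholders, not taken from the question's actual config:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
# String tensors are sent as BYTES; dtype=object lets numpy hold Python strings.
data = np.array([["some text"]], dtype=np.object_)   # shape (1, 1): batch of one
inp = httpclient.InferInput("INPUT0", data.shape, "BYTES")
inp.set_data_from_numpy(data)
result = client.infer("my_model", inputs=[inp])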

Regalia
- 129
- 2
- 10
2
votes
0 answers
NVIDIA DALI video decode from external_source buffer (instead of file)
This article explains how to do image decoding and preprocessing on the server side with DALI while using triton-inference-server.
I am trying to find something similar for video decoding from an H.264-encoded byte array on the server side, before the…
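For reference, a sketch of the image-decoding pattern the article describes, feeding encoded bytes through fn.external_source; whether DALI offers an equivalent operator for decoding H.264 byte buffers is exactly what the question asks, so nothing here answers that part. The pipeline parameters are illustrative.

from nvidia.dali import pipeline_def, fn, types

@pipeline_def(batch_size=1, num_threads=2, device_id=0)
def decode_pipe():
    # Encoded JPEG/PNG bytes are pushed in from the serving layer at run time.
    encoded = fn.external_source(name="encoded", dtype=types.UINT8)
    images = fn.decoders.image(encoded, device="mixed")  # GPU-accelerated decode
    return images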

dumbPy
- 1,379
- 1
- 6
- 19
2
votes
0 answers
Use real image data with perf_analyzer - Triton Inference Server
I'm currently trying to use perf_analyzer from NVIDIA Triton Inference Server with a deep learning model that takes a numpy array (an image) as input.
I followed the documentation's steps for using real data, but my input is rejected by the…
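A sketch of building an --input-data JSON file for perf_analyzer from one real image, assuming a single FP32 input named INPUT__0; the input name, shape, and file names are placeholders:

import json
import numpy as np
from PIL import Image

# Load and normalize one real image to the shape the model expects.
img = np.asarray(Image.open("sample.jpg").resize((224, 224)), dtype=np.float32) / 255.0
payload = {"data": [{"INPUT__0": {"content": img.flatten().tolist(),
                                  "shape": list(img.shape)}}]}
with open("real_data.json", "w") as f:
    json.dump(payload, f)
# Then: perf_analyzer -m my_model --input-data real_data.json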

A.BURIE
- 31
- 3
1
vote
1 answer
How to create a 4D array with random data using numpy random
My model accepts data in the shape (1, 32, 32, 3). I am looking for a way to pass the data using np.array from numpy. Any help on this would be appreciated.
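A minimal sketch, assuming float32 inputs; the dtype should match whatever the model's config declares:

import numpy as np

# Random float data in the expected shape (1, 32, 32, 3), values in [0, 1).
x = np.random.rand(1, 32, 32, 3).astype(np.float32)

# Or random 0-255 pixel values if the model expects raw image data.
x_int = np.random.randint(0, 256, size=(1, 32, 32, 3), dtype=np.uint8)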

Mahesh
- 25
- 6
1
vote
0 answers
Can I deploy kserve inference service using XGBoost model on kserve-tritonserver?
I want to deploy an XGBoost model on KServe.
I deployed it on the default serving runtime, but I want to try it on kserve-tritonserver.
The KServe docs say that kserve-tritonserver supports TensorFlow, ONNX, PyTorch, and TensorRT, and NVIDIA says Triton Inference…
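One commonly used route is Triton's FIL backend, which can load XGBoost models. A hedged config.pbtxt sketch, assuming a small classification model; every name, dimension, and parameter value here is illustrative, and whether the FIL backend ships in the kserve-tritonserver runtime image is something to verify separately:

backend: "fil"
max_batch_size: 32
input [
  { name: "input__0" data_type: TYPE_FP32 dims: [ 4 ] }
]
output [
  { name: "output__0" data_type: TYPE_FP32 dims: [ 1 ] }
]
parameters [
  { key: "model_type" value: { string_value: "xgboost" } },
  { key: "output_class" value: { string_value: "true" } }
]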

HoonCheol Shin
- 11
- 2
1
vote
0 answers
How to work with text input directly in Triton server?
The examples here (https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-triton/nlp_bert/triton_nlp_bert.ipynb) show that, instead of sending raw text and tokenizing it on the server, tokenization is done on the client side and the tokenized input is…
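One way to move tokenization into the server is a Python-backend model that wraps the tokenizer. A rough sketch of such a model.py, assuming a BYTES input named TEXT, an INT64 output named INPUT_IDS, and a Hugging Face tokenizer available inside the container; every name here is a placeholder:

import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoTokenizer

class TritonPythonModel:
    def initialize(self, args):
        self.tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    def execute(self, requests):
        responses = []
        for request in requests:
            # Raw UTF-8 text arrives as a BYTES tensor.
            texts = pb_utils.get_input_tensor_by_name(request, "TEXT").as_numpy()
            decoded = [t.decode("utf-8") for t in texts.flatten()]
            enc = self.tokenizer(decoded, padding=True, return_tensors="np")
            out = pb_utils.Tensor("INPUT_IDS", enc["input_ids"].astype(np.int64))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses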

suwa
- 23
- 4
1
vote
1 answer
How to host/invoke multiple models in NVIDIA Triton server for inference?
Based on the documentation here, https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/nlp/realtime/triton/multi-model/bert_trition-backend/bert_pytorch_trt_backend_MME.ipynb, I have set up a multi-model endpoint utilizing a GPU instance type and…
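For reference, a hedged sketch of invoking one specific model behind a SageMaker multi-model endpoint from boto3, with the request body in Triton's KServe-v2 JSON format; the endpoint name, model archive name, input name, shape, and content type are all assumptions rather than values from the notebook:

import json
import boto3

payload = {
    "inputs": [
        {"name": "INPUT__0", "shape": [1, 128], "datatype": "INT64",
         "data": [0] * 128}
    ]
}
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="triton-mme-endpoint",
    TargetModel="bert_pytorch.tar.gz",   # which archive in the MME S3 prefix to hit
    ContentType="application/octet-stream",
    Body=json.dumps(payload),
)
print(response["Body"].read())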

haju
- 95
- 6
1
vote
0 answers
Serve concurrent requests with NVIDIA Triton on a GPU
I currently have a Triton server with a Python backend that serves a model. The machine I am running the inference on is a g4dn.xlarge machine. The instance count provided for the GPU in the config.pbtxt is varied between 1 and 3.
I am using…
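For context, the two config.pbtxt settings that usually govern this are the instance count and dynamic batching; the values below are illustrative only. More instances let the Python backend serve requests concurrently on the same GPU, at the cost of extra GPU memory per instance.

instance_group [
  { count: 2 kind: KIND_GPU }
]
dynamic_batching {
  max_queue_delay_microseconds: 100
}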

Ajayv
- 374
- 2
- 13
1
vote
1 answer
Starting triton inference server docker container on kube cluster
Description
Trying to deploy the Triton Docker image as a container on a Kubernetes cluster
Triton Information
What version of Triton are you using? -> 22.10
Are you using the Triton container or did you build it yourself?
I used the server repo with…

Transwert
- 83
- 1
- 10
1
vote
0 answers
Triton inference server: Explicit model control
I need a little advice on deploying Triton Inference Server with explicit model control. From the looks of it, this mode gives the user the most control over which model goes live. But the problem I'm not able to solve is how to load models in case…
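A minimal sketch of driving explicit model control from the Python HTTP client, assuming the server was started with --model-control-mode=explicit; the model name and URL are placeholders:

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
client.load_model("my_model")             # POST v2/repository/models/my_model/load
print(client.is_model_ready("my_model"))  # True once loading succeeds
client.unload_model("my_model")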

Buddhi De Seram
- 11
- 1
1
vote
1 answer
Is it possible to use another model within Nvidia Triton Inference Server model repository with a custom Python model?
I want to use a model in my Triton Inference Server model repository in another custom Python model that I have in the same repository. Is it possible? If yes, how to do that?
I guess it could be done with Building Custom Python Backend Stub, but I…
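One route worth noting is BLS (business logic scripting) rather than a custom stub: a Python-backend model can call another model in the same repository at execute time. A rough sketch; the model and tensor names are placeholders:

import triton_python_backend_utils as pb_utils

def call_other_model(input_tensor):
    # input_tensor is a pb_utils.Tensor prepared inside execute().
    infer_request = pb_utils.InferenceRequest(
        model_name="other_model",
        requested_output_names=["OUTPUT0"],
        inputs=[input_tensor],
    )
    infer_response = infer_request.exec()
    if infer_response.has_error():
        raise pb_utils.TritonModelException(infer_response.error().message())
    return pb_utils.get_output_tensor_by_name(infer_response, "OUTPUT0")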

Kıvanç Yüksel
- 701
- 7
- 17