Questions tagged [tritonserver]

39 questions
7 votes · 1 answer

How to use Triton server "ensemble model" with 1:N input/output to create patches from large image?

I am trying to feed a very large image into the Triton server. I need to divide the input image into patches and feed the patches one by one into a TensorFlow model. The image has a variable size, so the number of patches N is variable for each call. I…
Stiefel • 2,677 • 3 • 31 • 42
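Variable 1:N fan-out is usually handled in a python-backend step rather than in the ensemble graph itself, and the patch-extraction part can be plain numpy. A minimal sketch of that part, assuming a 256×256 tile size chosen only for illustration:

```python
import numpy as np

def extract_patches(image: np.ndarray, patch_h: int, patch_w: int) -> np.ndarray:
    """Split an HxWxC image into non-overlapping patches, zero-padding the borders.

    Returns an array of shape (N, patch_h, patch_w, C) where N varies with the
    input size -- the variable-N batch the question asks about.
    """
    h, w, c = image.shape
    pad_h = (-h) % patch_h
    pad_w = (-w) % patch_w
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))
    ph, pw = padded.shape[0] // patch_h, padded.shape[1] // patch_w
    return (padded
            .reshape(ph, patch_h, pw, patch_w, c)
            .swapaxes(1, 2)
            .reshape(-1, patch_h, patch_w, c))

# Example: a 1000x750 RGB image yields ceil(1000/256) * ceil(750/256) = 12 patches.
patches = extract_patches(np.zeros((1000, 750, 3), dtype=np.float32), 256, 256)
print(patches.shape)  # (12, 256, 256, 3)
```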
5 votes · 2 answers

Is there a way to get the config.pbtxt file from triton inferencing server

Recently, I came across the Triton serving config flag "--strict-model-config=false" used when running the inference server. This enables the server to generate its own config file while loading the model from the model…
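The auto-completed configuration can be read back from the running server. A minimal sketch with the Python HTTP client; "localhost:8000" and "densenet_onnx" are placeholder values:

```python
import json
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
# Returns the (auto-completed) model configuration as a Python dict.
config = client.get_model_config("densenet_onnx")
print(json.dumps(config, indent=2))

# The same information is available over plain HTTP:
#   curl localhost:8000/v2/models/densenet_onnx/config
```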
3 votes · 2 answers

NVIDIA Triton vs TorchServe for SageMaker Inference

NVIDIA Triton vs TorchServe for SageMaker inference? When to recommend each? Both are modern, production-grade inference servers. TorchServe is the DLC default inference server for PyTorch models. Triton is also supported for PyTorch inference on…
juvchan • 6,113 • 2 • 22 • 35
3 votes · 0 answers

Cog vs Triton Inference Server

I'm considering Cog and Triton Inference Server for inference in production. Does anyone know what the differences are in capabilities as well as in run times between the two, especially on AWS?
2 votes · 2 answers

Using String parameter for nvidia triton

I'm trying to deploy a simple model on the Triton Inference Server. It loads fine, but I'm having trouble formatting the input to make a proper inference request. My model has a config.pbtxt set up like this: max_batch_size: 1 input: [ { …
Regalia • 129 • 2 • 10
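String inputs are declared as TYPE_STRING in config.pbtxt and sent from the client as numpy object arrays with datatype "BYTES". A minimal sketch, assuming an input named "INPUT_TEXT" with dims [1] and max_batch_size 1 (adjust names and shapes to the actual config):

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Shape (1, 1): one batch element, one string element.
text = np.array([["some input sentence"]], dtype=object)
inp = httpclient.InferInput("INPUT_TEXT", text.shape, "BYTES")
inp.set_data_from_numpy(text)

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT"))  # "OUTPUT" is a placeholder output name
```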
2 votes · 0 answers

nvidia dali video decode from external_source buffer (instead of file)

This article explains how to do image decoding and preprocessing on the server side with DALI while using triton-inference-server. I am trying to find something similar for doing video decoding from an H.264-encoded byte array on the server side, before the…
dumbPy • 1,379 • 1 • 6 • 19
2 votes · 0 answers

Use real image data with perf_analyzer - Triton Inference Server

I'm currently trying to use perf_analyzer of NVIDIA Triton Inference Server with a deep learning model which takes a numpy array (an image) as input. I followed the steps to use real data from the documentation, but my inputs are rejected by the…
A.BURIE • 31 • 3
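perf_analyzer accepts real inputs through `--input-data <file.json>`, where each input is given as a flat list of values plus a shape. A minimal sketch that builds such a file from an image; the input name "INPUT__0" and the 224×224×3 shape are assumptions and must match the model's config.pbtxt:

```python
import json
import numpy as np
from PIL import Image

img = np.asarray(Image.open("sample.jpg").resize((224, 224)), dtype=np.float32)

payload = {
    "data": [
        {
            "INPUT__0": {
                "content": img.flatten().tolist(),  # row-major flattened values
                "shape": list(img.shape),           # e.g. [224, 224, 3]
            }
        }
    ]
}

with open("real_data.json", "w") as f:
    json.dump(payload, f)

# Then point perf_analyzer at the file:
#   perf_analyzer -m my_model --input-data real_data.json
```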
1 vote · 1 answer

How to create 4d array with random data using numpy random

My model accepts data in the shape (1, 32, 32, 3). I am looking for a way to pass the data using np.array from numpy. Any help on this will be appreciated.
Mahesh • 25 • 6
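A minimal sketch using numpy's random module; float32 is an assumption and should match the dtype the model actually expects:

```python
import numpy as np

# Random values in [0, 1) with exactly the input shape the model expects.
data = np.random.rand(1, 32, 32, 3).astype(np.float32)
print(data.shape)  # (1, 32, 32, 3)

# For integer pixel-like data instead:
# data = np.random.randint(0, 256, size=(1, 32, 32, 3), dtype=np.uint8)
```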
1 vote · 0 answers

Can I deploy kserve inference service using XGBoost model on kserve-tritonserver?

I want to deploy an XGBoost model on KServe. I deployed it on the default serving runtime, but I want to try it on kserve-tritonserver. I know the KServe docs say kserve-tritonserver supports TensorFlow, ONNX, PyTorch and TensorRT. And NVIDIA said triton inference…
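Triton can serve tree models through its FIL backend, so one possible route is to give the XGBoost model a FIL config and point the Triton runtime at it. A hedged sketch of such a config.pbtxt; the tensor names, dims and the "xgboost_json" model_type are assumptions that depend on how the model was saved:

```
name: "xgboost_model"
backend: "fil"
max_batch_size: 8192
input [
  { name: "input__0", data_type: TYPE_FP32, dims: [ 10 ] }
]
output [
  { name: "output__0", data_type: TYPE_FP32, dims: [ 1 ] }
]
parameters [
  { key: "model_type", value: { string_value: "xgboost_json" } },
  { key: "output_class", value: { string_value: "true" } }
]
```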
1 vote · 0 answers

how to work with text input directly in triton server?

The examples here (https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-triton/nlp_bert/triton_nlp_bert.ipynb) show that instead of sending text and tokenizing it on the server, tokenization is done on the client side and the tokenized input is…
suwa • 23 • 4
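One way to accept raw text is a python-backend model that receives a TYPE_STRING input and tokenizes on the server. A hedged sketch of such a model.py; the tensor names, output dtypes and the bert-base-uncased tokenizer are assumptions:

```python
import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoTokenizer


class TritonPythonModel:
    def initialize(self, args):
        self.tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    def execute(self, requests):
        responses = []
        for request in requests:
            # TYPE_STRING inputs arrive as numpy object/bytes arrays.
            text_tensor = pb_utils.get_input_tensor_by_name(request, "TEXT")
            texts = [t.decode("utf-8") for t in text_tensor.as_numpy().reshape(-1)]

            enc = self.tokenizer(texts, padding=True, return_tensors="np")
            out_ids = pb_utils.Tensor("INPUT_IDS", enc["input_ids"].astype(np.int64))
            out_mask = pb_utils.Tensor("ATTENTION_MASK", enc["attention_mask"].astype(np.int64))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_ids, out_mask]))
        return responses
```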
1 vote · 1 answer

how to host/invoke multiple models in nvidia triton server for inference?

Based on the documentation here, https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/nlp/realtime/triton/multi-model/bert_trition-backend/bert_pytorch_trt_backend_MME.ipynb, I have set up a multi-model endpoint utilizing a GPU instance type and…
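On a SageMaker multi-model endpoint the model is selected per request with the TargetModel parameter. A hedged sketch using boto3; the endpoint name, archive name, tensor names, shapes and content type are placeholders that depend on the actual deployment:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# KServe v2-style inference payload that the Triton container understands.
payload = {
    "inputs": [
        {
            "name": "INPUT__0",
            "shape": [1, 128],
            "datatype": "INT64",
            "data": [0] * 128,
        }
    ]
}

response = runtime.invoke_endpoint(
    EndpointName="triton-mme-endpoint",
    ContentType="application/octet-stream",
    TargetModel="bert_model_1.tar.gz",  # selects which model archive to run
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```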
1 vote · 0 answers

Serve concurrent requests with NVIDIA Triton on a GPU

I currently have a Triton server with a python backend that serves a model. The machine I am running the inference on is a g4dn.xlarge machine. The instance count provided for the GPU in the config.pbtxt is varied between 1 and 3. I am using…
Ajayv • 374 • 2 • 13
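Concurrency on a single GPU (a g4dn.xlarge has one T4) is mainly governed by the instance_group count and, optionally, dynamic batching. A hedged sketch of the relevant config.pbtxt pieces; the counts and queue delay are illustrative, not recommendations:

```
instance_group [
  {
    count: 2          # two execution instances of the python-backend model
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
dynamic_batching {
  max_queue_delay_microseconds: 100
}
```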
1 vote · 1 answer

Starting triton inference server docker container on kube cluster

Description: Trying to deploy the Triton docker image as a container on a Kubernetes cluster. Triton Information: What version of Triton are you using? -> 22.10. Are you using the Triton container or did you build it yourself? I used the server repo with…
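A hedged sketch of a bare Deployment for the 22.10 NGC image; the model-repository path, storage mounting and resource requests are placeholders that the actual cluster setup has to fill in:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-server
spec:
  replicas: 1
  selector:
    matchLabels: { app: triton-server }
  template:
    metadata:
      labels: { app: triton-server }
    spec:
      containers:
        - name: triton
          image: nvcr.io/nvidia/tritonserver:22.10-py3
          args: ["tritonserver", "--model-repository=/models"]
          ports:
            - containerPort: 8000   # HTTP
            - containerPort: 8001   # gRPC
            - containerPort: 8002   # metrics
          resources:
            limits:
              nvidia.com/gpu: 1
```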
1 vote · 0 answers

Triton inference server: Explicit model control

I need a little advice with deploying Triton Inference Server with explicit model control. From the looks of it, this mode gives the user the most control over which model goes live. But the problem I'm not able to solve is how to load models in case…
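A minimal sketch, assuming the server was started with --model-control-mode=explicit; models are then loaded and unloaded on demand through the repository API. The URL and model name are placeholders:

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

client.load_model("my_model")              # ask the server to load it now
print(client.is_model_ready("my_model"))   # True once loading has finished

client.unload_model("my_model")            # take it out of service again
```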
1 vote · 1 answer

Is it possible to use another model within Nvidia Triton Inference Server model repository with a custom Python model?

I want to use a model in my Triton Inference Server model repository in another custom Python model that I have in the same repository. Is it possible? If yes, how to do that? I guess it could be done with Building Custom Python Backend Stub, but I…
Kıvanç Yüksel • 701 • 7 • 17
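Calling one repository model from another is what BLS (business logic scripting) in the python backend is for; a custom backend stub is normally only needed for a different Python version. A hedged sketch; the model and tensor names are placeholders:

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            image = pb_utils.get_input_tensor_by_name(request, "IMAGE")

            # Build and run an inference request against another model
            # in the same repository.
            infer_request = pb_utils.InferenceRequest(
                model_name="classifier_onnx",
                requested_output_names=["LOGITS"],
                inputs=[image],
            )
            infer_response = infer_request.exec()
            if infer_response.has_error():
                raise pb_utils.TritonModelException(infer_response.error().message())

            logits = pb_utils.get_output_tensor_by_name(infer_response, "LOGITS")
            responses.append(pb_utils.InferenceResponse(output_tensors=[logits]))
        return responses
```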