
I have one GPU. Assume the image (`img`) arrives from a camera in real time, and I have two trained PyTorch models performing different tasks, each taking the same `img` as input. Is there a way to ensure that the GPU inference of model 1 and model 2 runs concurrently rather than sequentially?

Note: "Not" Processing one model on multiple GPUs in parallel

  • Why don't you execute each one independently when you receive the request/img? You can also [`set_per_process_memory_fraction`](https://stackoverflow.com/a/65557955/7347631). – ndrwnaguib Jul 14 '23 at 02:38
  • @ndrwnaguib It's not a memory problem. I want to do asynchronous GPU inference. Since this work has to run in real time without delay, I want the two models to perform their tasks at the same time and then process their outputs further, because latency accumulates when the models run sequentially. – soribido Jul 14 '23 at 02:51

0 Answers