
I know Python has a GIL, so one Python process can only execute bytecode on one CPU core at a time. But PyTorch can use multiple CPU cores, because its C++ backend releases the GIL and does its own multithreading. I assume this applies to inference as well. So I guess we can use Python coroutines (asyncio) to handle requests, still use multiple CPU cores for the computation, and load the model only once. I think this is suitable for a low-concurrency request scenario, and we don't need libtorch in this case. Am I right?
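
For example, something like this minimal sketch is what I have in mind (the model, shapes, and request source are placeholders, not a real service):

```python
import asyncio

import torch
import torch.nn as nn

# Load the model once at startup, accept requests with asyncio coroutines,
# and push each blocking forward pass into a worker thread, where PyTorch's
# C++ kernels release the GIL and can use multiple CPU cores.

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()  # built exactly once, shared by every request

def _infer(x: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():  # no autograd bookkeeping at inference time
        return model(x)

async def handle_request(x: torch.Tensor) -> torch.Tensor:
    loop = asyncio.get_running_loop()
    # run_in_executor keeps the event loop responsive while a worker
    # thread runs the forward pass.
    return await loop.run_in_executor(None, _infer, x)

async def main() -> None:
    # Simulate a few concurrent requests.
    requests = [torch.randn(1, 128) for _ in range(4)]
    results = await asyncio.gather(*(handle_request(x) for x in requests))
    print([r.shape for r in results])

if __name__ == "__main__":
    asyncio.run(main())
```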

Jackiexiao

1 Answer


Try https://github.com/triton-inference-server; it is exactly what I needed.

Triton's Python backend spawns a separate process for each model instance,

which means it uses Python multiprocessing to handle requests in parallel.
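
For a rough picture of that process-per-instance idea (this is only an illustration, not Triton's actual code), a worker pool where each process loads its own model copy looks like this:

```python
import multiprocessing as mp

import torch
import torch.nn as nn

# Each worker process builds its own model copy in an initializer,
# so requests run in parallel without sharing a single GIL.

def _init_worker() -> None:
    global _model
    _model = nn.Sequential(nn.Linear(128, 10))  # placeholder model
    _model.eval()

def _infer(x: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        return _model(x)

if __name__ == "__main__":
    with mp.Pool(processes=2, initializer=_init_worker) as pool:
        requests = [torch.randn(1, 128) for _ in range(4)]
        outputs = pool.map(_infer, requests)  # fan requests across workers
        print([o.shape for o in outputs])
```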

If I use plain Python, PyTorch can still use multiple threads (multiple CPU cores), but it still processes requests one by one; see:

Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads

The intra-op and inter-op threads parallelize the computation of a single request; they do not let PyTorch handle several requests at the same time.
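
To make the distinction concrete, here is a small sketch (the thread counts are arbitrary). Both knobs only change how one forward pass is parallelized:

```python
import torch

# Inter-op threads run independent ops of one graph in parallel;
# intra-op threads split the work inside a single op. Neither makes
# PyTorch serve two independent requests concurrently.

torch.set_num_interop_threads(2)  # must be set before parallel work starts
torch.set_num_threads(4)

print(torch.get_num_interop_threads())  # -> 2
print(torch.get_num_threads())          # -> 4

x = torch.randn(2048, 2048)
y = x @ x  # this one matmul is split across the 4 intra-op threads
```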

Jackiexiao