
I know Python has a GIL, so one Python process can only execute bytecode on one CPU core at a time. But PyTorch can use multiple CPU cores, because its C++ backend releases the GIL and does its own multithreading. I assume this applies to inference as well. So I guess we can use Python coroutines (asyncio) to handle requests, still use multiple CPU cores for the computation, and load the model only once. I think this is suitable for a low-concurrency request scenario, and we don't need libtorch in this case. Am I right?
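
For example, something like this minimal sketch is what I have in mind (the model, shapes, and request source are placeholders, not a real service):

```python
import asyncio

import torch
import torch.nn as nn

# Load the model once at startup, accept requests with asyncio coroutines,
# and push each blocking forward pass into a worker thread, where PyTorch's
# C++ kernels release the GIL and can use multiple CPU cores.

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()  # built exactly once, shared by every request

def _infer(x: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():  # no autograd bookkeeping at inference time
        return model(x)

async def handle_request(x: torch.Tensor) -> torch.Tensor:
    loop = asyncio.get_running_loop()
    # run_in_executor keeps the event loop responsive while a worker
    # thread runs the forward pass.
    return await loop.run_in_executor(None, _infer, x)

async def main() -> None:
    # Simulate a few concurrent requests.
    requests = [torch.randn(1, 128) for _ in range(4)]
    results = await asyncio.gather(*(handle_request(x) for x in requests))
    print([r.shape for r in results])

if __name__ == "__main__":
    asyncio.run(main())
```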

Jackiexiao

1 Answer


Try https://github.com/triton-inference-server; it is exactly what I needed.

Triton's Python backend spawns a separate process for each model instance,

which means it uses Python multiprocessing to handle requests in parallel.
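
For a rough picture of that process-per-instance idea (this is only an illustration, not Triton's actual code), a worker pool where each process loads its own model copy looks like this:

```python
import multiprocessing as mp

import torch
import torch.nn as nn

# Each worker process builds its own model copy in an initializer,
# so requests run in parallel without sharing a single GIL.

def _init_worker() -> None:
    global _model
    _model = nn.Sequential(nn.Linear(128, 10))  # placeholder model
    _model.eval()

def _infer(x: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        return _model(x)

if __name__ == "__main__":
    with mp.Pool(processes=2, initializer=_init_worker) as pool:
        requests = [torch.randn(1, 128) for _ in range(4)]
        outputs = pool.map(_infer, requests)  # fan requests across workers
        print([o.shape for o in outputs])
```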

If I use plain Python, PyTorch can still use multiple threads (multiple CPU cores), but it still processes requests one by one; see:

Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads

The intra-op and inter-op threads parallelize the computation of a single request; they do not let PyTorch handle several requests at the same time.
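
To make the distinction concrete, here is a small sketch (the thread counts are arbitrary). Both knobs only change how one forward pass is parallelized:

```python
import torch

# Inter-op threads run independent ops of one graph in parallel;
# intra-op threads split the work inside a single op. Neither makes
# PyTorch serve two independent requests concurrently.

torch.set_num_interop_threads(2)  # must be set before parallel work starts
torch.set_num_threads(4)

print(torch.get_num_interop_threads())  # -> 2
print(torch.get_num_threads())          # -> 4

x = torch.randn(2048, 2048)
y = x @ x  # this one matmul is split across the 4 intra-op threads
```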

Jackiexiao