I know Python has a GIL, so a single process can only run Python bytecode on one CPU core at a time. But PyTorch can use multiple CPUs because its C++ backend does the multithreading, and I assume this applies to inference as well. So I guess we can use Python coroutines to handle requests while PyTorch uses multiple CPUs for the computation, loading the model just once. I think this is suitable for low-concurrency request scenarios, and we don't need libtorch in this case. Am I right?
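For context, here is a minimal sketch of the idea I have in mind (the model, shapes, and request simulation are placeholders, not from any real project): load the model once at module level, accept requests with asyncio coroutines, and offload the forward pass to a thread pool so PyTorch's C++ kernels, which release the GIL, can use multiple cores while the event loop stays responsive.

```python
# Minimal sketch: one model in memory, coroutines for concurrency,
# a thread pool for the GIL-releasing PyTorch computation.
import asyncio
import torch

model = torch.nn.Linear(128, 10)  # placeholder; load your real model once here
model.eval()

def infer(x: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():  # inference only, skip autograd bookkeeping
        return model(x)

async def handle_request(x: torch.Tensor) -> torch.Tensor:
    loop = asyncio.get_running_loop()
    # run_in_executor keeps the event loop free while PyTorch computes;
    # PyTorch releases the GIL inside its C++ ops, so threads can overlap.
    return await loop.run_in_executor(None, infer, x)

async def main():
    # Simulate a few concurrent requests.
    batch = [torch.randn(1, 128) for _ in range(4)]
    results = await asyncio.gather(*(handle_request(x) for x in batch))
    print([r.shape for r in results])

asyncio.run(main())
```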
1 Answer
Try https://github.com/triton-inference-server; this is exactly what I needed. It uses multiprocessing to deal with requests.
If I use just Python, then although PyTorch can utilize multiple threads (multiple CPUs), it still processes requests one by one. As explained in Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads, the intra-op and inter-op threads parallelize the computation of a single request; they do not handle several requests at the same time.
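For reference, a small sketch of what those two settings control (the thread counts here are arbitrary examples): both parallelize work within one inference call, so neither makes PyTorch serve multiple requests concurrently.

```python
# Intra-op vs. inter-op thread pools in PyTorch. Both speed up ONE call;
# neither adds request-level concurrency.
import torch

torch.set_num_interop_threads(2)  # inter-op: independent ops can run in parallel
torch.set_num_threads(4)          # intra-op: one op (e.g. a matmul) split over cores

print(torch.get_num_threads())          # -> 4
print(torch.get_num_interop_threads())  # -> 2

# This single large matmul is split across the 4 intra-op threads, but two
# separate requests calling it would still execute one after another unless
# you add your own concurrency (processes, Triton, etc.).
a = torch.randn(2000, 2000)
b = torch.randn(2000, 2000)
c = a @ b
```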

Jackiexiao