
Is there a way to run multiple ONNX models in parallel and make use of the multiple cores available?

I have trained two ONNX models and want to run inference with both. I tried threading in Python, but that doesn't really use multiple cores.

After that I tried multiprocessing, but that gives me the error below:

can't pickle onnxruntime.capi.onnxruntime_pybind11_state.SessionOptions objects

Is there any workaround for this?
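Roughly what I am attempting, as a minimal sketch (the model file names and input shape are placeholders):

    import multiprocessing

    import numpy as np
    import onnxruntime as ort

    def run_model(session, x):
        # Run inference on an already-created session
        input_name = session.get_inputs()[0].name
        return session.run(None, {input_name: x})

    if __name__ == "__main__":
        # Two hypothetical trained models
        sess_a = ort.InferenceSession("model_a.onnx")
        sess_b = ort.InferenceSession("model_b.onnx")
        x = np.random.rand(1, 3, 224, 224).astype(np.float32)

        # Pool pickles the arguments it sends to the workers; the session
        # objects are not picklable, which raises the error above
        with multiprocessing.Pool(2) as pool:
            results = pool.starmap(run_model, [(sess_a, x), (sess_b, x)])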

  • You need to post a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) if anyone (not necessarily I) is going to give you a helpful response. – Booboo Feb 20 '21 at 15:26

1 Answer


STEP 1: If you are running your application on a GPU, the following will help.

The CUDA runtime does not support the fork start method, so switch to spawn before any multiprocessing call:

    import multiprocessing

    multiprocessing.set_start_method('spawn')

For more background, see: https://github.com/microsoft/onnxruntime/issues/7846
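Putting it together, a minimal sketch of the spawn approach (the worker, model files, and input shape are placeholders; each process builds its own session so nothing un-picklable crosses the process boundary):

    import multiprocessing

    import numpy as np
    import onnxruntime as ort

    def worker(model_path, x):
        # Create the session inside the child process instead of
        # passing an existing session object to it
        session = ort.InferenceSession(model_path)
        input_name = session.get_inputs()[0].name
        return session.run(None, {input_name: x})

    if __name__ == "__main__":
        # Must be set before any worker processes are created
        multiprocessing.set_start_method('spawn')
        x = np.random.rand(1, 3, 224, 224).astype(np.float32)
        with multiprocessing.Pool(2) as pool:
            results = pool.starmap(
                worker, [("model_a.onnx", x), ("model_b.onnx", x)])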

STEP 2: You need to create the object each process refers to in that process's own memory, or you can use a shared-memory approach through a manager, like below:

    from multiprocessing.managers import BaseManager
    from PythonFile import ClassName

    # Register the class under the name 'LabelName'
    BaseManager.register('LabelName', ClassName)
    manager = BaseManager()
    manager.start()
    # Call the registered name to get a proxy to a ClassName instance
    obj = manager.LabelName()

Now you can pass this "obj" to a process call as an argument, and it will be accessible in all of your processes.
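For example (ClassName and its predict method are placeholders for your own wrapper class):

    import multiprocessing
    from multiprocessing.managers import BaseManager

    from PythonFile import ClassName

    def worker(obj, data):
        # The proxy forwards method calls to the single instance
        # living inside the manager process
        print(obj.predict(data))

    if __name__ == "__main__":
        BaseManager.register('LabelName', ClassName)
        manager = BaseManager()
        manager.start()
        obj = manager.LabelName()  # proxy to a shared ClassName instance

        procs = [multiprocessing.Process(target=worker, args=(obj, i))
                 for i in range(2)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()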