> My first question is, reading frames from the camera, is that mostly I/O intensive or CPU intensive?
While network I/O is somewhat CPU-intensive, that processing happens in your operating system's kernel. To your application, it looks like I/O.
> Secondly, I am wondering, if/when I pick the multiprocessing route, how I should implement it.
There is a lot of fine-tuning that you can do; however, I would argue that it is important to stick to the KISS principle and only tune your application as required.
From what you describe, I expect your application to be mostly in one of three stages:
- Waiting for the next frame or copying it into a numpy array
- Calling multiple C functions that implement your OpenCV processing
- Waiting for the network output to send the frame
If you don't want to lose frames, you should (almost) always have a dedicated thread do step 1. Otherwise GigE Vision will happily drop frames that it cannot buffer.
For step 2 you should first check whether your OpenCV processing is parallelized internally or may even use GPU processing. In this case, there is little or no gain in adding multiprocessing or multithreading on top of it.
Step 3 is similar to step 1, just have a thread always ready to send images in order to keep your network layer busy. Since this step involves TCP, using multiple HTTP connections in parallel might be beneficial. Of course this depends on your receiver.
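To make step 1 concrete, here is a minimal sketch of a dedicated acquisition thread feeding a bounded queue. `grab_frame` is a hypothetical stand-in for your camera SDK's blocking grab call; replace it with whatever your GigE Vision library provides.

```python
import queue
import threading

# Hypothetical stand-in for the camera SDK's blocking grab call.
def grab_frame(i):
    return {"id": i}  # placeholder for a numpy frame

frames = queue.Queue(maxsize=8)  # bounded, so a stall cannot exhaust memory
SENTINEL = None  # marks end of stream

def acquire(n):
    # The acquisition thread does nothing but grab and hand off,
    # so the camera buffer is drained as fast as possible.
    for i in range(n):
        frames.put(grab_frame(i))
    frames.put(SENTINEL)

t = threading.Thread(target=acquire, args=(5,), daemon=True)
t.start()

received = []
while (frame := frames.get()) is not SENTINEL:
    received.append(frame["id"])
t.join()
print(received)  # [0, 1, 2, 3, 4]
```

The `maxsize` applies back-pressure: if downstream processing stalls, `put` blocks instead of buffering frames without limit.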
Your biggest concern as far as parallelization goes is the global interpreter lock (GIL). https://wiki.python.org/moin/GlobalInterpreterLock
However, note from that particular site:
> In hindsight, the GIL is not ideal, since it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Luckily, many potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.
In other words, if all you do is call a few long-running C functions, be it your network I/O or your OpenCV processing, you can use multithreading. Otherwise you may need multiprocessing.
Note that multiprocessing adds overhead to the processing as a whole, because frames have to be copied from one process to another.
In conclusion, here is how I would set it up:
- One thread per camera that does nothing but acquiring images
- One thread or a thread pool doing the image processing (again, if OpenCV parallelizes internally, don't put a thread pool on top of it)
- One thread sending frames to your HTTP server
All stages should be connected with `queue.Queue`s or `queue.SimpleQueue`s. These should be size-limited to avoid exhausting memory if processing stalls.
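The whole three-stage setup can be sketched as below. `grab_frame`, `process`, and `send` are placeholders for your camera SDK, your OpenCV calls, and your HTTP client, respectively; the queue plumbing is the point.

```python
import queue
import threading

# Placeholders for the real camera, OpenCV, and network code.
def grab_frame(i):
    return i  # stand-in for a numpy array from the camera

def process(frame):
    return frame * 2  # stand-in for OpenCV calls (these release the GIL)

sent = []
def send(frame):
    sent.append(frame)  # stand-in for the HTTP upload

raw = queue.Queue(maxsize=8)    # camera -> processing
done = queue.Queue(maxsize=8)   # processing -> network
SENTINEL = object()             # marks end of stream

def acquisition(n):
    for i in range(n):
        raw.put(grab_frame(i))
    raw.put(SENTINEL)

def processing():
    while (frame := raw.get()) is not SENTINEL:
        done.put(process(frame))
    done.put(SENTINEL)

def sender():
    while (frame := done.get()) is not SENTINEL:
        send(frame)

threads = [
    threading.Thread(target=acquisition, args=(5,)),
    threading.Thread(target=processing),
    threading.Thread(target=sender),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sent)  # [0, 2, 4, 6, 8]
```

Because each queue has a `maxsize`, a stall in any stage propagates back-pressure upstream instead of growing buffers without bound.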
If you find that you actually need multiprocessing, you can replace the queues with `multiprocessing.Queue`s and all threads with dedicated `multiprocessing.Process`es. Or the thread in stage 1 can simply call `multiprocessing.Pool.apply_async` for each incoming frame and pass the `AsyncResult` object via a `Queue` to stage 3.
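The `apply_async` variant might look like the following sketch. It uses `multiprocessing.pool.ThreadPool`, which has the same API as `multiprocessing.Pool`, so the snippet runs as-is; with a real process pool the code is identical, except that `process` must live at module level and the script needs an `if __name__ == "__main__":` guard on platforms that spawn processes.

```python
import queue
import threading
from multiprocessing.pool import ThreadPool  # same API as multiprocessing.Pool

def process(frame):
    return frame * 2  # placeholder for the OpenCV work

results = queue.Queue(maxsize=8)  # carries AsyncResult objects, in order
SENTINEL = object()
sent = []

pool = ThreadPool(processes=2)

def acquisition(n):
    # Stage 1: submit each incoming frame and forward the AsyncResult.
    for i in range(n):
        results.put(pool.apply_async(process, (i,)))
    results.put(SENTINEL)

def sender():
    # Stage 3: AsyncResult.get() blocks until the worker is done,
    # so frames leave in acquisition order even if workers finish out of order.
    while (item := results.get()) is not SENTINEL:
        sent.append(item.get())

t1 = threading.Thread(target=acquisition, args=(5,))
t2 = threading.Thread(target=sender)
t1.start(); t2.start()
t1.join(); t2.join()
pool.close(); pool.join()
print(sent)  # [0, 2, 4, 6, 8]
```

A nice property of this design is that ordering comes for free: the queue holds the `AsyncResult`s in submission order, so stage 3 never has to reorder frames.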