I currently have a sequential code with the following parts (all parts are properly encapsulated and isolated in class methods and so on):
1. Frame capture from network stream with opencv VideoCapture
2. Processing of the image with yolov7 through pytorch (with cuda)
3. Classical processing of the yolov7 output
4. Extra heavy classical processing done every X frames (X will be 30 or 60)
Because this will need to run in real time (a small constant latency is acceptable, but not a growing one), something needs to be done, since it currently runs at 15 fps (already skipping every other frame). Time profiling shows that the most time-consuming parts are (2) and (4) (no surprise).
When I started looking up info for (1) I learned about the threading module, which seemed promising and popped up in a lot of Stack Overflow answers about speeding up code that captures images from cameras. This led me to see salvation in this module (because of misconceptions about parallelization) until I just learned that only one thread executes Python code at a time because of the GIL. I am also aware of the existence of asyncio, multiprocessing and concurrent.futures.ProcessPoolExecutor. I have also read this post on threading. Multicore processing is available.
The aim is to have (1) capture frames into a queue. (2) takes a frame when one is available and processes it. As soon as it finishes, it hands the output to (3) and reads the next available frame from the queue, so that it keeps processing while (3) and (4) run too. (3) takes the output of (2), processes it very quickly, then waits for the next output from (2), and repeats. Finally, every X frames, (4) reads the outputs generated by (3) up to that point and performs some heavy calculations.
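To make the intended design concrete, here is a minimal sketch of the four stages connected by queues. Everything in it is a placeholder: integers stand in for frames, there are no OpenCV or PyTorch calls, and the function names, the sentinel and the queue sizes are hypothetical. It uses threads only to show the data flow; whether threads or processes are the right choice for (2) and (4) is exactly what I am unsure about.

```python
import queue
import threading

SENTINEL = None   # hypothetical end-of-stream marker passed down the pipeline
HEAVY_EVERY = 30  # "X" in the description above


def capture(frames_out, n_frames):
    """(1) Placeholder for the VideoCapture loop: emits integer 'frames'."""
    for i in range(n_frames):
        frames_out.put(i)  # blocks while the bounded queue is full
    frames_out.put(SENTINEL)


def detect(frames_in, detections_out):
    """(2) Placeholder for yolov7 inference."""
    while True:
        frame = frames_in.get()
        if frame is SENTINEL:
            detections_out.put(SENTINEL)
            break
        detections_out.put(("det", frame))


def postprocess(detections_in, batches_out, every):
    """(3) Fast per-frame step; hands a batch to (4) every `every` frames."""
    batch = []
    while True:
        det = detections_in.get()
        if det is SENTINEL:
            batches_out.put(SENTINEL)  # an incomplete final batch is dropped
            break
        batch.append(det[1] * 2)       # dummy computation
        if len(batch) == every:
            batches_out.put(batch)
            batch = []


def heavy(batches_in, results):
    """(4) Placeholder for the heavy computation on each batch."""
    while True:
        batch = batches_in.get()
        if batch is SENTINEL:
            break
        results.append(sum(batch))     # dummy computation


def run_pipeline(n_frames=90, every=HEAVY_EVERY):
    frames = queue.Queue(maxsize=8)       # bounded, so latency cannot grow
    detections = queue.Queue(maxsize=8)
    batches = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=capture, args=(frames, n_frames)),
        threading.Thread(target=detect, args=(frames, detections)),
        threading.Thread(target=postprocess, args=(detections, batches, every)),
        threading.Thread(target=heavy, args=(batches, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline())
```

In this sketch a full queue makes the capture stage wait; for a live stream, dropping the oldest frame instead would probably be closer to what I need.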
If I have understood correctly, I should use multiprocessing rather than multithreading, since this is a compute-intensive problem (apart from the I/O in (1)).
So the questions really are:
- For (1), since this is I/O related, is threading combined with queue a good way to go?
- For (2), (3) and (4), what is the way to proceed? I need them to run at the same time, especially (2) and (4), since (3) really runs at nearly 300 fps
- For (2), is there a way to process two frames at the same time? For example processing even and odd frames at the same time. This is a critical point right now to get to real-time processing.
- How difficult is all this? I'm not really an expert in this topic (I'm actually a physicist), so I don't know if I'm getting myself onto too slippery a slope here.
I just need someone who knows about all this mess to point me in the right direction, so don't hesitate to add some references. Thank you very much in advance!