
I have a simple algorithm that I want to run fast, in parallel. The algorithm is:

while stream:
    img = read_image()
    pre_process_img = pre_process(img)
    text = ocr(pre_process_img)
    fine_text = post_process(text)

Now I want to explore the fastest options Python offers for running this algorithm with multiprocessing.

Some of the code is as follows:

import cv2
import pytesseract

def pre_process_img(frame):
    return cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)

def ocr(frame):
    return pytesseract.image_to_string(frame)

How can I run the given code in parallel (multiple processes, multiple threads, or other options), especially the pre-process and OCR parts?
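One possible sketch, assuming the expensive stage is OCR: overlap the per-frame work with a thread pool. `pre_process` and `ocr` below are trivial stand-ins for the `cv2`/`pytesseract` versions above (pytesseract spends most of its time waiting on an external `tesseract` process, so threads can genuinely overlap it); swap in the real functions to try this on actual frames.

```python
from concurrent.futures import ThreadPoolExecutor

def pre_process(frame):
    return frame.strip()      # stand-in for cv2.cvtColor(...)

def ocr(frame):
    return frame.upper()      # stand-in for pytesseract.image_to_string(...)

def process_frame(frame):
    # combine the two stages so one task handles one frame end to end
    return ocr(pre_process(frame))

def run(frames, workers=4):
    # executor.map preserves input order, so results line up with frames
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_frame, frames))

results = run(["  hello ", " world "])
```

If the pre-processing itself turns out to be CPU-bound in Python, `concurrent.futures.ProcessPoolExecutor` is a drop-in replacement for the thread pool here.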

I have tried joblib, but it is designed around for-loops, and I wasn't sure how to apply it to a while loop over a continuous stream of frames.

I have been looking at other people's code, but I am unable to adapt it to my example.

Edit

We can definitely combine it in a pipeline.

while stream:
    img = read_image()
    results = pipeline(img)

Now I want to execute the pipeline for different frames in multiple processes.

Ahmad Anis
  • I see two basic options (which could even be combined): parallel operation, and pipelining. Knowing what part of your code takes the most time can tell you which will net you the most benefit. If each step takes a relatively similar amount of time, a linear pipeline might allow you to efficiently use as many cores as there are stages in the pipeline. If only one stage is slow (my suspicion), it makes more sense to parallelize only that operation. – Aaron May 12 '22 at 20:07
  • I recognize the issue is slightly different, but I do have an old example of handling continuous frames with opencv between multiple processes using `multiprocessing.shared_memory` as a frame buffer between the processes: https://stackoverflow.com/a/66522825/3220135 – Aaron May 12 '22 at 20:10
  • Yes, we can pipeline it. `frame -> pipeline` in parallel. – Ahmad Anis May 13 '22 at 04:59

0 Answers