
I have this OpenCV image processing function, which is called four times on four different Mat objects.

void processBinary(Mat& binaryMat) {
    //image processing
}

I want to multi-thread it so that all four calls run concurrently, but have the main thread wait until every thread is done.

Ex:

int main() {

    Mat m1, m2, m3, m4;

    //perform each of these methods simultaneously, but have main thread wait for all processBinary() calls to finish
    processBinary(m1);
    processBinary(m2);
    processBinary(m3);
    processBinary(m4);
}

What I hope to accomplish is to call processBinary() as many times as I need and have it take about as long as calling it only once. I have looked up multithreading, but I am a little confused about starting threads and then joining/detaching them. I believe I need to create each thread and then call join() on each one so that the main thread waits for all of them to finish, but there doesn't seem to be a significant improvement in execution time. Can anyone explain how I should go about multi-threading my program? Thanks!
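A minimal sketch of the start-then-join pattern described above, using only the C++ standard library. To keep it self-contained, `Image` (a plain byte buffer standing in for `cv::Mat`) and the thresholding body of `processBinary` are hypothetical, and `processAll` is an illustrative helper name. Each call gets its own `std::thread`, and the loop of `join()` calls is what makes the calling thread wait; `detach()` would let it continue without waiting, which is not what you want here.

```cpp
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for cv::Mat: a flat buffer of 8-bit pixel values.
using Image = std::vector<unsigned char>;

// Hypothetical stand-in for processBinary(): threshold every pixel in place.
void processBinary(Image& img) {
    for (auto& px : img) {
        px = (px > 127) ? 255 : 0;
    }
}

// Run processBinary() on each image in its own thread, then block until all finish.
void processAll(std::vector<Image>& images) {
    std::vector<std::thread> workers;
    workers.reserve(images.size());
    for (auto& img : images) {
        // std::ref passes the image by reference; std::thread copies its arguments otherwise.
        workers.emplace_back(processBinary, std::ref(img));
    }
    // join() makes the calling thread wait for each worker; detach() would not.
    for (auto& t : workers) {
        t.join();
    }
}
```

For example, `processAll(images)` on a vector of four buffers returns only after all four have been binarized.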

EDIT: What I have tried:

//This does not significantly reduce execution time. However, calling processBinary() only once is much faster.

    thread p1(&Detector::processBinary, *this, std::ref(m1));
    thread p2(&Detector::processBinary, *this, std::ref(m2));
    thread p3(&Detector::processBinary, *this, std::ref(m3));
    thread p4(&Detector::processBinary, *this, std::ref(m4));
    p1.join();
    p2.join();
    p3.join();
    p4.join();
Sumeet Batra
  • The work you have described is a pipeline. Each function takes the output of the previous stage. To achieve parallelism, you will need to be able to move smaller pieces of work between the stages of your workflow. Said differently, how can they do work in parallel if each one requires output from another task? Copy... findContours... drawContours... – David Thomas Jul 26 '16 at 16:28
  • I'm not sure I understand. I want to run the four processBinary() calls in parallel, not the code inside the method. Each call operates on a different Mat object, so they do not depend on each other. – Sumeet Batra Jul 26 '16 at 16:42
  • Ah... You should add the code that calls `processBinary`. We don't need the internals of processBinary to help you parallelize the calls to it. – David Thomas Jul 26 '16 at 16:44
  • Sorry about my poor explanation; I updated the question so it makes more sense :) – Sumeet Batra Jul 26 '16 at 16:45
  • What have you tried? If `processBinary` is a pure function, you can just spawn 4 standard threads and join them. – E_net4 Jul 26 '16 at 16:51
  • I have tried exactly that: I started a thread for each processBinary() call and then called join() on each thread. However, there is no increase in performance. Calling processBinary() just once on a single Mat object is much faster. – Sumeet Batra Jul 26 '16 at 17:00
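One way to check whether threading helps at all is to time the sequential and threaded versions of the same work. This sketch again uses a hypothetical `Image` buffer and a hypothetical CPU-bound `processBinary`; `timeMs`, `runSequential` and `runThreaded` are illustrative names. Two things are worth knowing when reading the numbers. First, `std::thread` copies its arguments, so passing `*this` (as in the attempt in the question) copies the whole `Detector` object into each thread, whereas passing `this` does not. Second, the speedup is capped by the number of cores (`std::thread::hardware_concurrency()`) and may be small if the work is memory-bound or if the OpenCV calls inside `processBinary` already run multi-threaded internally (OpenCV exposes `cv::setNumThreads` to control that).

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

using Image = std::vector<unsigned char>;  // hypothetical stand-in for cv::Mat

// Hypothetical CPU-bound stand-in for processBinary(): repeated in-place pixel updates.
void processBinary(Image& img) {
    for (int pass = 0; pass < 50; ++pass) {
        for (auto& px : img) {
            px = static_cast<unsigned char>(px > 127 ? 255 : px + 1);
        }
    }
}

// Returns the wall-clock time, in milliseconds, that `work` takes to run.
double timeMs(const std::function<void()>& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// One call after another on the current thread.
void runSequential(std::vector<Image>& images) {
    for (auto& img : images) processBinary(img);
}

// One thread per image, then wait for all of them.
void runThreaded(std::vector<Image>& images) {
    std::vector<std::thread> workers;
    for (auto& img : images) workers.emplace_back(processBinary, std::ref(img));
    for (auto& t : workers) t.join();
}
```

Run `timeMs([&] { runSequential(a); })` and `timeMs([&] { runThreaded(b); })` on two identical copies of the data; with four independent, CPU-bound calls and at least four free cores, the threaded time should be noticeably lower.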

2 Answers


The slick way to achieve this is not to do the thread housekeeping yourself but to use a library that provides micro-parallelization.

OpenCV itself can use Intel Threading Building Blocks (TBB) for exactly this task: running loops in parallel.

In your case, the loop has just four iterations. With C++11, you can write it very concisely using a lambda expression. In your example:

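#include <opencv2/core.hpp>
#include <tbb/parallel_for.h>
#include <vector>
// cv::Mat copies share pixel data, so in-place writes to input[i] are visible through m1..m4.
std::vector<cv::Mat> input = { m1, m2, m3, m4 };
tbb::parallel_for(size_t(0), input.size(), size_t(1), [&](size_t i) {
    processBinary(input[i]);
});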

For this example I took code from here.
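If TBB is not available (for example in an NDK build that cannot find it), the same "run the loop body in parallel, then wait" idea can be approximated with the standard library. `parallelFor` below is a hypothetical helper, not a TBB or OpenCV API. It splits the index range into one contiguous chunk per hardware thread and joins them all before returning, so unlike TBB it does no load balancing and creates fresh threads on every call.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of TBB or OpenCV): calls body(i) for every i in
// [begin, end), splitting the range into contiguous chunks, one per worker thread.
void parallelFor(std::size_t begin, std::size_t end,
                 const std::function<void(std::size_t)>& body) {
    if (begin >= end) return;
    std::size_t count = end - begin;
    std::size_t hw = std::thread::hardware_concurrency();
    // hardware_concurrency() may return 0 when unknown; fall back to one thread.
    std::size_t nThreads = std::min(count, std::max<std::size_t>(hw, 1));
    std::size_t chunk = (count + nThreads - 1) / nThreads;  // ceiling division

    std::vector<std::thread> workers;
    for (std::size_t start = begin; start < end; start += chunk) {
        std::size_t stop = std::min(start + chunk, end);
        workers.emplace_back([start, stop, &body] {
            for (std::size_t i = start; i < stop; ++i) body(i);
        });
    }
    // Wait for every chunk before returning, as tbb::parallel_for does.
    for (auto& t : workers) t.join();
}
```

With the four matrices from the question, the call would look like `parallelFor(0, input.size(), [&](std::size_t i) { processBinary(input[i]); });`.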

ypnos
  • I will give this a try. I am actually trying to compile this on Android through the NDK, and it doesn't seem to recognize the include. I have checked the SDK, and libtbb.a is in there... Perhaps my Android.mk is not set up correctly. Would you happen to know anything about this? It might warrant another question. – Sumeet Batra Jul 26 '16 at 17:20
  • No idea about that. – ypnos Jul 26 '16 at 18:10
  • @SumeetBatra Intel TBB is a separate library. OpenCV only uses it if you build it with TBB support; otherwise it uses whatever threading library you compiled it with. This can be pthreads, gdb, etc. – nnrales Feb 07 '18 at 23:59
  • @nnrales That's not entirely correct. TBB serves a different purpose than pthreads. And what kind of threading library is gdb? – ypnos Feb 08 '18 at 09:23
  • @ypnos Oh, GCD. Hmm... I thought TBB helped with creating a thread pool and distributing work. pthreads can also be used for this. – nnrales Feb 08 '18 at 16:35
  • @nnrales TBB and GCD are one layer above pthreads. TBB actually uses pthreads itself where applicable. If you don't have TBB or GCD, you cannot expect the same parallelization from OpenCV. – ypnos Feb 08 '18 at 17:50
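To illustrate what "a layer above pthreads" provides, here is a minimal fixed-size thread pool built on `std::thread`, which on Linux and Android is typically implemented on top of pthreads. It is an illustrative sketch, not how TBB or GCD are implemented (TBB's scheduler, for instance, uses work stealing). The point is that threads are created once and reused, and each submitted task goes to whichever worker is free.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size thread pool (illustrative sketch, not TBB's or GCD's design).
class ThreadPool {
public:
    explicit ThreadPool(std::size_t nThreads) {
        for (std::size_t i = 0; i < nThreads; ++i) {
            workers_.emplace_back([this] { workerLoop(); });
        }
    }

    // Signals shutdown; workers finish every queued task before exiting.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue meanwhile
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```

Destroying the pool waits until every queued task has run, so wrapping it in a scope gives the same "wait for all" behaviour as joining individual threads.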

If you're using Python, you can use my open-source, multi-threaded VidGear library, a wrapper around OpenCV available on GitHub and PyPI, to achieve higher FPS.

Project Insight:

VidGear is a lightweight Python wrapper around OpenCV's video I/O module that contains powerful multi-threaded modules (gears) to enable high-speed video frame capture across various devices and platforms.

Features:

Key features which differentiate it from the other existing multi-threaded open source solutions are:

  • Multi-threaded, high-speed OpenCV video frame capture (resulting in high FPS)

  • Flexible, direct control over the video stream with easy manipulation

  • Lightweight

  • Built-in robust error handling and frame synchronization

  • Multi-platform compatibility (also compatible with the Raspberry Pi Camera)

  • Full support for network video streams (including GStreamer raw video capture pipelines)
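The core idea behind threaded capture can be sketched in C++ with a bounded queue: a background thread keeps reading frames into the queue so the processing loop never waits on camera or network I/O. This is a generic producer/consumer illustration, not VidGear's actual implementation. `Frame` (standing in for `cv::Mat`), `FrameQueue` and `startCapture` are hypothetical names, and the frame source is simulated. When the queue is full, the oldest frame is dropped so that the consumer always sees recent frames. It needs C++17 for `std::optional`.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

using Frame = std::vector<unsigned char>;  // hypothetical stand-in for cv::Mat

// Bounded frame queue shared by one capture thread and one processing thread.
class FrameQueue {
public:
    explicit FrameQueue(std::size_t capacity) : capacity_(capacity > 0 ? capacity : 1) {}

    // Capture side: drops the oldest frame when full, so capture never blocks.
    void push(Frame f) {
        {
            std::lock_guard<std::mutex> lock(m_);
            if (q_.size() == capacity_) q_.pop_front();
            q_.push_back(std::move(f));
        }
        cv_.notify_one();
    }

    // Capture side: marks end of stream.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Processing side: blocks for a frame; returns nullopt once closed and drained.
    std::optional<Frame> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return std::nullopt;
        Frame f = std::move(q_.front());
        q_.pop_front();
        return f;
    }

private:
    std::size_t capacity_;
    std::deque<Frame> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool closed_ = false;
};

// Starts a capture thread that pushes `n` simulated frames, then closes the queue.
std::thread startCapture(FrameQueue& queue, int n) {
    return std::thread([&queue, n] {
        for (int i = 0; i < n; ++i) {
            // Simulated frame source: a 4-byte frame filled with the frame index.
            queue.push(Frame(4, static_cast<unsigned char>(i)));
        }
        queue.close();
    });
}
```

The main loop calls `pop()` until it returns `std::nullopt`, then joins the capture thread.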

abhiTronix