My code looks like this:

    // Version 1: Mat::forEach
    auto t1 = std::chrono::steady_clock::now();
    for (int t{0}; t < 100; ++t) {
        vector<int> table(256, 0);
        Mat im2 = cv::imread(impth, cv::ImreadModes::IMREAD_COLOR);
        im2.forEach<cv::Vec3b>([&table](cv::Vec3b &pix, const int* pos) {
            for (int i{0}; i < 3; ++i) ++table[pix[i]];
        });
    }
    auto t2 = std::chrono::steady_clock::now();
    cout << "time is: " << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count() << endl;

    // Version 2: plain loop over row pointers
    auto t3 = std::chrono::steady_clock::now();
    for (int t{0}; t < 100; ++t) {
        vector<int> table(256, 0);
        Mat im2 = cv::imread(impth, cv::ImreadModes::IMREAD_COLOR);
        for (int r{0}; r < im2.rows; ++r) {
            auto ptr = im2.ptr<uint8_t>(r);
            for (int c{0}; c < im2.cols; ++c) {
                for (int i{0}; i < 3; ++i) ++table[ptr[i]];
                ptr += 3;
            }
        }
    }
    auto t4 = std::chrono::steady_clock::now();
    cout << "time is: " << std::chrono::duration_cast<std::chrono::milliseconds>(t4 - t3).count() << endl;

Intuitively, I expected `forEach` to be faster because it uses multiple threads to do the work, but it turned out that the `forEach` version took 14759 ms while the naive loop took only 6791 ms. What makes `forEach` slower here, and how can I make it faster?

coin cheung
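Both timed loops in the question also call `cv::imread` on every iteration, so the reported times include reading and decoding the file, not only the histogram. A minimal sketch of timing just the counting step, assuming the pixels have already been decoded once into a byte buffer. The helper names `time_ms` and `histogram_serial` are hypothetical, and the `std::vector<std::uint8_t>` merely stands in for the image data:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical helper: runs `work` `reps` times and returns the elapsed
// wall-clock time in milliseconds, measured with std::chrono::steady_clock.
template <typename F>
long long time_ms(F&& work, int reps) {
    assert(reps >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Hypothetical serial histogram over interleaved 8-bit channel bytes: the same
// counting as the row-pointer loop, but on data decoded outside the timer.
void histogram_serial(const std::vector<std::uint8_t>& data, std::vector<int>& table) {
    table.assign(256, 0);
    for (std::uint8_t v : data) ++table[v];
}
```

In the question's setup, that would mean loading `im2` once before `t1` and timing only the `forEach` call or the pointer loop inside the repetition loop, so the two histogram strategies are compared without the cost of `imread`.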
  • When I tried it, `forEach` was faster by almost 30 milliseconds. – Yunus Temurlenk Mar 19 '20 at 05:56
  • Does the `forEach` version actually work? It looks like you're writing concurrently to `table` with no synchronization (and you'll bounce that vector all over your cores with concurrent writes). – Mat Mar 19 '20 at 05:58
  • Whether `forEach` uses threads depends on how the OpenCV library is built. If it is built without multithreading support, you won't see any benefit from multithreading. Have a look at https://stackoverflow.com/questions/47800790/opencv-foreach-function-parallel-access for more information. – Peter Mar 19 '20 at 05:58
  • @Peter I didn't build with TBB, but when I use `forEach` to implement a similar function such as `LUT`, `htop` shows that multiple CPU cores are in use. Does this mean that multithreading is enabled? – coin cheung Mar 19 '20 at 06:20
  • Are you using optimised code? – Alan Birtles Mar 19 '20 at 07:04
  • @AlanBirtles Yes, I use the -O2 option in my CMakeLists.txt: `set(CMAKE_CXX_FLAGS "-std=c++11, -O2")`. Will this work? – coin cheung Mar 19 '20 at 07:53
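To illustrate Mat's point: in the `forEach` lambda every thread increments the same plain `std::vector<int>`, which is a data race, so increments can be lost. The sketch below uses plain `std::thread` on a byte buffer rather than OpenCV, and the names `run_chunked`, `histogram_atomic` and `histogram_local` are hypothetical. It contrasts two race-free alternatives: shared `std::atomic<int>` counters, which are correct but still have every thread writing to the same 256 counters, and per-thread private tables that are summed after the threads join, so no counter is written by more than one thread.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Splits [0, n) into `num_threads` contiguous chunks, runs body(t, begin, end)
// for chunk t on its own std::thread, then joins all threads.
void run_chunked(std::size_t n, unsigned num_threads,
                 const std::function<void(unsigned, std::size_t, std::size_t)>& body) {
    assert(num_threads > 0);
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back(body, t, begin, end);
    }
    for (std::thread& w : workers) w.join();
}

// Option A: one shared table of atomic counters. No increments are lost,
// but all threads keep writing to the same few cache lines.
std::vector<int> histogram_atomic(const std::vector<std::uint8_t>& data, unsigned num_threads) {
    std::array<std::atomic<int>, 256> shared{};  // value-initialized: all counters start at 0
    run_chunked(data.size(), num_threads, [&](unsigned, std::size_t b, std::size_t e) {
        for (std::size_t i = b; i < e; ++i)
            shared[data[i]].fetch_add(1, std::memory_order_relaxed);
    });
    // The join() calls in run_chunked make every increment visible here.
    std::vector<int> table(256);
    for (int v = 0; v < 256; ++v) table[v] = shared[v].load(std::memory_order_relaxed);
    return table;
}

// Option B: each thread counts into its own private table; the tables are
// summed after all threads have joined.
std::vector<int> histogram_local(const std::vector<std::uint8_t>& data, unsigned num_threads) {
    std::vector<std::array<int, 256>> local(num_threads);
    run_chunked(data.size(), num_threads, [&](unsigned t, std::size_t b, std::size_t e) {
        std::array<int, 256>& mine = local[t];
        mine.fill(0);
        for (std::size_t i = b; i < e; ++i) ++mine[data[i]];
    });
    std::vector<int> table(256, 0);
    for (const auto& mine : local)
        for (int v = 0; v < 256; ++v) table[v] += mine[v];
    return table;
}
```

With OpenCV, a continuous `Mat` (check `im2.isContinuous()`) exposes its `im2.total() * im2.elemSize()` bytes through `im2.data`, so the per-thread-table idea carries over; adapting the functions to take a pointer and a length would avoid copying into a vector. Whether either variant beats the serial loop depends on image size and thread count, so it is worth measuring.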

0 Answers