Background:
I read some articles and posts about multithreading in OpenCV:
- On the one hand, you can build OpenCV with TBB or OpenMP support, which parallelizes OpenCV's functions internally.
- On the other hand, you can create multiple threads yourself and call the functions in parallel to realize multithreading at the application level.
But I couldn't find a consistent answer as to which method of multithreading is the right way to go.
Regarding TBB, an answer from 2012 with 5 upvotes:
With WITH_TBB=ON OpenCV tries to use several threads for some functions. The problem is that just a handsome of function are threaded with TBB at the moment (may be a dozen). So, it is hard to see any speedup. OpenCV philosophy here is that application should be multi-threaded, not OpenCV functions.[...]
Regarding multithreading at the application level, a comment from a moderator on answers.opencv.org:
please avoid using your own multithreading with opencv. a lot of functions are explicitly not thread-safe. rather rebuild the opencv libs with TBB or openmp support.
But another answer with 3 upvotes states:
The library itself is thread safe in that you can have multiple calls into the library at the same time, however the data is not always thread safe.
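The distinction in the last quote, concurrent calls being fine while shared data is not, can be illustrated without OpenCV. The following is only a minimal sketch: `Image` is a plain `std::vector<unsigned char>` standing in for `cv::Mat`, and `invertPixels` and `processAll` are hypothetical names standing in for an OpenCV call and the application code around it. Each thread gets its own input and output buffer, so no synchronization is needed:

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for an image buffer (not cv::Mat).
using Image = std::vector<unsigned char>;

// Hypothetical stand-in for an OpenCV call: reads src, writes dst.
void invertPixels(const Image& src, Image& dst) {
    dst.resize(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = static_cast<unsigned char>(255 - src[i]);
    }
}

// Runs invertPixels on every input concurrently. Each thread reads only
// inputs[i] and writes only outputs[i], so no two threads share mutable data.
std::vector<Image> processAll(const std::vector<Image>& inputs) {
    std::vector<Image> outputs(inputs.size());
    std::vector<std::thread> workers;
    workers.reserve(inputs.size());
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        workers.emplace_back(invertPixels, std::cref(inputs[i]), std::ref(outputs[i]));
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return outputs;
}
```

If two threads wrote to the same output buffer instead, the calls themselves could still be legal while the result would be a data race, which seems to be what the quoted answer means by "the data is not always thread safe".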
Problem Description:
So I thought it was at least okay to use (multi)threading on application level. But I encountered strange performance problems when running my program for longer time periods.
After investigating these performance problems I created this minimal, complete, and verifiable example code:
#include "opencv2/opencv.hpp"
#include <iostream>
#include <vector>
#include <chrono>
#include <thread>
using namespace cv;
using namespace std;
using namespace std::chrono;

// Allocates two uninitialized 640x360 BGR images and runs a 3x3 median blur.
void blurSlowdown(void*) {
    Mat m1(360, 640, CV_8UC3);
    Mat m2(360, 640, CV_8UC3);
    medianBlur(m1, m2, 3);
}

int main()
{
    for (;;) {
        high_resolution_clock::time_point start = high_resolution_clock::now();
        for (int k = 0; k < 100; k++) {
            thread t(blurSlowdown, nullptr);
            t.join(); // INTENTIONALLY PUT HERE, READ PROBLEM DESCRIPTION
        }
        high_resolution_clock::time_point end = high_resolution_clock::now();
        // Time in microseconds for 100 create/run/join cycles.
        cout << duration_cast<microseconds>(end - start).count() << endl;
    }
}
Actual Behavior:
If the program runs for an extended period of time, the time spans printed by
cout << duration_cast<microseconds>(end - start).count() << endl;
grow larger and larger.
After running the program for around 10 minutes, the printed time spans have doubled, which cannot be explained by normal fluctuations.
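To quantify the drift rather than compare raw microsecond counts by eye, each batch can be related to the first one. A minimal sketch, assuming `std::chrono::steady_clock` (which is monotonic and therefore suited to interval timing) and two hypothetical helpers, `timeBatchMicros` and `slowdownRatio`, that are not part of the program above:

```cpp
#include <chrono>
#include <functional>

// Times `iterations` calls of `work` and returns the elapsed microseconds.
long long timeBatchMicros(const std::function<void()>& work, int iterations) {
    using clock = std::chrono::steady_clock;
    const clock::time_point start = clock::now();
    for (int k = 0; k < iterations; ++k) {
        work();
    }
    const clock::time_point end = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

// Ratio of a later batch time to the baseline; 2.0 means "twice as slow".
double slowdownRatio(long long baselineMicros, long long currentMicros) {
    if (baselineMicros <= 0) return 0.0; // no meaningful baseline
    return static_cast<double>(currentMicros) / static_cast<double>(baselineMicros);
}
```

Recording the first batch as the baseline and printing `slowdownRatio` for every later batch would make the "doubled after around 10 minutes" observation directly visible as a value approaching 2.0.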
Expected Behavior:
I would expect the time spans to stay pretty much constant, even though they might be longer than when calling the function directly.
Notes:
When calling the function directly:
[...]
for (int k = 0; k < 100; k++) {
    blurSlowdown(nullptr);
}
[...]
The printed time spans stay constant.
When not calling the cv function:
void blurSlowdown(void*) {
    Mat m1(360, 640, CV_8UC3);
    Mat m2(360, 640, CV_8UC3);
    //medianBlur(m1, m2, 3);
}
The printed time spans stay constant too. So something seems to go wrong when threading is combined with OpenCV functions.
- I know that the code above does NOT achieve actual multithreading: only one thread at a time is active and calling blurSlowdown().
- I know that creating threads and cleaning them up afterwards does not come for free and will be slower than calling the function directly.
- This is NOT about the code being slow in general. The problem is that the printed time spans get longer and longer over time.
- The problem is not related to medianBlur() specifically, since it also happens with other functions like erode() or blur().
- The problem was reproduced on Mac with clang++; see the comment by @Mark Setchell.
- The problem is amplified when using the debug library instead of the release library.
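One further variant that might help narrow this down is to keep a single thread alive and run all 100 calls inside it. That would separate "OpenCV called from a non-main thread" from "OpenCV called from a freshly created thread". A minimal sketch, assuming a generic `work` callable in place of blurSlowdown(); `runOnFreshThreads` and `runOnOneThread` are hypothetical names, not part of the program above:

```cpp
#include <functional>
#include <thread>

// Mirrors the loop from the question: one new thread per call, joined at once.
void runOnFreshThreads(const std::function<void()>& work, int iterations) {
    for (int k = 0; k < iterations; ++k) {
        std::thread t(work);
        t.join();
    }
}

// Variant: a single thread performs all calls and is joined once at the end.
void runOnOneThread(const std::function<void()>& work, int iterations) {
    std::thread t([&work, iterations] {
        for (int k = 0; k < iterations; ++k) {
            work();
        }
    });
    t.join();
}
```

If the timings stayed flat with `runOnOneThread` but grew with `runOnFreshThreads`, that would point at per-thread setup or teardown rather than at the filter itself. This is a diagnostic idea only, not a confirmed explanation of the slowdown.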
My testing environment:
- Windows 10 64bit
- MSVC compiler
- Official OpenCV 3.4.2 binaries
My Questions:
- Is it okay to use (multi)threading on application level with OpenCV?
- If yes, why are the time spans printed by my program above GROWING over time?
- If no, why is OpenCV then considered thread safe, and how should the statement from Kirill Kornyakov be interpreted instead?
- Are TBB / OpenMP now (in 2019) widely supported?
- If yes, what offers better performance: multithreading at the application level (if allowed) or TBB / OpenMP?