I have a multi-threaded app that uses the GPU, which is inherently single-threaded, and the actual API I use, cv::gpu::FAST_GPU, crashes when I call it from multiple threads, so basically I have:
static std::mutex s_FAST_GPU_mutex;
{
    // Serialize all GPU FAST calls behind a single mutex
    std::lock_guard<std::mutex> guard(s_FAST_GPU_mutex);
    cv::gpu::FAST_GPU(/*params*/)(/*parameters*/);
}
Now, benchmarking the code shows me that FAST_GPU() in isolation is faster than the CPU FAST(), but in the actual application my other threads spend a lot of time waiting for the lock, so the overall throughput is worse.
Looking through the documentation, and at this answer, it seems that falling back to the CPU version whenever the GPU lock is already held might be possible:
static std::mutex s_FAST_GPU_mutex;
static std::unique_lock<std::mutex> s_FAST_GPU_lock(s_FAST_GPU_mutex, std::defer_lock);
{
    // Create an unlocked guard (this is the line that fails to compile)
    std::lock_guard<decltype(s_FAST_GPU_lock)> guard(s_FAST_GPU_lock, std::defer_lock);
    if (s_FAST_GPU_lock.try_lock())
    {
        // Got the lock: use the GPU detector
        cv::gpu::FAST_GPU(/*params*/)(/*parameters*/);
    }
    else
    {
        // GPU busy: fall back to the CPU detector
        cv::FAST(/*parameters*/);
    }
}
However, this will not compile, as the std::lock_guard constructor only accepts std::adopt_lock as a tag, not std::defer_lock. How can I implement this properly?
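To make the intent concrete, here is a minimal sketch of the behaviour I am after, assuming a per-call std::unique_lock constructed with std::try_to_lock is an acceptable way to express it (the wrapper function detectKeypoints is just a placeholder name):
static std::mutex s_FAST_GPU_mutex;

void detectKeypoints(/*parameters*/)
{
    // Try to acquire the GPU mutex without blocking
    std::unique_lock<std::mutex> lock(s_FAST_GPU_mutex, std::try_to_lock);
    if (lock.owns_lock())
    {
        // We hold the mutex: use the GPU detector
        cv::gpu::FAST_GPU(/*params*/)(/*parameters*/);
    }
    else
    {
        // Another thread owns the GPU: fall back to the CPU detector
        cv::FAST(/*parameters*/);
    }
    // The mutex, if acquired, is released when lock goes out of scope
}
std::unique_lock with std::try_to_lock attempts the lock in its constructor and releases it in its destructor, so no manual unlock would be needed, but I am not sure this is the idiomatic way to do it.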