I have a multi-threaded app that uses the GPU, which is inherently single-threaded, and the API I actually use, cv::gpu::FAST_GPU, crashes when I call it from multiple threads, so basically I have:

static std::mutex s_FAST_GPU_mutex;

{
    std::lock_guard<std::mutex> guard(s_FAST_GPU_mutex);
    cv::gpu::FAST_GPU(/*params*/)(/*parameters*/);
}

Now, benchmarking the code shows me FAST_GPU() in isolation is faster than the CPU FAST(), but in the actual application my other threads spend a lot of time waiting for the lock, so the overall throughput is worse.

Looking through the documentation, and at this answer, it seems that this might be possible:

static std::mutex s_FAST_GPU_mutex;
static std::unique_lock<std::mutex> s_FAST_GPU_lock(s_FAST_GPU_mutex, std::defer_lock);

{
    // Create an unlocked guard
    std::lock_guard<decltype(s_FAST_GPU_lock)> guard(s_FAST_GPU_lock, std::defer_lock);
    if (s_FAST_GPU_lock.try_lock())
    {
        cv::gpu::FAST_GPU(/*params*/)(/*parameters*/);
    }
    else
    {
        cv::FAST(/*parameters*/);
    }
}

However, this will not compile as std::lock_guard only accepts a std::adopt_lock. How can I implement this properly?

Ken Y-N
  • I'm not sure why you're using a static `unique_lock`, since usually you wouldn't want 2 threads having access to the same `unique_lock`. That said, I would remove `s_FAST_GPU_lock`, then use `std::try_to_lock` and test whether it succeeded using `owns_lock`. – Dave S Mar 29 '17 at 02:49
  • @DaveS Ahh, I see - `mutex.try_lock()` then `lock_guard<>(std::adopt_lock)`? Please make this into an answer. – Ken Y-N Mar 29 '17 at 02:53

2 Answers


It is actually unsafe to have a unique_lock accessible from multiple threads at the same time. I'm not familiar with the opencv portion of your question, so this answer is focused on the mutex/lock usage.

static std::mutex s_FAST_GPU_mutex;
{
   // Create a unique lock, attempting to acquire
   std::unique_lock<std::mutex> guard(s_FAST_GPU_mutex, std::try_to_lock);
   if (guard.owns_lock())
   {
       cv::gpu::FAST_GPU(/*params*/)(/*parameters*/);
       guard.unlock(); // Or just let it go out of scope later
   }
   else
   {
       cv::FAST(/*parameters*/);
   }
}  

This attempts to acquire the lock. If it succeeds, it calls FAST_GPU and then releases the lock. If the lock is already held by another thread, it takes the second branch and calls FAST instead.
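
As a self-contained sketch of this pattern that compiles without OpenCV: `detect_gpu` and `detect_cpu` below are hypothetical stand-ins for `cv::gpu::FAST_GPU` and `cv::FAST`, and the `Path` enum exists only to report which branch ran. Note that `try_lock` is allowed to fail spuriously, so the CPU branch may occasionally run even when no other thread holds the mutex.

```cpp
#include <mutex>

// Which implementation handled the call (illustration only).
enum class Path { Gpu, Cpu };

static std::mutex s_gpu_mutex;

// Hypothetical stand-ins for cv::gpu::FAST_GPU and cv::FAST.
Path detect_gpu() { return Path::Gpu; }
Path detect_cpu() { return Path::Cpu; }

// Use the GPU path if the mutex is free right now, otherwise fall back
// to the CPU path without waiting.
Path detect_features()
{
    // Each call (and therefore each thread) gets its own unique_lock;
    // only the mutex itself is shared.
    std::unique_lock<std::mutex> guard(s_gpu_mutex, std::try_to_lock);
    if (guard.owns_lock())
    {
        return detect_gpu(); // guard's destructor releases the mutex
    }
    return detect_cpu();     // guard owns nothing, so its destructor is a no-op
}
```

Each worker thread would simply call `detect_features()`; at most one of them is inside `detect_gpu()` at any time, and the rest use the CPU path instead of blocking.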

Dave S
  • That looks nice and more closely follows the pattern of the simple case so should be easier to understand than my suggestions. – Ken Y-N Mar 29 '17 at 03:06
  • Oh well, it turns out that the pure CPU version is faster than using the above to split the work. It operates as you describe, so it's time to break out the GPU profiler to see if something else is hogging the GPU. – Ken Y-N Mar 29 '17 at 04:42
  • I can't say for certain, but splitting the work will almost certainly force a command buffer stall on the GPU any time the CPU touches the data, as the data has to be re-uploaded before the GPU can work on it. That stall could cause the GPU operation to block inside the lock, depending on the implementation. – 1stCLord Apr 03 '17 at 21:52

You can use std::lock_guard, if you adopt the mutex in the locked state, like this:

{
    if (s_FAST_GPU_mutex.try_lock())
    {
        // The mutex is already locked by try_lock(); adopt it so the guard unlocks it on scope exit
        std::lock_guard<decltype(s_FAST_GPU_mutex)> guard(s_FAST_GPU_mutex, std::adopt_lock);
        cv::gpu::FAST_GPU(/*params*/)(/*parameters*/);
    }
    else
    {
        cv::FAST(/*parameters*/);
    }
}
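
One possible way to package the `try_lock` plus `std::adopt_lock` idea as a reusable helper is sketched below. The name `run_or_fallback` and its parameters are hypothetical and not part of the standard library or OpenCV; the two callables stand in for the GPU and CPU calls.

```cpp
#include <mutex>

// Hypothetical helper: runs `primary` while holding `m` if the mutex can be
// taken immediately, otherwise runs `fallback` without holding anything.
// Returns true when the primary path was taken.
template <typename Primary, typename Fallback>
bool run_or_fallback(std::mutex& m, Primary&& primary, Fallback&& fallback)
{
    if (m.try_lock())
    {
        // Adopt the already-locked mutex so it is released on scope exit,
        // including when primary() throws.
        std::lock_guard<std::mutex> guard(m, std::adopt_lock);
        primary();
        return true;
    }
    fallback();
    return false;
}
```

With the names from the question, a call might look like `run_or_fallback(s_FAST_GPU_mutex, [&] { /* FAST_GPU call */ }, [&] { /* FAST call */ });`.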
Toby Speight
Victor Dyachenko