
Is there a pattern that allows the use of std::condition_variable::notify_one/notify_all without holding a mutex? (Note: I'm not talking about the wait part.)

Formally, a thread is not required to hold a mutex when calling notify on a condition variable (contrary to wait()), but - to the best of my knowledge - the following is not safe:

mutex mux;
condition_variable cv;
atomic<bool> newData{false}; // shared state; the flag is atomic (see comments)

void Thread1() {
    unique_lock<mutex> ul(mux);
    while (!newData) { //*) see below
        cv.wait(ul);
    }
    //readData;
}

void Thread2() {
    //write/modify Data;
    newData = true;
    cv.notify_one();
}

The problem is that Thread1 could be preempted just as it enters the loop body, after it has checked newData but before it actually starts to wait.
If Thread2 runs to completion between those two steps, the notification will never be registered by Thread1, and Thread1 waits forever.

The simple, obvious solution is to put the notify into the protected region:

void Thread2() {
    //write/modify Data;
    newData = true;
    {
        lock_guard<mutex> lg(mux);
        cv.notify_one();
    }
}

Why this is necessary by default is further discussed here.
However, the additional overhead bothers me, especially since 99% of the time the locking is unnecessary. Also, needing one synchronization mechanism to protect another synchronization mechanism seems wrong to me.

So I was wondering whether there is a pattern (e.g. using atomic operations) that makes locking on the notify side unnecessary (using only standard C++14 functions/mechanisms).

Edit: To elaborate a bit more on the background: I have a lock-free circular buffer with non-blocking reads and writes. Now I'd like to also add blocking reads and writes (that wait if the buffer is full/empty), but I'd like to minimize the impact this additional mechanism has on the performance of the non-blocking ones. Obviously, I cannot avoid e.g. calling nonEmpty.notify_all() in the writes, but I'd really like to avoid also locking a mutex in that function.

So far I have come up with two ideas, both of which are unfortunately less than ideal:

  • Use wait_for. That way, even if the notify is missed, the thread will not be stuck forever, but it introduces sporadic delays.

  • Put a lock around the notify, but make the whole block conditional on whether another thread could potentially be in the problematic section (similar to this). This should significantly reduce the number of system calls on the notify side, but it still sometimes requires a mutex, which might limit its scalability (I haven't done any measurements yet).


*) Btw., I'm aware of the predicate version of wait. Unfortunately - according to cppreference - it is semantically equivalent to my version above, and I find the explicit form easier to reason about.

MikeMB
  • @Mark B: I don't think this is a duplicate. I understand why you need to use a mutex if you apply the standard pattern. I was asking whether there is a different pattern (e.g. using atomic variables) that avoids locking a mutex specifically on the notify side (I don't care about the wait side). – MikeMB Sep 15 '15 at 04:23
  • The [write up here](http://stackoverflow.com/questions/17101922/do-i-have-to-aquire-lock-before-calling-condition-variable-notify-one) might be more along the lines that you're looking for. I think there is still a question of whether the modification of `newData` outside of the lock is valid. I claim that it is because the `cv.wait()` call will lock the mutex once it's signaled, but I am not positive. – Kurt Stutsman Sep 15 '15 at 04:46
  • As you say though there is still the race condition on the notify_one()/wait(), and I don't think you can avoid that short of locking. – Kurt Stutsman Sep 15 '15 at 04:55
  • @KurtStutsman: The write is definitely valid. Even if the notify alone doesn't ensure memory ordering, the atomic flag does. Concerning the locking, I have an idea of how to make the locking at least conditional (based on my answer [here](http://stackoverflow.com/questions/29193445/implement-a-high-performance-mutex-similar-to-qts-one/29194810#29194810)), but that becomes quite ugly, so I was wondering whether there is a better solution. – MikeMB Sep 15 '15 at 06:02
  • There is no way to avoid the standard pattern using standard C++. The problem is the _wait_. You must not wait if you've already been signalled / the condition has become true, but you can't do this atomically. POSIX semaphores could be used (but they are quite possibly implemented using a condition variable and mutex anyway). Futexes on Linux allow waiting _if_ the futex variable has a certain value (and atomically setting it to another value during the wait), which would probably work for you here. But really, locking and unlocking a mutex shouldn't be an expensive operation. – davmac Sep 21 '15 at 12:23
  • @davmac: I agree that there is probably no simple/free solution, but as I've mentioned at the end of the question, there are at least two ways to avoid / reduce the locking - both having their own disadvantages. I should probably have a look at the implementation of futexes - maybe this can be efficiently implemented using the standard library. – MikeMB Sep 21 '15 at 13:00
  • @davmac: Concerning the costs, I have to admit this is more out of curiosity than actual need, but in my simple benchmark (1 producer / 1 consumer thread pounding on a single [lock-free queue](http://stackoverflow.com/questions/32692176/adding-blocking-functions-to-lock-free-queue?)), just the locking and unlocking of SEPARATE mutexes (so no contention or waiting at all) around a single write to an atomic variable cuts the throughput IN HALF. Of course, the impact on the overall application will be much smaller (at least if you don't try to scale to 8 or 16 cores). – MikeMB Sep 21 '15 at 13:05

0 Answers