2

I'm having a problem where I'm having a few condition_variable's get stuck in their wait phase even though they've been notified. Each one even has a predicate that's being set just in case they miss the notify call from the main thread.

Here's the code:

unsigned int notifyCount = 10000;
std::atomic<int> threadCompletions = 0;

for (unsigned int i = 0; i < notifyCount; i++)
{
    std::atomic<bool>* wakeUp = new std::atomic<bool>(false);
    std::condition_variable* condition = new std::condition_variable();

    // Worker thread //
    std::thread([&, condition, wakeUp]()
    {
        std::mutex mutex;
        std::unique_lock<std::mutex> lock(mutex);
        condition->wait(lock, [wakeUp] { return wakeUp->load(); });

        threadCompletions++;
    }).detach();

    // Notify //
    *wakeUp = true;
    condition->notify_one();
}

Sleep(5000); // Sleep for 5 seconds just in case some threads are taking a while to finish executing

// Check how many threads finished (threadCompletions should be equal to notifyCount)

Unless I'm mistaken, after the for loop is done, threadCompletions should always be equal to notifyCount. Very often though, it is not.

When running in release, I'll sometimes get just one or two out of 10000 threads that never finished, but when running in debug, I'll get 20 or more.

I thought maybe the wait call in the thread is happening after the main thread's notify_one call (meaning it missed it's notification to wake up), so I passed a predicate into wait to insure that it doesn't get stuck waiting. But it still does in some cases.

Does anyone know why this is happening?

nullptr
  • 97
  • 1
  • 1
  • 7
  • What happens if you put a `this_thread::sleep_for(5s)` right before `*wakeUp = true`? – AndyG Mar 12 '20 at 17:21
  • Your creating 10'000 threads! Creating more than the number of cores on your machine is probably counterproductive (so 4 is usually good number (8 if you have a nice machine)). Create "Work Object" that get executed by the threads (look up thread pool (or `async()` if you can use C++17). – Martin York Mar 12 '20 at 17:33
  • @MartinYork [huh?](https://stackoverflow.com/questions/3126154/multithreading-what-is-the-point-of-more-threads-than-cores) – Fureeish Mar 12 '20 at 17:37
  • @Fureeish Does the answer not generally agree with me? Does it suggest you can get 10'000 threads running efficiently. Also threads are heavy weight and expensive to create. Sure you can get more than the number of cores and get a benefit (but its not simple to determine that balance). So nice rule of thumb for beginners is no more than then the number of cores. – Martin York Mar 12 '20 at 17:49
  • @AndyG It's taking 5 seconds per thread, so it's taking forever, but so far there doesn't seem to be any threads stuck in wait. – nullptr Mar 12 '20 at 17:50
  • @nullptr: The reason I asked is because a thread needs to be *actually waiting* on a condition variable *before* the condition variable is notified. You may have a race condition. Perhaps the thread should check the condition before it waits on the variable. – AndyG Mar 12 '20 at 17:52
  • @MartinYork This is simply a stress test for my condition_variable logic in another project. The code I posted is just an isolation of the problem I'm seeing. Yes, 10,000 threads is overkill, but I don't see any reason why the behavior I'm seeing should be happening at all, regardless of thread count. – nullptr Mar 12 '20 at 17:52
  • @AndyG Yes, I suspect the race condition is a factor here, which is why I'm using a predicate bool check in that code. Would a manual if check do something that the passed in predicate isn't? Edit: I still get the same issue with a manual if check instead of a predicate. – nullptr Mar 12 '20 at 17:55
  • @MartinYork I didn't have any specific answer in mind - I linked a question with valuable discussions. I believe they neither agree nor disagree with you. However, it's just not correct to state that it's counterproductive (or it will always result in poorer performance) to spawn more threads than you have cores. I do agree that it's very hard to find the correct balance, but that's where the testing comes in. – Fureeish Mar 12 '20 at 17:57
  • @nullptr: Same issue if you check the predicate manually and then do the `condition->wait(lock, [wakeUp] { return wakeUp->load(); });` afterwards? – AndyG Mar 12 '20 at 17:57
  • @AndyG Yes, same issue with having both the manual check and the predicate getting passed in at the same time. – nullptr Mar 12 '20 at 18:01
  • @Fureeish I suggest you do more study then. – Martin York Mar 12 '20 at 18:07
  • @MartinYork I believe that working, commercial, large-scale project which uses more threads than the machine it is running on has cores, being extensively tested with various combinations of the number of threads and proving to perform best at current implementation compared to other options is good enough. Can't share more details for legal reasons though. However, it's completely okay if you don't accept this argument due to lack of raw data and sources. – Fureeish Mar 12 '20 at 18:12
  • @Fureeish The C10K problem is what you are looking for. It showed that it is much more efficient to run with few threads rather than thousands. This was a problem we solved at least a decade ago. https://en.wikipedia.org/wiki/C10k_problem – Martin York Mar 12 '20 at 18:21
  • @MartinYork I was never in favour of the thousand-threads approach – Fureeish Mar 12 '20 at 18:27
  • 1
    You are not locking the same mutex in both threads. The mutex doesn't do anything. – user253751 Mar 12 '20 at 18:35

2 Answers2

3

You are assuming the call to wait() is atomic. I don't believe it is. That is why it requires the use of a mutex and a lock.

Consider the following:

Main Thread.                        Child Thread

                                    // This is your wait unrolled.
                                    while (!wakeUp->load()) {    


// This is atomic
// But already checked in the
// thread.
*wakeUp = true;

// Child has not yet called wait
// So this notify_one is wasted.
condition->notify_one();


                                        // The previous call to notify_one
                                        // is not recorded and thus the
                                        // thread is now locked in this wait
                                        // never to be let free.
                                        wait(lock);
                                    }

// Your race condition.

Calls to notify_one() and wait() should be controlled via the same mutext to make sure they don't overlap like this.

for (unsigned int i = 0; i < notifyCount; i++)
{

    std::atomic<bool>* wakeUp = new std::atomic<bool>(false);
    std::mutex*              mutex     = new std::mutex{};
    std::condition_variable* condition = new std::condition_variable();

    // Worker thread //
    std::thread([&]()
    {
        std::unique_lock<std::mutex> lock(*mutex);
        condition->wait(lock, [&wakeUp] { return wakeUp->load(); });

        threadCompletions++;
    }).detach();

    // Notify //
    *wakeUp = true;

    std::unique_lock<std::mutex> lock(*mutex);
    condition->notify_one();
}

// Don't forget to clean up the new structures correctly/.
Martin York
  • 257,169
  • 86
  • 333
  • 562
1

You have data racing. Consider following scenario:

  • Worker Thread: condition variable tests for whether wakeup is true - it isn't

  • Main Thread: wakeup is set to true and condition variable is getting notified

  • Worker Thread: condition_variable triggers wait but it happens after notification already occurred - impling that notification misses and the thread might never wake up.

Normally, synchronization of condition variables is done via mutexes - atomics aren't too helpful here. In C++20 there will be special mechanism for waiting/notifying in atomics.

ALX23z
  • 4,456
  • 1
  • 11
  • 18
  • So what's the solution here? How can I make sure that the worker thread is waiting before I notify, or that wakeUp is set to true before the condition_variable does it's predicate condition check? – nullptr Mar 12 '20 at 18:09
  • Nope as [documentation says](https://en.cppreference.com/w/cpp/thread/condition_variable/wait) if `*wakeUp` is true actual `wait(mutex)` is not invoked simply `condition->wait(lock, [wakeUp] { return wakeUp->load(); });` returns immediately since in your scenario `wakeUp` is `true`. – Marek R Mar 12 '20 at 18:13
  • @nullptr don't make `wakeup` atomic. Instead secure its modification via the same mutex that the condition variable uses. So you have to create it outside of the thread. – ALX23z Mar 12 '20 at 18:13
  • @MarekR I refer to this exact data race wakeup is set to true only after testing but notification happens before locking. It is a rare data race. – ALX23z Mar 12 '20 at 18:15
  • 1
    @MarekR I think he is saying that the predicate check done on wakeUp is happening before the main thread sets it to true, and then the actual wait() call is happening after the main thread calls notify_one. – nullptr Mar 12 '20 at 18:16
  • 1
    @nullptr use the mutex, luke – user253751 Mar 12 '20 at 18:36
  • @nullptr, Re, "So what's the solution...?" If you don't mind reading a tutorial for a different programming language (Java), Here is one that does a good job of explaining the concept of wait() and notify() and how they are supposed to be used: https://docs.oracle.com/javase/tutorial/essential/concurrency/guardmeth.html Note that a java `synchronized(lock)` block works like a C++ unique_lock: It locks the `lock` on entry, and releases it on exit. – Solomon Slow Mar 12 '20 at 20:01