
I have Windows multi-threaded code that I'm trying to make portable by using the C++11 thread and synchronization classes. How should the main thread wait for a worker thread with a timeout? I tried using a condition variable, but it's possible for the main thread to miss the worker's notification by waiting too late. In Windows I would use an "event object" as shown below:

#include <windows.h>
#include <tchar.h>
#include <stdio.h>

HANDLE  hWorkerDoneEvent;   // synchronization event

DWORD WINAPI ThreadFunc(LPVOID lpParameter)
{
    printf("worker: launched; working\n");
    Sleep(1000);    // pretend to work
    printf("worker: signal done\n");
    SetEvent(hWorkerDoneEvent); // signal main thread that we're done working
    printf("worker: exit\n");
    return 0;
}

int _tmain(int argc, _TCHAR* argv[])
{
    hWorkerDoneEvent = CreateEvent(NULL, FALSE, FALSE, NULL);   // create auto-reset event
    if (hWorkerDoneEvent == NULL) {
        printf("main: error creating event\n");
        return 0;
    }
    printf("main: launch worker\n");
    if (CreateThread(NULL, 0, ThreadFunc, NULL, 0, NULL) == NULL) { // create worker thread
        printf("main: error launching thread\n");
        return 0;
    }
    printf("main: delay a bit\n");
    Sleep(2000);    // demonstrate that it's impossible to miss the worker's signal by waiting too late
    printf("main: wait for worker's done signal\n");
    if (WaitForSingleObject(hWorkerDoneEvent, 5000) == WAIT_OBJECT_0) { // wait for worker's signal
        printf("main: worker finished normally\n");
    } else {
        printf("main: worker timeout or error\n");
    }
    return 0;
}

The Windows program's output is as expected:

main: launch worker
main: delay a bit
worker: launched; working
worker: signal done
worker: exit
main: wait for worker's done signal
main: worker finished normally

My attempt to replicate the above using C++11:

#include <thread>
#include <chrono>
#include <condition_variable>
#include <cstdio>

using namespace std;

condition_variable  cvWorkerDone;
mutex mtxWorkerDone;

void ThreadFunc()
{
    printf("worker: launched; working\n");
    this_thread::sleep_for(std::chrono::milliseconds(1000));    // pretend to work
    printf("worker: signal done\n");
    cvWorkerDone.notify_all();  // signal main thread that we're done working
    printf("worker: exit\n");
}

int main(int argc, char* argv[])
{
    printf("main: launch worker\n");
    thread worker(ThreadFunc);
    printf("main: delay a bit\n");
    this_thread::sleep_for(std::chrono::milliseconds(2000));    // causes timeout because we waited too late and missed worker's notification
    printf("main: wait for worker's done signal\n");
    unique_lock<mutex> lk(mtxWorkerDone);
    if (cvWorkerDone.wait_for(lk, std::chrono::milliseconds(5000)) == cv_status::no_timeout) {
        printf("main: worker finished normally\n");
    } else {
        printf("main: worker timeout or error\n");
    }
    worker.join();
    return 0;
}

The C++11 program's output; note that the main thread times out even though the worker did its work and signaled:

main: launch worker
worker: launched; working
main: delay a bit
worker: signal done
worker: exit
main: wait for worker's done signal
main: worker timeout or error

In the C++11 code, the main thread incorrectly times out because notifications aren't sticky: notify_all only wakes threads that are already waiting when it's called. I understand that std::binary_semaphore would solve my problem, but that's only available in C++20, which I don't currently have access to. And yes, I could put a short wait at the start of the worker to give the main thread time to get to its wait, but that's gross, race-prone, and inefficient.

I also tried having the worker lock a timed_mutex while the main thread does a try_lock_for on that same mutex, as shown below, but this is equally wrong (it deadlocks):

#include <thread>
#include <chrono>
#include <mutex>
#include <cstdio>

using namespace std;

timed_mutex mtxWorkerDone;

void ThreadFunc()
{
    printf("worker: launched\n");
    printf("worker: delay a bit\n");
    this_thread::sleep_for(std::chrono::milliseconds(500)); // fools main thread into locking mutex before we do, causing deadlock
    printf("worker: locking mutex\n");
    unique_lock<timed_mutex> lk(mtxWorkerDone);
    printf("worker: mutex locked; working\n");
    this_thread::sleep_for(std::chrono::milliseconds(1000));    // pretend to work
    printf("worker: exit (releasing lock)\n");
}

int main(int argc, char* argv[])
{
    printf("main: launch worker\n");
    thread worker(ThreadFunc);
    printf("main: wait for worker's done signal\n");
    unique_lock<timed_mutex> lk(mtxWorkerDone, defer_lock);
    if (lk.try_lock_for(std::chrono::milliseconds(5000))) {
        printf("main: worker finished normally\n");
    } else {
        printf("main: worker timeout or error\n");
    }
    worker.join();
    return 0;
}

The output; the main thread's try_lock_for succeeds immediately, so it thinks the worker finished normally, but the worker is actually blocked at its unique_lock and never gets to its work.

main: launch worker
worker: launched
worker: delay a bit
main: wait for worker's done signal
main: worker finished normally
worker: locking mutex

To reiterate, I'm looking for a stdlib solution in C++11 that doesn't depend on sleeping to avoid races. I have often ported firmware from Windows to embedded platforms, and often faced similar issues. Windows is the Cadillac of synchronization APIs, with a tool for every occasion. Even twenty years later I still see posts asking how to replace WaitForMultipleObjects. Anyway I expected this to be a PITA and I'm not disappointed. :)

  • "*I tried using a condition variable, but it's possible for the main thread to miss the worker's notification by waiting too late.*" That's not how condition variables work. – Nicol Bolas Jan 27 '23 at 17:03
  • @NicolBolas I did not ask how condition variables work. I asked what I should do instead. – victimofleisure Jan 27 '23 at 17:06
  • What you want *is* a condition variable. You're getting the effect you're getting because you are using them *incorrectly*. The [example code in cppreference shows where you're doing it wrong](https://en.cppreference.com/w/cpp/thread/condition_variable). – Nicol Bolas Jan 27 '23 at 17:07
  • Example at https://en.cppreference.com/w/cpp/thread/condition_variable is doing almost what you want. – sklott Jan 27 '23 at 17:09
  • @NicolBolas Likely so, but that information by itself is unhelpful. – victimofleisure Jan 27 '23 at 17:10
  • @sklott I studied that example for hours and there's something I don't get: why doesn't the main thread's std::lock_guard lk(m); block if the worker has already done the unique_lock on that same mutex? Seems to me that if the worker locks the mutex first, the main thread never leaves that first scope and the system is deadlocked. Or is there something magical about lock_guard? – victimofleisure Jan 27 '23 at 17:16
  • The `wait()` function of a condition variable unlocks the mutex before waiting and re-locks it before returning. – sklott Jan 27 '23 at 17:17
  • @sklott OK, that's plenty confusing. But yes, I see it now: "atomically releases the mutex and suspends thread execution until the condition variable is notified, a timeout expires, or a spurious wakeup occurs, then atomically acquires the mutex before returning". It looks like a hot mess. Maybe this was part of the motivation to add binary_semaphore in C++20? – victimofleisure Jan 27 '23 at 17:23
  • The oddities are simply because that is how the underlying primitives for locking work (you can look at pthread condition variables for example). To handle the spurious wake-up, you generally use `wait_for` or `wait_until`. Note that for many use cases, spurious wake-ups are not a problem and allowing them simplifies the implementation IIRC. – Dean Johnson Jan 27 '23 at 17:30
  • This is *precisely* the canonical use case for condition variables. Just find the simplest example of how to use a condition variable and it will be trivial to adapt to your use case. – David Schwartz Jan 27 '23 at 17:51

1 Answer


I may very well be missing something, and I had to adapt your code a bit as I'm doing this quickly on lunch, but I believe this is basically what you're looking for, and as far as I can tell it's the correct way to go about this. Note the addition of a predicate lambda to the wait_for() call and the lock_guard in ThreadFunc():

#include <thread>
#include <chrono>
#include <condition_variable>
#include <cstdio>

using namespace std;

condition_variable  cvWorkerDone;
mutex mtxWorkerDone;
bool finished = false;

void ThreadFunc()
{
    printf("worker: launched; working\n");
    this_thread::sleep_for(std::chrono::milliseconds(1000));    // pretend to work
    printf("worker: signal done\n");
    std::lock_guard<std::mutex> lk(mtxWorkerDone);
    finished = true;
    cvWorkerDone.notify_all();  // signal main thread that we're done working
    printf("worker: exit\n");
}

int main(int argc, char* argv[])
{
    printf("main: launch worker\n");
    thread worker(ThreadFunc);
    printf("main: delay a bit\n");
    this_thread::sleep_for(std::chrono::milliseconds(2000));    // deliberately wait past the worker's signal; the latched flag means we won't miss it
    printf("main: wait for worker's done signal\n");
    {
        unique_lock<mutex> lk(mtxWorkerDone);
        if (cvWorkerDone.wait_for(lk, std::chrono::milliseconds(5000), []{return finished;})) {
            printf("main: worker finished normally\n");
        } else {
            printf("main: worker timeout or error\n");
        }
    }
    worker.join();
    return 0;
}
Douglas B
  • Indeed it works! The race is avoided. I'm still trying to figure out exactly **why** it works, but thank you very much anyway. – victimofleisure Jan 27 '23 at 18:03
  • @victimofleisure I am far from an expert in this, but I believe a large part of your issue lies in this quote from the documentation (https://en.cppreference.com/w/cpp/thread/condition_variable): "Even if the shared variable is atomic, it must be modified while owning the mutex to correctly publish the modification to the waiting thread." In general, I find the easiest thing to do in these cases is to follow their example as closely as possible, which is how I came up with the supplied code. Note also that my answer here could have some slight improvements based on the info on that page. – Douglas B Jan 27 '23 at 18:14