0

I need to run an activity every so often while my program is running. In production code this is configurable with a default of 30 minutes, but in the example below I've used 5 seconds. Previously I had a std::thread that would loop once per second checking to see if it was time to run the activity OR if the program was closed. This allowed me to close the program at any time without having the .join() on the activity's thread block my application's exit waiting for its next iteration. At any moment it was less than a second away from checking to see if it should close or perform the activity.

I do not like the idea of wasting time checking every second for an activity that may only occur every 30 minutes while the program is running, so I attempted to switch it to a condition variable. I've included a small example of my implementation below. I want to be sure I'm using the right tools to do this. The issue I see with my code is unnecessary calls of the lambda expression which I'll explain below.

#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>

bool asking_thread_to_quit;
std::mutex cv_mutex;
std::condition_variable cv;

void RunThread()
{
    {
        std::lock_guard<std::mutex> lock(cv_mutex);
        asking_thread_to_quit = false;
    }

    std::cout << "Started RunThread." << std::endl;
    while(true)
    {
        {
            std::unique_lock<std::mutex> lock(cv_mutex);
            std::chrono::seconds delay(5);
            if(cv.wait_for(lock, delay, [] { std::cout << "WAKEUP" << std::endl; return asking_thread_to_quit; })) // timed out
            {
                std::cout << "Breaking RunThread Loop." << std::endl;
                break;
            }
        }

        std::cout << "TIMER CODE!" << std::endl;
    }
}

int main(int argc, char *argv[])
{
    std::cout << "Program Started" << std::endl;
    std::thread run_thread(RunThread);

    // This is where the rest of the program would be implemented, but for the sake of this example, simply wait for user input to allow the thread to run in the background:
    char test;
    std::cin >> test;

    {
        std::lock_guard<std::mutex> lock(cv_mutex);
        asking_thread_to_quit = true;
    }
    cv.notify_all();

    std::cout << "Joining RunThread..." << std::endl;
    run_thread.join();
    std::cout << "RunThread Joined." << std::endl;

    return 0;
}

If you execute the program and allow for one 5-second iteration to pass, it gives the following output:

Program Started
Started RunThread.
WAKEUP
WAKEUP
TIMER CODE!
WAKEUP
q    <-- I typed this to quit.
Joining RunThread...
WAKEUP
Breaking RunThread Loop.
RunThread Joined.

You can see that it does the following:

  1. (WAKEUP) Performs the check prior to waiting

  2. Wait for five seconds

  3. (WAKEUP) Performs the check

  4. (TIMER CODE!) Executes the activity
  5. (WAKEUP) Performs the check again before going back to waiting

Step 5 seems unnecessary as I just performed it a split second ago, but I believe it is necessary as .wait_for() doesn't know I'm using it inside of a while(true) loop. Is this something I'm stuck with, or is there a way to remove the initial check in the .wait_for() call? I'm guessing there is not as it would allow for the system to .wait_for() something that it doesn't need to wait for. This is what leads me to wonder if I'm using the right language features to begin with. Is there a better way?

The Answer

The answer given below goes into detail on other issues with my code as well as sparked an informative related conversation. I'm going to accept that answer as it helped me the most, but the quick answer to the question seems to be this:

asking_thread_to_quit could have been set to true during the TIMER CODE! section, requiring another check prior to waiting on the condition variable again.

David Vereb
  • 1,756
  • 2
  • 12
  • 13
  • Step 1 and 5 are the same in the sense that's it's the check that's run before the thread starts waiting. You are right that removing that check could make the thread wait even though `asking_thread_to_quit` is `true`. – super Sep 28 '18 at 18:29
  • Do you know if there is a better language feature for handling my scenario? Or will I simply have to ignore the fact that the program is performing this "extra" check each time? – David Vereb Sep 28 '18 at 18:45
  • 1
    I don't really understand how it's an **extra** check. `asking_thread_to_quit` could change while `TIMER CODE!` runs so it has to be checked both before and after. I don't know of a general better way. Maybe for a specific scenario one could omit it, but even then I don't believe that would have a noticable impact on performance in most real-world scenarios. – super Sep 28 '18 at 18:54
  • Another thing to note is that you are changing `asking_thread_to_quit` in `main` without protecting it with a mutex which is not thread safe. – super Sep 28 '18 at 18:58
  • So if I set asking_thread_to_quit to true and run `notify_all()` from the main thread during `TIMER CODE!`, it's possible for the notification to be missed and that's why it has to check again when entering `wait_for()`? Or am I misunderstanding how these condition variables work? – David Vereb Sep 28 '18 at 19:03
  • 1
    Yes, it would be possible that the notification would be missed if the initial check was not performed. – super Sep 28 '18 at 19:05
  • Thanks @super. As for `asking_thread_to_quit` without a mutex, it seems [std::atomic might be able to handle it as well](https://stackoverflow.com/a/9201088/1364178). I always assumed a boolean was true or false without the opportunity for confusion between threads, so now I'm a bit worried about how much of my code made this assumption elsewhere! Thank you for pointing it out. – David Vereb Sep 28 '18 at 19:19
  • Use of `wait_until` instead of `wait_for` can help keep your timing on a more regular schedule. With `wait_for` you'll drift off schedule by however much time processing the periodic tasks takes on each iteration. – Howard Hinnant Sep 28 '18 at 20:00

1 Answers1

0

Your code has a few issues with it.

void RunThread()
{
  asking_thread_to_quit = false;

This is a race condition. You shouldn't modify a non-atomic shared variable in two different threads without synchronization.

  std::cout << "Started RunThread." << std::endl;
  while(true)
  {
    std::unique_lock<std::mutex> lock(cv_mutex);
    std::chrono::seconds delay(5);

First using namespace std::literals::chrono_literals;. Then use 5s.

    if(cv.wait_for(lock, delay, [] { std::cout << "WAKEUP" << std::endl; return asking_thread_to_quit; })) // timed out
    {
      std::cout << "Breaking RunThread Loop." << std::endl;
      break;
    }
    else
    {
      std::cout << "TIMER CODE!" << std::endl;
    }

the TIMER CODE usually shouldn't run within the std::mutex lock, as that means anyone sending a message is blocked until the timer code is finished.

  }
}

Finally, WAKEUPs are spurious details. You could WAKEUP 50 times in that 5 seconds; condition variables do not guarantee a bounded number of checks.

asking_thread_to_quit = true;
cv.notify_all();

this again results in a race condition; your program does undefined behavior twice over now.

Changing asking_thread_to_quit to a std::atomic<bool> will get rid of the formal race condition and UB. It will, however, let your code miss a request to quit and mistakenly do another 5 second sleep followed by the task.

This is because the return value of your lambda could be calculated, then the asking_thread_to_quit=true and notify_all evaluates with nothing waiting on the condition variable (so nothing is woken up), then the condition variable is blocked on, 5 seconds pass, it wakes up returning false, then repeats the while loop.

With the mutex being held in all writes to the bool, the write cannot occur until after the lambda has returned and we are waiting on the condition with an unlocked mutex. This prevents the .notify_all() from being missed.

The cargo-cult solution to this is to always guard all reads and writes to asking_thread_to_quit by the cv_mutex. Then avoid holding the cv_mutex for any length of time, including while handling the timer wakeup.

std::unique_lock<std::mutex> lock_cv() {
  return std::unique_lock<std::mutex>(cv_mutex);
}
void RunThread()
{
  {
    auto lock = lock_cv();
    asking_thread_to_quit = false;
  }

  std::cout << "Started RunThread." << std::endl;
  while(true)
  {
    {
      auto lock = lock_cv();
      using namespace std::literals::chrono_literals;
      if(cv.wait_for(lock, 5s, [] { std::cout << "WAKEUP" << std::endl; return asking_thread_to_quit; })) // timed out
      {
        std::cout << "Breaking RunThread Loop." << std::endl;
        break;
      }
    }         
    std::cout << "TIMER CODE!" << std::endl;
  }
}

and in main:

{
  auto lock = lock_cv();
  asking_thread_to_quit = true;
}
cv.notify_all();

And yes, I intended for cv.notify_all() to be outside the mutex. It works; understanding why is outside the scope of the "cargo-cult" solution I'm providing here.

Finally, the WAKEUP is not spurious. The asking_thread_to_quit could have changed since the last time it was checked. Running the lambda guarantees we should fall asleep in a careful manner, with no gap between unlocking the mutex for waiting and waiting for notifications.

Spurious WAKEUPs can still occur; they would show up as more WAKEUPs than you expect.

David Vereb
  • 1,756
  • 2
  • 12
  • 13
Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
  • I considered `std::chrono::literals`, but it seems it's C++14 and up only. I'll update my post with your recommendations and perhaps you can have another look? To match style elsewhere in the codebase I'd do things a bit differently than you, but I want to be sure I'm still fixing my code at the same time. (It'll be a few minutes) – David Vereb Sep 28 '18 at 19:32
  • Also, if I were to have put `cv.notify_all()` within the scope block containing the mutex lock above it, would that result in a deadlock? Or does `notify_all()` return immediately and would the lock above it be given up? – David Vereb Sep 28 '18 at 19:38
  • @DavidVereb `std::literals::chrono_literals` -- I had wrong name. `notify_all()` within the lock is harmless, but in some implementations of C++ threading it could be less efficient (the waiting thread wakes up, blocks immediately on the mutex, main thread releases mutex, waiting thread wakes up again) – Yakk - Adam Nevraumont Sep 28 '18 at 19:39
  • `std::literals::chrono_literals`, yes. I simply copied what you had written in response, but I knew what you were referencing. Still, it seems to be a C++14 feature, which is why I elected to skip it in my example. The codebase I'm on is using C++11. – David Vereb Sep 28 '18 at 19:41
  • 1
    @DavidVereb Huh, I missed the fact that they added literals and chrono in C++11, but didn't add actual standard literals until C++14. Slackers. – Yakk - Adam Nevraumont Sep 28 '18 at 19:43
  • Since the original question does not pertain to the mutex locks around the boolean value and I've edited the original post to fix it, I believe it might be clearer for anyone else who stumbles upon this question in the future to see your response without the bit about the undefined behavior. Would it be wrong of me to ask you to clean that bit up before I confirm it as an accepted answer? Or would you be okay with me editing it out & then confirming it? I appreciate your assistance with the fixes! – David Vereb Sep 28 '18 at 19:52
  • 1
    @Yakk-AdamNevraumont In most implementations of C++ threading, it's more efficient to notify while holding the lock. The implementation generally needs to check whether or not you're holding the lock and, if you are, it knows that you can't possibly wake up another thread by notifying (since it would just block on the mutex), saving effort. If you signal after you unlock, both the signal and the unlock could wake other threads, requiring an extra trip through the wake code. Changing another thread from "blocked on the cv" to "blocked on the mutex" while already holding the mutex is cheap. – David Schwartz Sep 28 '18 at 19:53
  • @DavidSchwartz So, whenever you release a mutex that a cv is waiting on, the cv gets a spurious wakeup (that may or may not reach the lambda)? – Yakk - Adam Nevraumont Sep 28 '18 at 19:59
  • @DavidSchwartz It seems to me that if the mutex is released after the notify, the notified thread could wake up before the mutex has been unlocked and have to go back to sleep waiting for the mutex. – super Sep 28 '18 at 20:05
  • 2
    @super Any notified thread can't wake up because it can't resume until it holds the mutex and this thread holds the mutex. An optimized implementation will "morph" the thread from waiting for the cv to the mutex. This is cheap to do because the mutex is already held by the calling thread. You shouldn't build your code around the assumption that your cv/mutex implementation is unoptimized. Inherently, signalling while holding the mutex is cheaper because in an optimized implementation, it can't wake any thread. See [here](https://github.com/cloudius-systems/osv/issues/24) and the linked code. – David Schwartz Sep 28 '18 at 20:08