
I'm still new to multi-threading in C++ and I'm currently trying to wrap my head around "spurious wake-ups" and what's causing them. I've done some digging on condition variables, kernel signals, futex, etc., and found several culprits on why and how "spurious wake-ups" occur, but there is still something that I can't find the answer to...

Question: Will a spurious wake-up unblock all waiting/blocked threads, even the ones waiting for a completely unrelated notification? Or are there separate waiting queues for the blocked threads and therefore the threads waiting for another notification are protected?

Example: Let's say that we have 249 Spartans waiting to attack the Persians. They wait() for their leader, Leonidas (the 250th) to notify_all() when to attack. Now, on the other side of the camp there are 49 injured Spartans who are waiting for the doctor (the 50th) to notify_one() so that he could treat each one. Would a spurious wake-up unblock all waiting Spartans, including the injured ones, or would it only affect the ones waiting for battle? Are there two separate queues for the waiting threads, or just one for all?

Apologies if the example is misleading... I didn't know how else to explain it.

Constantinos Glynos
  • It is called "spurious" for a reason. You should not make bold assumptions about it. – user7860670 Feb 09 '20 at 11:28
  • @user7860670: No argument there. But which assumptions are you referring to? – Constantinos Glynos Feb 09 '20 at 11:32
  • Assumption that some threads will be (definitely) unblocked, assumption that some other threads won't be unblocked. – user7860670 Feb 09 '20 at 11:37
  • Always assume that *any* thread may be spuriously woken at *any* time, for *any* reason (including "no reason at all"). When a thread wakes it must *always* check if its preconditions for running are met and if they are not, go back to sleep. – Jesper Juhl Feb 09 '20 at 11:41
  • @user7860670: Ahh.. Nope, no assumption, just asking. If there are separate queues then the 2nd batch of waiting threads shouldn't be affected by the 1st. However, if there's only one queue and one listener for kernel notifications, then all are exposed? I'm just curious... – Constantinos Glynos Feb 09 '20 at 11:41
  • @JesperJuhl: I agree with that. My question is more about how it works under the hood. Are all threads exposed or just the ones in that particular waiting queue. – Constantinos Glynos Feb 09 '20 at 11:45
  • @ConstantinosGlynos That is implementation defined. There are no hard rules. – Jesper Juhl Feb 09 '20 at 11:46
  • @JesperJuhl: Yer... Your comment seems to agree with the answer below. Plan for the worst, hope for the less worst. – Constantinos Glynos Feb 09 '20 at 11:48

2 Answers


Causes for spurious wakeups are specific to each operating system, and so are the properties of such wakeups. In Linux, for example, a spurious wakeup happens when a signal is delivered to a blocked thread. After executing the signal handler, the thread does not block again; instead, the system call it was blocked on returns a special error code (usually EINTR). Since signal handling does not involve other threads, they do not get woken up.

Note that a spurious wakeup does not depend on the synchronization primitive you're blocking on or the number of threads blocked on that primitive. It may also happen with non-synchronization blocking system calls like read or write. In general, you have to assume that any blocking system call may return prematurely for whatever reason, unless it is guaranteed not to by a specification like POSIX (and even then, there may be bugs and OS specifics that deviate from the specification).
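For the read case mentioned above, a common way to cope is to call the function again when it reports EINTR. The following is only a minimal sketch, assuming a POSIX system; `read_retrying` is a hypothetical helper name chosen for illustration, not an existing library function.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>  // POSIX read(), ssize_t

// Hypothetical helper (an assumption, not part of any library): repeat
// read() for as long as it fails with EINTR, i.e. it was interrupted by a
// signal before any data was transferred.
ssize_t read_retrying(int fd, void* buf, std::size_t count) {
    assert(buf != nullptr);
    for (;;) {
        const ssize_t n = ::read(fd, buf, count);
        if (n >= 0 || errno != EINTR) {
            return n;  // data, end of file, or an error other than EINTR
        }
        // errno == EINTR: the call returned prematurely, so try again.
    }
}
```

The caller still has to handle short reads and real errors; the loop only hides the premature return caused by a signal.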

Some people attribute superfluous notifications to spurious wakeups because both are usually handled the same way. They are not the same, though. Unlike spurious wakeups, superfluous notifications are actually caused by another thread and are the result of a notify operation on the condition variable or futex. It's just that the condition you check upon wakeup could turn false before the unblocked thread manages to check it.

Andrey Semashev
  • Thank you for your answer! However, can you please verify the following: "Since signal handling does not involve other threads, they do not get woken up.". I have witnessed a case in linux, where all threads unblock from a spurious wake-up. – Constantinos Glynos Feb 09 '20 at 12:46
  • I'm not aware of a case where a single event causes a spurious wakeup of multiple threads. Maybe there is such a case, but I don't know one. – Andrey Semashev Feb 09 '20 at 12:51
  • Fair enough! I still think your answer is correct but I'm currently reading [this](https://akkadia.org/drepper/futex.pdf) paper and it seems that the well goes waayy too deep! :-P – Constantinos Glynos Feb 09 '20 at 12:57
  • From the paper in the link above, I **think** that this is what happens: Each futex has its own waiting queue in the kernel. However, when the kernel sends out a notification, the futex (or futexes) do not know what this notification is, so instead of letting it go, they both catch/read/whatever it. As a result, all/some of the threads on that queue are woken up from a notification that may not have come from the condition_variable. This means that a spurious wake-up affects all threads because it is caught by all futexes in the kernel. Again.... I **think**. Any ideas? – Constantinos Glynos Feb 09 '20 at 14:32
  • A notification (meaning a `FUTEX_WAKE` or a similar operation) is not a spurious wakeup. A notification will wake as many threads as is specified in the operation. Which of the blocked threads are woken is generally unknown, but the number of the woken threads is. OTOH, a spurious wakeup is a wake up not caused by a notification or a wait timeout. – Andrey Semashev Feb 09 '20 at 14:37
  • It's becoming slightly clearer now. On your comment, "... but the number of the woken threads is.", even if the number of woken threads is known, it doesn't mean that only these threads will wake up. I have used `notify_one()` only to witness all the threads waking up. I blamed it on spurious wake-ups. On your other comment, "... is a wake up not caused by a notification or wait timeout.", surely something needs to trigger that futex to wake up the thread(s). Since the futex's waiting queue is in the kernel, then only a signal or a notification can trigger it? Or are there other things? – Constantinos Glynos Feb 10 '20 at 09:21
  • If `notify_one` unblocks multiple threads (reliably) then that is a buggy standard library implementation. http://eel.is/c++draft/thread.condition.condvar#5 says only one thread must be notified. If it unblocks multiple threads *only sometimes*, rarely, you could write it off to spurious wakeups. – Andrey Semashev Feb 10 '20 at 10:22
  • > surely something needs to trigger that futex to wake up the thread(s) -- It's not the futex who triggers a spurious wakeup. As the signal is delivered to a thread, the thread is removed from the waiting queue to execute the handler. After executing it, it does not return to the waiting queue but simply switches back to the previous execution context, which is in the syscall, and resumes execution from there (resulting in the syscall returning `EINTR`). I don't know what other events besides signals can cause spurious wakeups. – Andrey Semashev Feb 10 '20 at 10:29
  • Lovely explanation, thanks! I'm still new to this and it does my head in.. I thought the kernel communicates with the futex and then, the futex wakes up the threads. Now it makes sense and explains why a spurious wake up is unpredictable. You can never know what signal/notification may wake up the waiting threads abruptly. I think I got it now. – Constantinos Glynos Feb 10 '20 at 10:36

A spurious wakeup, in the context of a condition variable, exists only from the waiter's perspective. It means that the wait exited, but the condition is not true; thus the idiomatic use is:

Thing.lock()
while Thing.state != Play {
    Thing.wait()
}
....
Thing.unlock()

Every return from wait() in this loop except the last one, after which the condition finally holds, would be considered spurious. Why they occur:

  1. Many conditions are being multiplexed onto a single condition variable; sometimes this is appropriate, sometimes it is just lazy.
  2. Another waiting thread beat your thread to the condition and changed the state before you got a chance to own it.
  3. Unrelated events, such as the handling of a signal sent with kill(2), end the wait early so that consistency can be ensured after asynchronous handlers have run.

The most important thing is to verify that the desired condition has been met, and to retry or abandon if not. Failing to recheck the condition is a common error, and one that can be very difficult to diagnose.
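In C++, the idiomatic loop above could look roughly like the following. This is a sketch only: `Thing`, `wait_for_play`, `play`, `extra_wakeups` and `demo` are hypothetical names made up for illustration, and the counter is there just to make wakeups that found the condition still false visible.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for "Thing" from the pseudocode.
struct Thing {
    enum class State { Stop, Play };
    std::mutex m;
    std::condition_variable cv;
    State state = State::Stop;
    int extra_wakeups = 0;  // returns from wait() that found state != Play

    void wait_for_play() {
        std::unique_lock<std::mutex> lock(m);
        while (state != State::Play) {
            cv.wait(lock);  // may return for any reason, so loop and recheck
            if (state != State::Play) ++extra_wakeups;
        }
        assert(state == State::Play);  // holds here no matter why we woke up
    }

    void play() {
        {
            std::lock_guard<std::mutex> lock(m);
            state = State::Play;
        }
        cv.notify_all();
    }
};

// Hypothetical usage: one waiter, one stray notification, then the real one.
int demo() {
    Thing t;
    std::thread waiter([&t] { t.wait_for_play(); });
    t.cv.notify_all();  // state is still Stop, so the waiter must keep waiting
    t.play();
    waiter.join();
    return t.extra_wakeups;
}
```

The two-argument overload `cv.wait(lock, predicate)` of `std::condition_variable` performs the same recheck loop internally, so it is usually the shorter way to write this when no counting is needed.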

A more serious example should illustrate this:

/* Assumed layout: Q holds Item *data; int len, head, tail; plus the mutex and
   condition variable that Lock/Unlock/Wait/Broadcast operate on. */
int q_next(Q *q, int idx) {
    /* return the q index succeeding this, with wrap */
    if (idx + 1 == q->len) {
        return 0;
    } else {
        return idx + 1;
    }
}
void q_get(Q *q, Item *p) {
    Lock(q);
    while (q->head == q->tail) {
        /* q is empty: sleep, then recheck after every wakeup */
        Wait(q);
    }
    *p = q->data[q->tail];

    if (q_next(q, q->head) == q->tail) {
        /* q was full, now has space */
        Broadcast(q);
    }
    q->tail = q_next(q, q->tail);
    Unlock(q);
}
void q_put(Q *q, Item *p) {
    Lock(q);
    while (q_next(q, q->head) == q->tail) {
        /* q is full: sleep, then recheck after every wakeup */
        Wait(q);
    }
    q->data[q->head] = *p;
    if (q->head == q->tail) {
        /* q was empty, data available */
        Broadcast(q);
    }
    q->head = q_next(q, q->head);
    Unlock(q);
}

This is a multi-reader, multi-writer queue. Writers wait until there is space in the queue, put the item in, and if the queue was previously empty, broadcast to indicate there is now data. Readers wait until there is something in the queue, take the item from the queue, and if the queue was previously full, broadcast to indicate there is now space.

Note that the condition variable is being used for two conditions {not full, not empty}. These are edge-triggered conditions: only the transitions away from full and away from empty are signaled.

q_get and q_put protect themselves from spurious wakeups caused by both [1] and [2], and you can readily instrument the code to show how often this happens.
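One possible way to do that instrumentation, sketched in C++ rather than in the C-style code above. Everything here is an assumption made for illustration: the class `BoundedQueue`, the counter `futile_wakeups` and the driver `pump` are hypothetical names, and the queue stores plain `int` items to keep the sketch short.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical C++ rendering of the queue: one mutex, one condition variable
// shared by both conditions, plus a count of wakeups that found the
// condition still false.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t slots) : data_(slots) { assert(slots >= 2); }

    void put(int item) {
        std::unique_lock<std::mutex> lock(m_);
        while (next(head_) == tail_) {  // full
            cv_.wait(lock);
            if (next(head_) == tail_) ++futile_wakeups_;
        }
        data_[head_] = item;
        const bool was_empty = (head_ == tail_);
        head_ = next(head_);
        if (was_empty) cv_.notify_all();  // edge: empty -> data available
    }

    int get() {
        std::unique_lock<std::mutex> lock(m_);
        while (head_ == tail_) {  // empty
            cv_.wait(lock);
            if (head_ == tail_) ++futile_wakeups_;
        }
        const int item = data_[tail_];
        const bool was_full = (next(head_) == tail_);
        tail_ = next(tail_);
        if (was_full) cv_.notify_all();  // edge: full -> space available
        return item;
    }

    long futile_wakeups() {
        std::lock_guard<std::mutex> lock(m_);
        return futile_wakeups_;
    }

private:
    std::size_t next(std::size_t i) const { return i + 1 == data_.size() ? 0 : i + 1; }

    std::vector<int> data_;
    std::size_t head_ = 0, tail_ = 0;
    long futile_wakeups_ = 0;
    std::mutex m_;
    std::condition_variable cv_;
};

// Hypothetical driver: `pairs` producers and `pairs` consumers each move
// `items_each` integers through the queue; returns the sum of what was read.
long long pump(BoundedQueue& q, int pairs, int items_each) {
    std::vector<long long> sums(static_cast<std::size_t>(pairs), 0LL);
    std::vector<std::thread> threads;
    for (int p = 0; p < pairs; ++p) {
        threads.emplace_back([&q, items_each] {
            for (int i = 1; i <= items_each; ++i) q.put(i);
        });
        threads.emplace_back([&q, &sums, p, items_each] {
            for (int i = 0; i < items_each; ++i) sums[p] += q.get();
        });
    }
    for (auto& t : threads) t.join();
    long long total = 0;
    for (long long s : sums) total += s;
    return total;
}
```

After a run of `pump`, `futile_wakeups()` reports how many times a thread woke up and had to go back to waiting. The number depends on scheduling and will differ from run to run, so treat it as a rough indicator rather than a measurement.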

mevets
  • Might be worth mentioning that, whereas truly spurious wakeups are rare, case (2) in your answer above is a common occurrence in a "multi-consumer" architecture. – Solomon Slow Feb 09 '20 at 18:53
  • Thank you very much for your answer. I am having some trouble visualising your 2nd point, but I got the idea. – Constantinos Glynos Feb 10 '20 at 09:26