
Consider this basic multithreading program using pthreads. We have a main thread, creating another thread that does some work.

bool done = false;
pthread_mutex_t m;
pthread_cond_t c;

void foo() {
    pthread_mutex_lock(&m);
    //while(!done) {
        pthread_cond_wait(&c, &m);
        // Spurious wakeup while child is doing work.
        // child thread has NOT unlocked the mutex yet
        // Do I now own the mutex?
        // or am I waiting for child to unlock it?
    //}
    pthread_mutex_unlock(&m);
}

void * child(void *arg) {
    pthread_mutex_lock(&m);

    some_intense_work(); // this work is done while mutex is held!
    // the main thread spuriously wakes up
    // while this work is being done
    // (while this child thread is holding the mutex)

    done = true;
    pthread_cond_broadcast(&c);
    pthread_mutex_unlock(&m);
}

int main(int argc, char *argv[]) {
    pthread_t p;
    pthread_create(&p, NULL, child, NULL);
    foo();
}

Pretend that we implement a wait without a surrounding while-clause checking for the predicate, even though we are aware that nobody should ever do this.

Now, if, while the child thread is doing its work, a spurious wakeup occurs in the main thread, what will the status of the mutex m be? Will the main thread own it without the child unlocking it first, so that both own it?

Or does a spurious wakeup only skip the wait for the condition, but not the wait for the mutex to be freed?

Jonathan Leffler
JMC
  • When [`pthread_cond_wait()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_cond_wait.html) returns successfully, the current thread has the mutex locked, regardless of whether the wakeup is spurious or not. The onus is on the woken thread to double-check the condition, waiting again if it is not satisfied. – Jonathan Leffler Dec 07 '16 at 00:56
  • so we could technically consider both threads to be in possession of the mutex in that moment? – JMC Dec 07 '16 at 00:57
  • No; only one thread has the mutex locked. You can't have both threads with the mutex locked because it wouldn't be a mutex. The broadcast doesn't alter the mutual exclusion property of the mutex. – Jonathan Leffler Dec 07 '16 at 00:58
  • Does that mean pthread_cond_wait() cannot successfully return while the child thread is in possession of the mutex, and that the mutex therefore effectively protects us from a spurious wakeup? Please note that the spurious wakeup is thought to occur BEFORE the child has unlocked the mutex. – JMC Dec 07 '16 at 01:01
  • Yes, it does mean that `pthread_cond_wait()` won't return in other threads while any one thread has the mutex locked. What do you mean by 'a spurious wakeup occurs in the main thread'? If the main thread is behaving and uses the mutex, regardless of whether it also uses the condition variable, it will not be able to proceed until it has the mutex locked, either because it did `pthread_cond_wait()` (or `pthread_cond_timedwait()` and the timeout didn't expire) or because it directly locked the mutex. (I'm assuming you've simply omitted the error checking to keep the code simple.) – Jonathan Leffler Dec 07 '16 at 01:08
  • By 'a spurious wakeup occurs in the main thread' I mean that for whatever reason the often quoted "spurious wakeups" are said to occur, the main thread stops waiting prematurely, before the broadcast of the condition variable. If it cannot proceed before it has the mutex locked, how can the "spurious wakeups" ever occur? – JMC Dec 07 '16 at 01:12
  • You're now delving deeper than I've gone with pthreads — I've not encountered spurious wakeups of the sort you describe, but I've not done enough with pthreads for that to be a reliable indicator of anything. (The main threading system I worked with was unrelated to pthreads.) On the face of it, what you describe can't happen according to the spec. If it does, I'd have to suspect there's a bug in the pthread implementation you're working with. However, that's an abstract viewpoint, not one rooted in pragmatic experience — one of the reasons I've not (yet) created an answer. – Jonathan Leffler Dec 07 '16 at 01:19
  • Does your real code meticulously check the return values from all the pthread functions that return error indications, often (usually?) via the zero returned on success or a positive error number on failure protocol? If not, you may be missing important information. – Jonathan Leffler Dec 07 '16 at 01:23
  • I guess [Why does `pthread_cond_wait()` have spurious wakeups?](http://stackoverflow.com/questions/8594591/why-does-pthread-cond-wait-have-spurious-wakeups) should be mandatory reading. Likewise, perhaps, [Do spurious wakeups actually happen?](http://stackoverflow.com/questions/1050592/do-spurious-wakeups-actually-happen) The second one is for Java, but the answers reference `pthread_cond_wait()`. – Jonathan Leffler Dec 07 '16 at 01:25
  • My real code doesn't have the problem because of the surrounding while loop, which is normally always used. I think with the links you've provided I have figured it out. In this case, I lock the mutex a "long" time before actually changing the variable "done" which it protects. Thus, I think in this example the mutex would protect from a spurious wakeup, but only if that wakeup occurs after the lock in foo() has already been called, which we cannot assume. No matter how I look at it, the while loop is necessary, making this hypothetical situation irrelevant. – JMC Dec 07 '16 at 02:11
  • You seem to get away with it, but nominally, you should initialize the mutex and the condition, either with `PTHREAD_MUTEX_INITIALIZER` and `PTHREAD_COND_INITIALIZER` respectively, or [`pthread_mutex_init()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_init.html) and [`pthread_cond_init()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_cond_init.html) respectively. – Jonathan Leffler Dec 07 '16 at 05:02

1 Answer


The pthread_cond_wait() call cannot 'spuriously' wake while some other thread is holding the associated mutex. When pthread_cond_wait() returns successfully, it will have claimed the mutex, so it cannot successfully return until the mutex is available.
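One way to observe this ownership rule is sketched below. It is only an illustration, not part of the original question: it assumes an error-checking mutex (`PTHREAD_MUTEX_ERRORCHECK`), for which POSIX specifies that relocking by the owning thread fails with `EDEADLK`. The function name `relock_result_after_wait()` is hypothetical, and return-value checks are omitted for brevity.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t m;   /* initialized as an error-checking mutex below */
static pthread_cond_t c = PTHREAD_COND_INITIALIZER;
static bool done = false;

static void *child(void *arg) {
    (void)arg;
    pthread_mutex_lock(&m);      /* blocks until the waiter releases m */
    done = true;
    pthread_cond_broadcast(&c);
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Hypothetical helper: returns the result of trying to lock m again
 * immediately after pthread_cond_wait() has returned.
 * EDEADLK means the calling thread already owns the mutex. */
int relock_result_after_wait(void) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_t p;
    done = false;
    pthread_mutex_lock(&m);
    pthread_create(&p, NULL, child, NULL);
    while (!done)
        pthread_cond_wait(&c, &m);   /* releases m while blocked, reacquires before returning */

    int rc = pthread_mutex_lock(&m); /* second lock attempt by the owner */
    assert(rc != 0);                 /* it must not succeed, so one unlock is enough */

    pthread_mutex_unlock(&m);
    pthread_join(p, NULL);
    pthread_mutex_destroy(&m);
    return rc;
}
```

Calling `relock_result_after_wait()` from `main()` and comparing the result with `EDEADLK` shows that the waiter holds the mutex again by the time the wait returns. The error-checking mutex type is used here only to make ownership visible; the default mutex type gives no such diagnostic.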

In your example, a spurious wake can still occur: foo() can lock the mutex, call pthread_cond_wait(), and be spuriously woken before child() ever gets a chance to call pthread_mutex_lock() in the first place.

Another problem in your example (with the commented code left disabled) is that it's possible for the pthread_cond_wait() call to never wake. This scenario can happen if child() completes all of its processing before foo() manages to acquire the mutex. In that scenario, child() will call pthread_cond_broadcast() before the main thread is waiting in pthread_cond_wait(), so the main thread will miss the broadcast. Since foo() never checks done while holding the mutex, it won't notice that child() has finished its work.

That's why pthread_cond_wait() pretty much always has to be performed in a loop that checks the condition.
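A minimal sketch of that loop, adapted from the question's code, might look like the following. The helper names `wait_until_done()` and `run_demo()` are hypothetical, the static initializers follow the suggestion made in the comments, and most error checking is omitted. The driver deliberately joins the child before waiting, which forces the "child finished first" ordering described above; because the predicate is checked under the mutex, the waiter never blocks on a broadcast it has already missed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static bool done = false;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t c = PTHREAD_COND_INITIALIZER;

static void *child(void *arg) {
    (void)arg;
    pthread_mutex_lock(&m);
    /* the question's some_intense_work() would run here, with m held */
    done = true;
    pthread_cond_broadcast(&c);
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Hypothetical replacement for foo(): blocks until done is true and
 * returns the value of done as observed while holding the mutex. */
static bool wait_until_done(void) {
    pthread_mutex_lock(&m);
    while (!done)                    /* re-check after every wakeup, spurious or not */
        pthread_cond_wait(&c, &m);
    assert(done);                    /* holds here because m is locked */
    bool seen = done;
    pthread_mutex_unlock(&m);
    return seen;
}

/* Hypothetical driver: the broadcast happens before anyone waits. */
bool run_demo(void) {
    pthread_t p;
    if (pthread_create(&p, NULL, child, NULL) != 0)
        return false;
    pthread_join(p, NULL);           /* child has already broadcast and exited */
    return wait_until_done();        /* returns at once: predicate is already true */
}
```

With the `while` loop removed, the same driver would block forever in `pthread_cond_wait()`, since no further broadcast is coming. In a real program the join would normally come after the wait; it is moved earlier here only to make the ordering deterministic.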

Michael Burr
  • Thank you. Very shortly before you answered I had commented that I assumed this to be the case, and you've confirmed exactly that. The while loop remains necessary. – JMC Dec 07 '16 at 02:14
  • In your second paragraph, you say a spurious wake could occur because `foo()` calls `pthread_cond_wait()` before `child()` can lock the mutex. How could that happen? What could signal the condition before the child locks the mutex, performs the calculation, broadcasts the condition and unlocks the mutex? AFAICS, `foo()` could certainly call `pthread_cond_wait()`, but it would be hung (and the mutex unlocked) until the child succeeded in its processing. – Jonathan Leffler Dec 07 '16 at 05:08
  • The 'never wake' scenario is a real problem; if the child runs first, the mutex lock in `foo()` would block until the child unlocked the mutex after processing, and therefore the broadcast would indeed occur while `foo()` is not in `pthread_cond_wait()`. What is the approved/normal way of synchronizing things so that `foo()` gets to hang in `pthread_cond_wait()` before `child()` grabs the mutex? – Jonathan Leffler Dec 07 '16 at 05:09
  • @JonathanLeffler: regarding the spurious wake - my understanding is that a spurious wake is allowed to occur for whatever reason - it could occur without the `child()` thread ever calling a pthread function. So it's possible that after the `child()` thread is created but before it actually gets to run to the point where it acquires the mutex, all of the following could occur: the `main()` thread calls `foo()` which acquires the mutex, calls `pthread_cond_wait()` and gets a spurious wake. – Michael Burr Dec 08 '16 at 02:21