Why does Python threading.Condition() notify() require a lock?

Question

My question is specifically about why it was designed that way, given what looks like an unnecessary performance cost.

When thread T1 has this code:

cv.acquire()
cv.wait()
cv.release()

and thread T2 has this code:

cv.acquire()
cv.notify()  # requires that lock be held
cv.release()

what happens is that T1 waits and releases the lock, then T2 acquires it and notifies cv, which wakes up T1. Now there is a race between T2's release() and T1's reacquiring the lock on its way out of wait(). If T1 tries to reacquire first, it will be unnecessarily suspended again until T2's release() has completed.

Note: I'm intentionally not using the with statement, to better illustrate the race with explicit calls.
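For reference, here is a minimal runnable sketch of the two snippets above. The `ready` flag, the `seen` list, and the try/finally wrappers are assumptions added for illustration; they are not part of the original snippets. The last part only shows that CPython rejects a notify() made without the lock.

```python
import threading

cv = threading.Condition()
ready = False   # hypothetical predicate, added so the wait is not lost or spurious
seen = []       # records what T1 observed

def t1():
    cv.acquire()
    try:
        while not ready:   # re-check the predicate after every wake-up
            cv.wait()      # releases the lock while blocked, reacquires before returning
        seen.append("T1 woke")
    finally:
        cv.release()

def t2():
    global ready
    cv.acquire()
    try:
        ready = True
        cv.notify()        # allowed: the lock is held
    finally:
        cv.release()

worker = threading.Thread(target=t1)
worker.start()
t2()
worker.join(timeout=5)

# Calling notify() without holding the lock raises RuntimeError.
try:
    cv.notify()
    unlocked_error = None
except RuntimeError as exc:
    unlocked_error = exc
```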

This seems like a design flaw. Is there any rationale known for this, or am I missing something?

@o11c Indeed, thanks, and the question stands: `The pthread_cond_broadcast() or pthread_cond_signal() functions may be called by a thread whether or not it currently owns the mutex that threads calling pthread_cond_wait() or pthread_cond_timedwait() have associated with the condition variable during their waits` — Yam Marcovic, Sep 06 '17 at 15:35
then python is just being overrestrictive again. Admittedly, you *usually* want to hold the lock even in C. — o11c, Sep 06 '17 at 16:01
From a quick check of other languages' synchronization primitives, this requirement also exists in Java and C#, but it does not exist in pthreads, Windows condition variables, or c++11 condition variables. You'll usually want to hold the lock for whatever you're `notify`ing the other threads about, but the requirement to hold the lock for the `notify` itself doesn't seem necessary. It might be copied from a historical design where the lock was needed to protect the condition variable itself, or it might be deemed to promote safer lock usage. — user2357112, Sep 06 '17 at 16:19
@o11c Unfortunately I can't say whether it's usually the case or not. When I've used CVs in C++, I have mostly preferred to notify after releasing, and I think I can justify it, but not justify the other way around. — Yam Marcovic, Sep 06 '17 at 17:03
@user2357112 My gut feeling is that this will come down to either simply easier design, which is arguably inexcusable if other languages/libraries can do it well, or it's something which goes well with the "Pythonic way." In other words, just what you mentioned. But the question is, has this ever been discussed and explained, or simply implemented and forgotten about? By the way, what is the C# equivalent of a condition variable? I couldn't find one. AutoResetEvent is close, but doesn't require any locks. EDIT: NVM it's Monitor. :) — Yam Marcovic, Sep 06 '17 at 17:53
It's definitely easier to reason about correctness if you don't have to think about other threads interceding between whatever you're notifying about and the actual notify call. Especially for a language like Python, I'd support this design decision for that reason alone. I suspect the actual historical reason is that they copied what Java did, though; the documentation does mention that the `threading` module is based on Java's design. — user2357112, Sep 07 '17 at 05:54
@user2357112 I'm talking about cases where you're releasing the lock where you should, and the notify comes straight after that. In other words, "All done; now notify and wake up." The current design forces me, in such cases, to add another redundant suspension in the process. In what way does it improve the reasoning process? I'm guaranteed that after the release everything is visible on the other thread, so why the double wakeup? The fact it's based on Java is an interesting observation. Maybe the answer will come from there? — Yam Marcovic, Sep 07 '17 at 06:21
Another lead from Java's Condition object doc: `An implementation may (and typically does) require that the current thread hold the lock associated with this Condition when this method is called.` So it's not strictly enforced, but if you want to write cross-platform code, you'd better hold the lock. Given the connection you pointed out with Java's model, I'll take an answer explaining why that "typically does" is there in the note. Thanks for the help so far, too! — Yam Marcovic, Sep 07 '17 at 06:26
Another quote from [Wiki](https://en.wikipedia.org/wiki/Monitor_(synchronization)#Blocking_condition_variables): `It is usually considered a best practice to perform the "signal"/"notify" operation before releasing mutex m that is associated with c, but as long as the code is properly designed for concurrency and depending on the threading implementation, it is often also acceptable to release the lock before signalling.` Not sure what "as long as the code is properly designed for concurrency" means. :) How would it work if it weren't, anyway? Note on threading implementation might be a lead. — Yam Marcovic, Sep 07 '17 at 06:33
@user2357112 One implication of notifying under lock is that you're guaranteed to only notify waiters that have already been waiting. If you notify after release, you're allowing a new one to come in. I could imagine situations where that would make a difference. But still can't see why that would be enforced in all cases by the library. — Yam Marcovic, Sep 07 '17 at 06:38

score 11 · Accepted Answer · answered Sep 13 '17 at 09:40

This is not a definitive answer, but it's supposed to cover the relevant details I've managed to gather about this problem.

First, Python's threading implementation is based on Java's. Java's Condition.signal() documentation reads:

An implementation may (and typically does) require that the current thread hold the lock associated with this Condition when this method is called.

Now, the question was why enforce this behavior in Python in particular. But first I want to cover the pros and cons of each approach.

As to why some think it's often a better idea to hold the lock, I found two main arguments:

1. From the moment a waiter acquire()s the lock (that is, before releasing it in wait()), it is guaranteed to be notified of signals. If the corresponding release() happened before signalling, the following sequence would be possible (where P=Producer and C=Consumer): P: release(); C: acquire(); P: notify(); C: wait(). In that case the wait() paired with that acquire() would miss the signal. There are cases where this doesn't matter (and it could even be considered more accurate), but there are cases where it is undesirable. This is one argument.
2. When you notify() outside a lock, this may cause a scheduling priority inversion; that is, a low-priority thread might end up taking priority over a high-priority thread. Consider a work queue with one producer and two consumers (LC=Low-priority consumer and HC=High-priority consumer), where LC is currently executing a work item and HC is blocked in wait().

The following sequence may occur:

P                    LC                    HC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     execute(item)         (in wait())
acquire()
wq.push(item)
release()
                     acquire()
                     item = wq.pop()
                     release()
notify()
                                           (wake-up)
                                           while (wq.empty())
                                             wait()

Whereas if the notify() happened before release(), LC wouldn't have been able to acquire() before HC had been woken-up. This is where the priority inversion occurred. This is the second argument.

The argument in favor of notifying outside of the lock is high-performance threading: a thread should not have to go back to sleep only to wake up again in the very next time slice it gets. My question already described how that can happen.

Python's `threading` Module

In Python, as I said, you must hold the lock while notifying. The irony is that the internal implementation does not allow the underlying OS to avoid priority inversion, because it enforces a FIFO order on the waiters. Of course, a deterministic waiter order could come in handy. But the question remains why this should be enforced, when one could argue that it is more precise to distinguish the lock from the condition variable: in flows that need optimized concurrency and minimal blocking, acquire() should not by itself register a waiting state; only the wait() call should.

Arguably, Python programmers would not care about performance to this extent anyway—although that still doesn't answer the question of why, when implementing a standard library, one should not allow several standard behaviors to be possible.

One thing which remains to be said is that the developers of the threading module might have specifically wanted a FIFO order for some reason, and found that this was somehow the best way of achieving it, and wanted to establish that as a Condition at the expense of the other (probably more prevalent) approaches. For this, they deserve the benefit of the doubt until they might account for it themselves.

Davis Herring · Answer 2 · 2019-04-10T00:15:07.080

There are several reasons which are compelling (when taken together).

1. The notifier needs to take a lock

Pretend that Condition.notifyUnlocked() exists.

The standard producer/consumer arrangement requires taking locks on both sides:

def unlocked(qu,cv):  # qu is a thread-safe queue
  qu.push(make_stuff())
  cv.notifyUnlocked()
def consume(qu,cv):
  with cv:
    while True:       # vs. other consumers or spurious wakeups
      if qu: break
      cv.wait()
    x=qu.pop()
  use_stuff(x)

This fails because both the push() and the notifyUnlocked() can intervene between the if qu: and the wait().

Writing either of

def lockedNotify(qu,cv):
  qu.push(make_stuff())
  with cv: cv.notify()
def lockedPush(qu,cv):
  x=make_stuff()      # don't hold the lock here
  with cv: qu.push(x)
  cv.notifyUnlocked()

works (which is an interesting exercise to demonstrate). The second form has the advantage of removing the requirement that qu be thread-safe, but it costs no more locks to take it around the call to notify() as well.
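Since notifyUnlocked() is hypothetical, a runnable counterpart has to notify under the lock. The following is a minimal sketch of that variant using the real `threading.Condition` API; `collections.deque` stands in for `qu`, and `make_stuff`, `locked_push`, and the `out` list are placeholder names chosen for this illustration.

```python
import threading
from collections import deque

def make_stuff():
    return "item"            # hypothetical payload

def locked_push(qu, cv):
    x = make_stuff()         # build the item without holding the lock
    with cv:
        qu.append(x)         # qu itself does not need to be thread-safe here
        cv.notify()          # the real API requires the lock for this call

def consume(qu, cv, out):
    with cv:
        while not qu:        # guards against other consumers and spurious wake-ups
            cv.wait()
        x = qu.popleft()
    out.append(x)            # use the item outside the lock

qu, cv, out = deque(), threading.Condition(), []
consumer = threading.Thread(target=consume, args=(qu, cv, out))
consumer.start()
locked_push(qu, cv)
consumer.join(timeout=5)
```

Because the push and the predicate check both happen under the same lock, the consumer either sees the item before waiting or is already registered as a waiter when notify() runs.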

It remains to explain the preference for doing so, especially given that (as you observed) CPython does wake up the notified thread to have it switch to waiting on the mutex (rather than simply moving it to that wait queue).

2. The condition variable itself needs a lock

The Condition has internal data that must be protected in case of concurrent waits/notifications. (Glancing at the CPython implementation, I see the possibility that two unsynchronized notify()s could erroneously target the same waiting thread, which could cause reduced throughput or even deadlock.) It could protect that data with a dedicated lock, of course; since we need a user-visible lock already, using that one avoids additional synchronization costs.
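To make the "dedicated lock" alternative concrete, here is a toy sketch of a condition variable whose waiter list is protected by its own internal lock, so that notification can happen after the user lock is released. The class `SelfLockedCondition` and its method `notify_unlocked` are invented for this illustration; this is not how CPython's `threading.Condition` is written, only a simplified model of the idea.

```python
import threading
from collections import deque

class SelfLockedCondition:
    """Toy condition variable with a dedicated lock for its waiter list."""

    def __init__(self, lock):
        self._lock = lock                   # user-visible lock guarding the predicate
        self._internal = threading.Lock()   # extra lock guarding self._waiters
        self._waiters = deque()

    def wait(self):
        # The caller must hold self._lock.
        waiter = threading.Lock()
        waiter.acquire()
        with self._internal:
            self._waiters.append(waiter)    # register before releasing the user lock
        self._lock.release()
        try:
            waiter.acquire()                # blocks until notify_unlocked() releases it
        finally:
            self._lock.acquire()

    def notify_unlocked(self):
        with self._internal:                # the additional synchronization cost
            waiter = self._waiters.popleft() if self._waiters else None
        if waiter is not None:
            waiter.release()

lock = threading.Lock()
cond = SelfLockedCondition(lock)
state = {"ready": False}
out = []

def waiting_thread():
    with lock:
        while not state["ready"]:
            cond.wait()
        out.append("woke")

t = threading.Thread(target=waiting_thread)
t.start()
with lock:
    state["ready"] = True
cond.notify_unlocked()                      # called after the user lock is released
t.join(timeout=5)
```

The price is visible in the sketch: every wait and every notification takes one more lock than a design that reuses the user-visible lock.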

3. Multiple wake conditions can need the lock

(Adapted from a comment on the blog post linked below.)

def setSignal(box,cv):
  signal=False
  with cv:
    if not box.val:
      box.val=True
      signal=True
  if signal: cv.notifyUnlocked()
def waitFor(box,v,cv):
  v=bool(v)   # to use ==
  while True:
    with cv:
      if box.val==v: break
      cv.wait()

Suppose box.val is False and thread #1 is waiting in waitFor(box,True,cv). Thread #2 calls setSignal; when it releases cv, #1 is still blocked on the condition. Thread #3 then calls waitFor(box,False,cv), finds that box.val is True, and waits. Then #2 calls notifyUnlocked(), waking #3, which is still unsatisfied and blocks again. Now #1 and #3 are both waiting, despite the fact that one of them must have its condition satisfied.

def setTrue(box,cv):
  with cv:
    if not box.val:
      box.val=True
      cv.notify()

Now that situation cannot arise: either #3 arrives before the update and never waits, or it arrives during or after the update and has not yet waited, guaranteeing that the notification goes to #1, which returns from waitFor.
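As a rough runnable check of the locked version, the sketch below runs thread #1 against the setter using the real API. The `Box` class and the snake_case names are assumptions made for this example, and the `with cv:` block is hoisted outside the loop, which should behave the same for this purpose.

```python
import threading

class Box:
    """Hypothetical holder for the shared boolean."""
    def __init__(self):
        self.val = False

def set_true(box, cv):
    with cv:
        if not box.val:
            box.val = True
            cv.notify()          # update and notification happen under one lock hold

def wait_for_value(box, v, cv):
    v = bool(v)
    with cv:
        while box.val != v:      # re-check after every wake-up
            cv.wait()

box, cv = Box(), threading.Condition()
t1 = threading.Thread(target=wait_for_value, args=(box, True, cv))
t1.start()
set_true(box, cv)
t1.join(timeout=5)
```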

4. The hardware might need a lock

With wait morphing and no GIL (in some alternate or future implementation of Python), the memory ordering (cf. Java's rules) imposed by the lock-release after notify() and the lock-acquire on return from wait() might be the only guarantee of the notifying thread's updates being visible to the waiting thread.

5. Real-time systems might need it

Immediately after the POSIX text you quoted we find:

however, if predictable scheduling behavior is required, then that mutex shall be locked by the thread calling pthread_cond_broadcast() or pthread_cond_signal().

One blog post contains further discussion of the rationale and history of this recommendation (as well as of some of the other issues here).

Thank you for the detailed answer. I didn't understand the 2nd example in your 1st argument. Other than that, I think the first one is easily solved by pushing while locked. As to the second argument, it assumes Python's specific design decisions, and we know it could be implemented otherwise. As to the 3rd argument, sure, but it's an edge case and doesn't explain why that flow is enforced. As to the 4th argument, A) I'm talking about notifying after a release op; B) Could have inserted a barrier anyway. As to 5th, Python's implementation prevents RT considerations anyway (priority inversion). — Yam Marcovic, Sep 16 '17 at 19:18
#1: `lockedPush` is exactly your suggestion. #2 and #5: the requirement allows room for other implementations. #3: I did say you had to consider all these together. #4: A) the CV's state must be visible; B) that would be more synchronization. — Davis Herring, Sep 18 '17 at 00:34

newtover · Answer 3 · 2017-09-09T19:19:55.867

A couple of months ago exactly the same question occurred to me. Since I had IPython open, looking at the output of threading.Condition.wait?? (the source of the method) answered it quickly.

In short, the wait method creates another lock called waiter, acquires it, appends it to a list and then, surprise, releases the lock on itself. After that it acquires the waiter once again, that is it starts to wait until someone releases the waiter. Then it acquires the lock on itself again and returns.

The notify method pops a waiter from the waiter list (waiter is a lock, as we remember) and releases it allowing the corresponding wait method to continue.

That is, the trick is that the wait method does not hold the lock on the condition itself while waiting for the notify method to release the waiter.
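Outside IPython, roughly the same inspection can be done with the standard `inspect` module. This sketch assumes a CPython installation where `threading` is available as Python source; the variable names are arbitrary.

```python
import inspect
import threading

# Fetch the Python source of the two methods discussed above.
wait_src = inspect.getsource(threading.Condition.wait)
notify_src = inspect.getsource(threading.Condition.notify)

print(wait_src)
print(notify_src)
```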

UPD1: I seem to have misunderstood the question. Is it correct that you are bothered that T1 might try to reacquire the lock on itself before T2 releases it?

But is that possible in the context of Python's GIL? Or do you think that one can insert an IO call before releasing the condition, which would allow T1 to wake up and wait forever?

Yep you misunderstood the question. :) And your updated understanding is correct. And yes, I can't see why the GIL would disturb anything here since it's a scheduling problem and can occur on a single-core system just as well. — Yam Marcovic, Sep 11 '17 at 19:32

score 1 · Answer 4 · answered Dec 12 '20 at 07:57

It's explained in Python 3 documentation: https://docs.python.org/3/library/threading.html#condition-objects.

Note: the notify() and notify_all() methods don’t release the lock; this means that the thread or threads awakened will not return from their wait() call immediately, but only when the thread that called notify() or notify_all() finally relinquishes ownership of the lock.
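The behaviour described in that note can be observed with a small experiment. This is only a sketch: the `events` list, the `about_to_wait` event, and the short sleep are scaffolding invented for the demonstration.

```python
import threading
import time

cv = threading.Condition()
state = {"done": False}
about_to_wait = threading.Event()
events = []

def waiter():
    with cv:
        about_to_wait.set()            # set while still holding the lock
        while not state["done"]:
            cv.wait()
        events.append("waiter returned from wait()")

def notifier():
    about_to_wait.wait()
    with cv:                           # succeeds only after the waiter is inside wait()
        state["done"] = True
        cv.notify()
        events.append("notified")
        time.sleep(0.05)               # keep holding the lock for a moment
        events.append("about to release")

tw = threading.Thread(target=waiter)
tn = threading.Thread(target=notifier)
tw.start()
tn.start()
tw.join(timeout=5)
tn.join(timeout=5)
print(events)
```

The waiter's entry always comes last, because wait() cannot return until it has reacquired the lock that the notifier still holds.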

score 0 · Answer 5 · answered Sep 06 '17 at 15:58

What happens is that T1 waits and releases the lock, then T2 acquires it, notifies cv which wakes up T1.

Not quite. The cv.notify() call does not wake the T1 thread: It only moves it to a different queue. Before the notify(), T1 was waiting for the condition to be true. After the notify(), T1 is waiting to acquire the lock. T2 does not release the lock, and T1 does not "wake up" until T2 explicitly calls cv.release().

Solomon Slow

This is incorrect. Firstly, my statement is accurate and I have just verified it by tinkering with the threading module and testing the timings and internal wakeups. Secondly, I think you are mistaken about your notion of how it works. The way it works is each waiter creates its own mutex and enqueues it into the condition's waiter queue, and each notify() dequeues one and releases its mutex, effectively waking up the wait() on the other thread. But then there's the reacquiring of the lock of the condition itself. And again, that's where the problem is. – Yam Marcovic Sep 06 '17 at 16:55
By the way, my guess is that they need the condition's lock to be acquired when notifying simply because it was convenient for them to design it that way so that the shared queue would be accessed safely by both notifiers and waiters. But again, that convenience comes at a price. This is the rationale I'm trying to explore. – Yam Marcovic Sep 06 '17 at 16:59
@YamMarcovic, I'm sorry, I did not read your question carefully enough. I thought you were asking about the programmer's model (i.e., how to understand and _use_ condition variables.) I did not realize that you were asking about the _implementation_. – Solomon Slow Sep 07 '17 at 15:48

Dustin Spicuzza · Answer 6 · 2017-09-06T13:52:53.350

There is no race condition; this is how condition variables work.

When wait() is called, the underlying lock is released until a notification occurs. It is guaranteed that the caller of wait() will reacquire the lock before the function returns (i.e., after the wait completes).

You're right that there could be some inefficiency if T1 was directly woken up when notify() is called. However, condition variables are typically implemented via OS primitives, and the OS will often be smart enough to realize that T2 still has the lock, so it won't immediately wake up T1 but instead queue it to be woken.

Additionally, in python, this doesn't really matter anyways, as there's only a single thread due to the GIL, so the threads wouldn't be able to run concurrently anyways.

Additionally, it's preferred to use the following forms instead of calling acquire/release directly:

with cv:
    cv.wait()

And:

with cv:
    cv.notify()

This ensures that the underlying lock is released even if an exception occurs.
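A quick way to see this, as a sketch: raise inside the with block and then check that the lock can be taken again. A plain `threading.Lock` is passed in here (an assumption made for the demo) because the default reentrant lock could be reacquired by the same thread even if it had not been released.

```python
import threading

# Plain Lock so that the reacquisition below is a meaningful check.
cv = threading.Condition(threading.Lock())

try:
    with cv:
        raise ValueError("simulated failure")   # hypothetical error inside the block
except ValueError:
    pass

# A non-blocking acquire succeeds only if the with block released the lock.
released = cv.acquire(blocking=False)
if released:
    cv.release()
```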

I don't understand: I demonstrated the race condition I was talking about. This isn't how condition variables work, it's a Python enforcement. And, in Python there is not one single thread, there is a single interpreted thread. And the race applies even there, because it's a scheduling race. And I can't see how the OS would have any knowledge of this, because it's userspace behavior that can work either way. — Yam Marcovic, Sep 06 '17 at 14:41
