13

This is a language-lawyer question.

First of all, does the a.wait() call in the following code always return?

#include <atomic>
#include <thread>

std::atomic_int a{ 0 };

void f()
{
    a.store(1, std::memory_order_relaxed);
    a.notify_one();
}

int main()
{
    std::thread thread(f);

    a.wait(0, std::memory_order_relaxed); // always returns?

    thread.join();
}

I believe the standard's intention is that a.wait() always returns. (Otherwise atomic::wait/notify would be useless, wouldn't it?) But I think the current standard text does not guarantee this.

The relevant part of the standard is in §31.6 [atomics.wait] paragraph 4:

A call to an atomic waiting operation on an atomic object M is eligible to be unblocked by a call to an atomic notifying operation on M if there exist side effects X and Y on M such that:

  • (4.1) — the atomic waiting operation has blocked after observing the result of X,
  • (4.2) — X precedes Y in the modification order of M, and
  • (4.3) — Y happens before the call to the atomic notifying operation.

and §31.8.2 [atomics.types.operations] paragraphs 29-33:

void wait(T old, memory_order order = memory_order::seq_cst) const volatile noexcept;

void wait(T old, memory_order order = memory_order::seq_cst) const noexcept;

Effects: Repeatedly performs the following steps, in order:

  • (30.1) — Evaluates load(order) and compares its value representation for equality against that of old.
  • (30.2) — If they compare unequal, returns.
  • (30.3) — Blocks until it is unblocked by an atomic notifying operation or is unblocked spuriously.

void notify_one() volatile noexcept;

void notify_one() noexcept;

Effects: Unblocks the execution of at least one atomic waiting operation that is eligible to be unblocked (31.6) by this call, if any such atomic waiting operations exist.

With the above wording, I see two problems:

  1. If the wait() thread saw the value in step (30.1), compared it equal to old in step (30.2), and then got scheduled out; notify_one() in another thread then stepped in, saw no blocked thread, and did nothing; the subsequent blocking in step (30.3) would never be unblocked. Isn't it necessary here for the standard to say "the wait() function atomically performs the evaluation-compare-block operation", similar to what is said about condition_variable::wait()? (See the condition_variable sketch after this list.)
  2. There's no synchronization between notify_*() and the unblocking of wait(). If, in step (30.3), the thread was unblocked by an atomic notifying operation, it would repeat step (30.1) to evaluate load(order). Here there is nothing preventing it from getting the old value. (Or is there?) Then it would block again, and now no one would wake it.
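
For comparison, here is a sketch of the same handshake written with std::condition_variable (my own illustration; the variable names are made up). There the standard does say that wait(lock) atomically releases the lock and blocks (and the predicate overload checks the predicate while holding the lock), so the notifier, which must take the same mutex to set the flag, cannot slip its notification into the gap between the predicate check and the block:

#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
int flag = 0;   // protected by m

void f()
{
    {
        std::lock_guard<std::mutex> lk(m);
        flag = 1;
    }
    cv.notify_one();
}

int main()
{
    std::thread thread(f);
    {
        std::unique_lock<std::mutex> lk(m);
        // wait() atomically releases lk and blocks, so f() cannot set flag
        // (it needs m for that) until this thread is either already blocked
        // or has already observed flag == 1.
        cv.wait(lk, [] { return flag != 0; });
    }
    thread.join();
}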

Is the above concern just nit-picking, or a defect in the standard?

Peter Cordes
zwhconst
  • @NateEldredge, I think the `store(1)` in `f` corresponds to the `Y` in the standard. It is sequenced before the `notify`, hence happens before the `notify`. – zwhconst Dec 04 '21 at 19:15
  • Oh, quite right. I misread. – Nate Eldredge Dec 04 '21 at 19:22
  • So then I don't see a problem. The `wait` is an atomic wait operation, and if it loads 0, then it is eligible to be unblocked by the notify. So the notify must in fact unblock it. It's the entire call to `wait` that's an atomic wait operation, not merely step 3 of it. – Nate Eldredge Dec 04 '21 at 19:31
  • @NateEldredge What about problem №2? – Language Lawyer Dec 04 '21 at 19:33
  • Number 2 is trickier, I agree... – Nate Eldredge Dec 04 '21 at 20:17
  • Unlike Nicol Bolas below, I see no way to prove that the store happens-before the second load. So I agree that the second load could still see the old value. Now, it seems to still be true that the wait is eligible to be unblocked by that same notify that already happened, but I don't think that's meant to force it to unblock itself immediately and keep loading `a` until it finally sees the new value. AFAICT you're supposed to be able to notify a thread even if the variable hasn't changed, and it should go back to sleep until you call `notify()` a second time. – Nate Eldredge Dec 04 '21 at 20:31
  • I'm not even sure that making everything `seq_cst` would fix it. What if both the loads in the wait precede the store in the total order S? That would be extremely counterintuitive, but I don't see how to disprove it. I am beginning to wonder if #2 is indeed a defect, and that they meant to have the notify synchronize with the resulting unblock (though not with any spurious unblock). – Nate Eldredge Dec 04 '21 at 20:41
  • I think your #1 is very similar to https://stackoverflow.com/questions/65556595/c20-thread-possibly-waiting-on-stdatomic-forever?rq=1 – Nate Eldredge Dec 04 '21 at 20:46
  • @NateEldredge _I'm not even sure that making everything `seq_cst` would fix it. What if both the loads in the wait precede the store in the total order S?_ Till now, I was thinking `seq_cst` reads are like RMW operations — see the last value from the modification order of M, or at least something like the last value stored by a `seq_cst` write to M. Which is not really the case… – Language Lawyer Dec 04 '21 at 21:02
  • Well, I think that much is true, at least if every operation in sight is `seq_cst`. But which one is the "last value" is kind of tautological. If your load happens-before a store, then you see the old value; but even with `seq_cst` the converse is not true. If the coherence ordering on `a` were "initialize to 0, load 0 first time, load 0 second time, store 1" then everything is fine; the "store 1" doesn't synchronize with anything, since nobody loads the value 1. To contradict it, I think we have to show that the "store 1" happens-before the second load, which I see no way to prove. – Nate Eldredge Dec 04 '21 at 21:15
  • @NateEldredge _But which one is the "last value" is kind of tautological_ Ehm, why? Modification order of M is specified and the write of 1 is the last operation there when `wait` is unblocked. The thing is, even `seq_cst` doesn't require reading the value from the last operation in modification order of M. – Language Lawyer Dec 04 '21 at 21:20
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/239829/discussion-between-nate-eldredge-and-language-lawyer). – Nate Eldredge Dec 04 '21 at 21:37

3 Answers

5

#1 is pretty much addressed by "C++20 thread possibly waiting on std::atomic forever" (linked in the comments above). The wait() operation is clearly eligible to be unblocked by the notify(), and is the only such operation, so the notify() must unblock it. The eligible "wait operation" is the entire call, not only step 30.3.

If an implementation performs steps 30.1-3 in a non-atomic fashion, such that the notify can happen "between" steps 1 and 3, then it has to somehow ensure that step 3 unblocks anyway.
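
To illustrate, here is one purely hypothetical way an implementation could meet that obligation without making the three steps literally one atomic operation: park waiters on an internal mutex/condition_variable pair and have the notify bump a generation counter under that same mutex. The names waiter_pool and generation are invented for this sketch; real libraries typically use a futex or a hashed table of such pools keyed by address.

#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>

struct waiter_pool {
    std::mutex mtx;
    std::condition_variable cv;
    std::uint64_t generation = 0;

    // value_unchanged() returns true while the observed value still equals 'old'.
    template <class Pred>
    void wait(Pred value_unchanged) {
        std::unique_lock<std::mutex> lk(mtx);
        while (value_unchanged()) {                         // steps 30.1/30.2
            std::uint64_t gen = generation;
            cv.wait(lk, [&] { return generation != gen; }); // step 30.3
        }
    }

    void notify_one() {
        {
            std::lock_guard<std::mutex> lk(mtx);
            ++generation;   // a notify between the value check and the block is not lost
        }
        cv.notify_one();
    }
};

Incidentally, in this sketch the mutex hand-off also makes the notify synchronize with the subsequent unblock, which is the very property discussed for #2 below.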


#2 is stickier. At this point I think you are right: the standard doesn't guarantee that the second load gets the value 1; and if it doesn't, then it will presumably block again and never be woken up.

The use of the relaxed memory ordering makes it pretty clear in this example. If we wanted to prove that the second load must see 1, the only way I can see is to invoke write-read coherence (intro.races p18) which requires that we prove the store happens before the load, in the sense of intro.races p10. This in turn requires that somewhere along the way, we have some operation in one thread that synchronizes with some operation in the other (you can't get inter-thread happens before without a synchronizes with, unless there are consume operations which is not the case here). Usually you get synchronizes with from a pairing of an acquire load with a release store (atomics.order p2), and here we have no such thing; nor, as far as I can tell, anything else that would synchronize. So we don't have a proof.

In fact, I think the problem persists even if we upgrade to seq_cst operations. We could then have both loads coherence-ordered before the store, and the total order S of atomics.order p4 would go "first load, second load, store". I don't see that contradicting anything. We would still have to show a synchronizes with to rule this out, and again we can't. There might appear to be a better chance than in the relaxed case, since seq_cst loads and stores are acquire and release respectively. But the only way to use this would be if one of the loads were to take its value from the store, i.e. if one of the loads were to return 1, and we are assuming that is not the case. So again this undesired behavior seems consistent with all the rules.
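
To make that last point concrete, here is the release/acquire variant of the question's program (purely illustrative; it is not claimed to fix anything): the acquire load inside wait pairs with the release store only on an iteration that actually reads 1, which is exactly what we are unable to prove, so the undesired execution remains consistent with the rules.

#include <atomic>
#include <thread>

std::atomic_int a{ 0 };

void f()
{
    a.store(1, std::memory_order_release);
    a.notify_one();
}

int main()
{
    std::thread thread(f);

    // Each load performed inside wait() synchronizes with the release store
    // only if that load actually reads 1; a load that still reads 0
    // establishes no synchronizes-with, so nothing forces the re-load after
    // an unblock to observe the new value.
    a.wait(0, std::memory_order_acquire);

    thread.join();
}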

It does make you wonder if the Standard authors meant to require the notification to synchronize with the unblocking. That would fix the problem, and I would guess real-life implementations already include the necessary barriers.

But indeed, I am not seeing this specified anywhere.

The only possible way out that I can see is that "eligible to be unblocked" applies to the entire wait operation, not just to a single iteration of it. But it seems clear that the intent was that if you are unblocked by a notify and the value has not changed, then you block again until a second notify occurs (or spurious wakeup).

It's starting to look to me like a defect.

Nate Eldredge
  • Because the thread is "eligible to be unblocked" and because notify_one() [atomics.types.operations 32] "Effects: Unblocks the execution of at least one atomic waiting operation on *ptr that is eligible to be unblocked (31.6) by this call, if any such atomic waiting operations exist," I think it has to be unblocked regardless of the memory_order used. – P. Mattione May 30 '23 at 04:19
  • @P.Mattione: Certainly it does get unblocked, at least once. The concern is that when it does, it may not see the updated value in the atomic variable. In which case, it would block a second time, and, while it is again eligible to be unblocked, there is no second notify operation to unblock it. – Nate Eldredge May 30 '23 at 06:58
  • Hmm, it seems that there must be something that forces the thread-unblock-call in notify_one() to happen-before the unblock in wait(), but yeah it would be nice if the standard said this explicitly. – P. Mattione May 31 '23 at 02:30

0

1. wait atomicity: Since the waiting operation on M "is eligible to be unblocked by a call to an atomic notifying operation" in the situation you described (after the logical step 30.2 but before 30.3), the implementation has to comply.

If there is "a waiting operation that is eligible to be unblocked", then notify_one has to unblock at least one wait call (not just its internal block), regardless of whether that call has reached step (30.3) yet.

So the implementation must ensure that the notification is delivered and that steps (30.1)-(30.3) are repeated in the described case.

2. store/notify order: First, let's clarify the terms the standard uses:

  • "the value of an atomic object M" is used to refer the underlying T object.
  • "the value pointed to by this" is used to refer the underlying T object too. It is misleading I think, as the internal representation of the std::atomic<T> object is not required to be identical with a T object, so it refers something like - just an example -: this->m_underlying_data (because this can have more members, as sizeof(T) != sizeof(std::atomic<T>) can be true, cannot refer simply *this)
  • "an atomic object M" term is used to refer the whole std::atomic<T> typed object, the real *this. As explained in this thread.

The standard guarantees that, even with relaxed ordering, the sequence of modifications to the same atomic object is consistent among different threads:

C++20 standard 6.9.2.1 (19):

[Note: The four preceding coherence requirements effectively disallow compiler reordering of atomic operations to a single object, even if both operations are relaxed loads. This effectively makes the cache coherence guarantee provided by most hardware available to C++ atomic operations. — end note]
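
As a side note, here is a small self-contained illustration (variable and function names are mine) of what the quoted coherence note guarantees for a single object, even with relaxed operations:

#include <atomic>

std::atomic_int c{ 0 };

void writer()
{
    c.store(1, std::memory_order_relaxed);
    c.store(2, std::memory_order_relaxed);  // later than the store of 1 in c's modification order
}

void reader()
{
    int first  = c.load(std::memory_order_relaxed);
    int second = c.load(std::memory_order_relaxed);
    // Read-read coherence: if first reads 2, second cannot read 1,
    // because all threads observe c's modifications in the same order.
}

Whether notify_one counts as such a modification is exactly the point disputed in the comments below.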

As the standard doesn't say how std::atomic must be implemented, notify could modify the atomic object. This is why it is not a const function, and I think the following applies:

C++20 standard 6.9.2.1 (15):

If an operation A that modifies an atomic object M happens before an operation B that modifies M, then A shall be earlier than B in the modification order of M.

These are the reasons why I think this statement is wrong:

Here there is nothing preventing it from getting the old value.

As store and notify_one operate on the same atomic object (and modify it), their order has to be preserved. So it is guaranteed that there will be a notify_one after the value is 1.

Broothy
  • So your interpretation is that `notify_one` is actually a modification of the atomic, and therefore included in the modification order? That seems like a stretch to me, given that `notify_one` doesn't modify the atomic's value in the usual sense. And a typical implementation of `notify_one` would not modify the atomic itself, but perhaps some other atomic somewhere else. – Nate Eldredge Jul 15 '22 at 14:37
  • §31.6.1 "Atomic waiting operations and atomic notifying operations provide a mechanism to wait for the value of an atomic object to change more efficiently than can be achieved with polling." In my opinion: As wait and notify are referred as atomic operations the following applies: "The four preceding coherence requirements effectively disallow compiler reordering of atomic operations to a single object, even if both operations are relaxed loads. " – Broothy Jul 15 '22 at 15:40
  • The "four preceding coherence requirements" [quote](https://eel.is/c++draft/intro.races#19) is a *note* (thus not normative, I think), a summary of write-write, write-read coherence, and so on. The language predates the introduction of C++20 wait and notify. Reads are called "value computations", and I don't think `notify` could be called that. The terminology for writes is "side effect" of an operation, or "modification". I could see `notify` being a side-effect, but as @NateEldredge says, hard to call it a modification. – Peter Cordes Jul 15 '22 at 17:50
  • Interesting idea to try to apply that rule, though. In practice, wait/notify probably do synchronize with each other if the thread actually went to sleep, probably acq/rel sync via the OS communicating across cores via some other memory location. (With acq/rel avoiding this very problem.) – Peter Cordes Jul 15 '22 at 17:52
  • In my opinion as the standard doesn't say how atomic should be implemented, notify could modify the atomic object. This is why it is not a const function. As it modifies the atomic object 6.9.2.2 (15) applies as well: "If an operation A that modifies an atomic object M happens before an operation B that modifies M, then A shall be earlier than B in the modification order of M." This is why I think the notify_one must be ordered after store for sure. – Broothy Jul 15 '22 at 18:10
  • Oh interesting, yeah you could make a distinction between modifying the `atomic` object (which notify can do) vs. modifying the value of the underlying `T` object, which we know notify must not do. Unless it's a no-op modification like `fetch_add(0)` – Peter Cordes Jul 16 '22 at 00:07
  • I think the standard distinguishes clearly and uses the term "The value of an atomic object M" or "the value pointed to by this" whenever referring to the underlying T object. ("the value pointed to by this" is misleading I think, as the internal representation of the atomic object [`std::atomic` object] is not required to be identical with a T object, so it refers something like `this->m_underlying_data` [as this can have more members, as `sizeof(T) != sizeof(std::atomic)` can be true]). The "atomic object M" term is used to refer the whole `std::atomic` typed object, the real *this. – Broothy Jul 16 '22 at 07:18
  • Yeah, the standard's wording doesn't rule out an `atomic` with extra members. And yeah, the wording they use about side-effects on objects is compatible with this line of reasoning, I think. BTW, if you're replying, make sure to notify with @username so people see your replies. I just happened to look at this again. – Peter Cordes Jul 16 '22 at 21:44
  • Ah great, thank you. @NateEldredge what do you think about my explanation? – Broothy Jul 17 '22 at 06:46
  • Well, I think it would work, but if that was really the line of logic that the standard authors intended, I'd hope they'd have given some hints in the notes. I still think it's more likely that they meant to say "synchronizes-with" and simply forgot. – Nate Eldredge Jul 18 '22 at 06:21
  • _"an atomic object M" term is used to refer the whole `std::atomic` typed object_ Much doubt. [Comparing with `atomic_ref`](https://timsong-cpp.github.io/cppwp/n4861/atomics.ref.generic#1.sentence-1) and given that [only scalar objects can be modified](https://timsong-cpp.github.io/cppwp/n4861/defns.access) (while `std::atomic` is of class type), "atomic object" is an object managed/referred-to by `std::atomic`. – Language Lawyer Jul 19 '22 at 00:28
  • If `notify_one` is a modification and the wake-up "reads" its "value", it cannot "see" the assignment of `1` anymore, since there is no "going back" in the modification order of any atomic object M. – Language Lawyer Jul 19 '22 at 02:30
  • @LanguageLawyer For your first comment: https://stackoverflow.com/questions/54885590/definition-of-atomic-object - the atomic object is indeed the `std::atomic` typed object and not the underlying T object. How could an `int` be an atomic object? For the second comment: `notify_one` could modify the atomic object (`std::atomic` typed object), not the atomic value. As both `store` and `notify_one` are "operations that modifies an atomic object M" (as they are not const functions), their execution order is consistent for all the threads. So `wait` is guaranteed to see `1` at the end. – Broothy Jul 19 '22 at 05:28
  • _How could an `int` be an atomic object?_ You completely ignore the links I'm posting, don't you? – Language Lawyer Jul 19 '22 at 05:31
  • No, I didn't, but the section you sent applies to `std::atomic_ref`- and the wording is not the best IMO. If you follow the link I pasted I kind of agree with the following explanation: "The C++ standard imposes a set of rules on operations and effects of operations on atomic objects. If all operations on an object satisfy those rules, then that object is atomic.". In `std::atomic_ref` the `*ptr` is an atomic object just because `std::atomic_ref` hides all of its operations and makes them available through the `std::atomic_ref` interface. An int* itself is not atomic I hope you agree. – Broothy Jul 19 '22 at 05:47

0

This question was asked on std-discussions, so I'll post here my answer from that discussion.

  1. If the wait() thread saw the value in step (30.1), compared it equal to old in step (30.2), and then got scheduled out; notify_one() in another thread then stepped in, saw no blocked thread, and did nothing; the subsequent blocking in step (30.3) would never be unblocked. Isn't it necessary here for the standard to say "the wait() function atomically performs the evaluation-compare-block operation", similar to what is said about condition_variable::wait()?

The operation is called an "atomic waiting operation". I agree, this can be read as a "waiting operation on an atomic", but practically speaking, it is clear that the steps described in the operation description need to be performed atomically (or have the effect of being atomic). You could argue that the standard wording could be better, but I don't see that as a logical error in the standard.

  2. There's no synchronization between notify_*() and the unblocking of wait(). If, in step (30.3), the thread was unblocked by an atomic notifying operation, it would repeat step (30.1) to evaluate load(order). Here there is nothing preventing it from getting the old value. (Or is there?) Then it would block again, and now no one would wake it.

There is no synchronization between notify_* and wait because such synchronization would be redundant.

The synchronizes-with relation is used to determine the order of events happening in different threads, including events on different objects. It is used in the definition of inter-thread happens before and, by induction, of happens before. In short, "a release operation on an atomic in one thread synchronizes with an acquire operation on that atomic in another thread" means that effects on objects (including objects other than the atomic) that were made prior to the release operation will be observable by operations performed following the acquire operation.

This release-acquire memory ordering semantics is redundant and irrelevant for the notify_* and wait operations because you already can achieve the synchronizes-with relation by performing a store(release) in the notifying thread and wait(acquire) in the waiting thread. There is no reasonable use case for notify_* without a prior store or read-modify-write operation. In fact, as you quoted yourself, an atomic waiting operation is eligible to be unblocked by an atomic notifying operation only if there was a side effect on the atomic that was not first observed by the waiting operation.
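
To illustrate the usage pattern this refers to (a sketch; variable names are mine), the inter-thread data transfer is ordered by the store(release)/wait(acquire) pair on the flag, not by notify_one itself, assuming the wait indeed returns only after its load has read 1:

#include <atomic>
#include <thread>

int payload = 0;                 // non-atomic data published via the flag
std::atomic_int ready{ 0 };

void producer()
{
    payload = 42;                               // A: plain write
    ready.store(1, std::memory_order_release);  // B: publishes A
    ready.notify_one();                         // only wakes the waiter; plays no ordering role
}

void consumer()
{
    ready.wait(0, std::memory_order_acquire);   // returns once a load has read 1
    // That acquire load synchronizes with the release store, so this
    // read of payload is guaranteed to see 42.
    int v = payload;
    (void)v;
}

int main()
{
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}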

What gives the guarantee that the load in the wait observes the effect of the store in the notifying thread is the sequential execution guarantee in the notifying thread. The store call is sequenced before the notify_one call because those are two full expressions. Consequently, in the waiting thread, store happens before notify_one (following from the definitions of happens before and inter-thread happens before). This means that the waiting thread will observe the effects of store and notify_one in that order, never in reverse.

So, to recap, there are two scenarios possible:

  1. The notifying thread stores 1 to the atomic first, and the waiting thread observes that store on entry into wait. The waiting thread does not block, and the notify_one call is ignored.
  2. The waiting thread does not observe the store and blocks. The notifying thread issues store and notify_one, in that order. At some point, the waiting thread observes the notify_one (due to the eligible-to-be-unblocked relation between notify_one and wait). At this point, it must also observe the effect of store (due to the happens before relation between store and notify_one) and return.

If the waiting thread observed notify_one but not store, it would violate the happens before relation between those operations, and therefore it is not possible.


Update 2022-07-28:

As was pointed out in the comments by Broothy, the happens before relation between store and notify_one may be guaranteed only in the notifying thread and not in the waiting thread. Indeed, the standard is unclear in this regard: it requires the happens before relation, but it doesn't specify in which thread this relation has to be observed. My answer above assumed that the relation has to be maintained in every thread, but this may not be the correct interpretation of the standard. So, in the end, I agree the standard needs to be clarified in this regard. Specifically, it needs to require that store inter-thread happens before notify_one.

Andrey Semashev
  • In the example code both store and load operations use relaxed memory ordering. 6.9.2.1 (15) - "If an operation A that modifies an atomic object M happens before an operation B that modifies M, then A shall be earlier than B in the modification order of M." If this rule cannot be applied to the store/notify pair, basically nothing prevents the compiler from moving `notify` before `store`. I still believe that `notify` and `store` are operations on the same atomic object, so their order should be kept among all the threads. – Broothy Jul 19 '22 at 13:39
  • @Broothy `notify_*` is not a modification operation on the atomic. – Andrey Semashev Jul 19 '22 at 14:01
  • Then it should be a `const` function, right? If both `store` and `load` are called with relaxed memory order nothing provides the synchronizes-with relation, nothing sequences `notify` after the `store`. – Broothy Jul 19 '22 at 14:03
  • I still stand with what I wrote in my answer, the guarantee follows from the *happens before* relation. – Andrey Semashev Jul 19 '22 at 14:28
  • Could you elaborate please what ensures the happens before relation between the `store` and `notify`? If we try to apply "the inter-thread happens before" criteria to the `store` - `notify` pair: A synchronizes with B - no; A is dependency-ordered before B [A performs a release operation on an atomic object M , and, in another thread, B performs a consume operation on M - no; for some evaluation X, A is dependency-ordered before X and X carries a dependency to B - no] - no; A synchronizes with X and X is sequenced before B - no; A is sequenced before X and X inter-thread happens before B - no. – Broothy Jul 19 '22 at 15:19
  • @Broothy It's in the definition of *happens before*, "A is sequenced before B", where A and B are `store` and `notify_*`. – Andrey Semashev Jul 19 '22 at 16:36
  • 2
    Yes, but we need inter-thread happens before here, not just happens before. Happens before guarantees that `store` happens before `notify` on the notifying thread. But the waiting thread is a different one, and as a different observer it is completely possible, that it will see `notify` first, then the `store`. If `store` would inter-thread happen before `notify`, we could state that `wait` will always see `store` first. – Broothy Jul 19 '22 at 17:08
  • "because you already can achieve the synchronizes-with relation by performing a store(release) in the notifying thread and wait(acquire) in the waiting thread." I disagree, as I explained in my answer. You only get a synchronizes-with if the acquire load takes its value from the release store. But we haven't yet proved that this actually happens. So this logic cannot disprove the possibility of the post-unblock load returning 0, because in such a case there is no synchronization and nothing to contradict. In essence, this argument affirms the consequent. – Nate Eldredge Jul 25 '22 at 01:33
  • @NateEldredge > You only get a synchronizes-with if the acquire load takes its value from the release store. -- You don't get unblocked from `wait` if it doesn't observe the `store`. If it does observe it then it synchronizes-with it. This is not the issue raised by the question. The issue is whether the `notify` can be observed before `store` in the waiting thread, and that is completely unrelated to the synchronizes-with relation. – Andrey Semashev Jul 25 '22 at 17:34
  • @AndreySemashev "The issue is whether the `notify` can be observed before `store` in the waiting thread" - Yes, exactly. Even if `store` and `load` were in "synchronized with" relationship: `store` would be a release operation, `load` an acquire. `notify` is after `store`, and a release-acquire pair guarantees that everything "happens before" `store` must be visible by the acquiring thread after `load` (says nothing about things after release). So as `notify` happens after the `store`, nothing prevents `notify` to be observed before `store`, even if `store` and `load` were synchronized with. – Broothy Jul 26 '22 at 05:57
  • 2
    @Broothy Yes, you may be right. I was basing my answer on the premise that the *happens-before* relation between `store` and `notify` is observed in the waiting thread, but this may not be the correct interpretation of the standard. The wording is confusing as it doesn't say *in what thread* each of the relations is observed. That is, the standard says it *happens before*, but it doesn't say *in what thread*. I assumed *in every thread*, but that may not be the correct assumption. – Andrey Semashev Jul 26 '22 at 16:29