This is what I think the authors of the standard intended for C_17 and C++20: my version of the abstract machine (see below). A proof of the logical equivalence of the old and the new abstract machines is not included, though, sorry.
Regarding your question, please read rule 13 below; you will notice the (very simple) difference, which becomes obvious once the rules are simplified like that. The difference comes from the following idiomatic example, which shows that an acq_rel combined fence is not enough to rule out x == 0 && y == 0, so a seq_cst fence with special properties (namely a full memory barrier; forget about the "single order") is needed:
// Without reordering
// thread A
std::atomic<int> a {0}, x {0};
extern std::atomic<int> b;

a.store(1, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_acq_rel);
if (b.load(std::memory_order_relaxed) == 1) {
    x.store(1, std::memory_order_relaxed);
}
/* ... */

// thread B
std::atomic<int> b {0}, y {0};
extern std::atomic<int> a;

b.store(1, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_acq_rel);
if (a.load(std::memory_order_relaxed) == 1) {
    y.store(1, std::memory_order_relaxed);
}
/* ... */
// With the reordering
// thread A
std::atomic<int> a {0}, x {0};
extern std::atomic<int> b;
int c; // plain local holding the load hoisted above the store

std::atomic_thread_fence(std::memory_order_acquire);
c = b.load(std::memory_order_relaxed);
a.store(1, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release);
if (c == 1) {
    x.store(1, std::memory_order_relaxed);
}
/* ... */

// thread B
std::atomic<int> b {0}, y {0};
extern std::atomic<int> a;
int d; // plain local holding the load hoisted above the store

std::atomic_thread_fence(std::memory_order_acquire);
d = a.load(std::memory_order_relaxed);
b.store(1, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release);
if (d == 1) {
    y.store(1, std::memory_order_relaxed);
}
/* ... */
Mathematicians are welcome. Jokers too. Due respect to Jeff Preshing and his series of articles about lock-free programming!
Multi-threaded executions and data races (new version to n2346 C_17)
Under a hosted implementation, a program can have more than one thread of execution (or thread) running concurrently. The execution of each thread proceeds as defined by the remainder of this document. The execution of the entire program consists of an execution of all of its threads. Under a freestanding implementation, it is implementation-defined whether a program can have more than one thread of execution.
If there are two accesses to the same memory location, at least one of which is a writing, and they are not ordered by the "happens before" relation defined below, the behavior is undefined.
Note: an access is an action that happens in the execution environment; see n2346 (3.1, 5.1).
For any possible execution of the entire program, for any atomic object, any pair A, B of accesses to it, "A happens before B" xor "B happens before A". If there is also an access C to the atomic object, such that "A happens before C" and "C happens before B", then "A happens before B".
Note: the proof of this rule consists of: a) showing that introducing the converses of the cache-coherence rules of the old abstract machine into the new one has no observable consequences, thus extending the meaning (definition) of the "happens before" relation; b) consolidating those converse rules into this single rule.
In the following rules, X, T, U denote some accesses. All accesses may be to different memory locations and may be evaluated in different threads, unless otherwise specified.
A simply inter-thread happens before B if an access A to an atomic object happens before an access B to the same object that is evaluated in another thread.
A subexpression A carries a dependency to a subexpression B if:
the value of A is used as an operand of B, unless:
• B is an invocation of the kill_dependency macro, or
• A is the left operand of a && or || operator, or
• A is the left operand of a ?: operator, or
• A is the left operand of a , (comma) operator; or
A writes to a memory location, B reads from it, and A is sequenced before B.
Note: the "unless" exceptions, which break the "carries a dependency to" chain through operands of subexpressions, were chosen by the authors of the standard for the programmer's convenience; no hardware limitations seem to be implied here.
A is consume-ordered before B if:
a release writing A simply inter-thread happens before a consume reading B; or
an access A is sequenced before X and X is consume-ordered before B; or
A is consume-ordered before X and X carries a dependency to an access B.
Note: an access X carries a dependency to an access B iff the subexpressions whose evaluation gives rise to X and B, respectively, do the same.
A simply synchronizes with B, if a writing A simply inter-thread happens before a reading B.
A relacq-synchronizes with B if:
a release writing A simply synchronizes with an acquire reading B; or
a release fence A is sequenced before T and T simply synchronizes with an acquire reading B; or
a release writing A simply synchronizes with U and U is sequenced before an acquire fence B; or
a release fence A is sequenced before T, T simply synchronizes with U and U is sequenced before an acquire fence B; or
a seqcst fence A is sequenced before T and T simply inter-thread happens before an acquire reading B; or
a release writing A simply inter-thread happens before U and U is sequenced before a seqcst fence B; or
a seqcst fence A is sequenced before T, T simply inter-thread happens before U and U is sequenced before a seqcst fence B.
- A is sync-ordered before B if:
A relacq-synchronizes with B; or
an access A is sequenced before X and X relacq-synchronizes with B; or
A relacq-synchronizes with X and X is sequenced before an access B; or
a seqcst access A is sequenced before a seqcst access C and C variously inter-thread happens before a seqcst access B; or
a seqcst access A variously inter-thread happens before a seqcst access C and C is sequenced before a seqcst access B.
Note: rules n2346 (7.17.3\3, 9-11; 7.17.4\3, 4) of the old abstract machine do not prohibit reordering a reading (writing) before (after) a fence with respect to a seqcst writing (reading) in another thread; the new abstract machine proudly does prohibit that, and even more, as may have been intended by the authors of the standard. I may be wrong, of course, so let mathematicians correct me.
A variously inter-thread happens before B if:
A simply inter-thread happens before B; or
A is consume-ordered before B; or
A is sync-ordered before B; or
A variously inter-thread happens before X and X variously inter-thread happens before B.
A happens before B, if A is sequenced before B or A variously inter-thread happens before B.
A is a visible writing to a memory location M with respect to a reading B from M iff A happens before B and there is no other writing X to M such that A happens before X and X happens before B.
B reads the value written by A if A is a visible writing with respect to the reading B.
A reading from an atomic object reads the value from only one writing to that object.
Note: this rule, together with rule 4, implies that that writing is a visible writing.
Note: the declaration of an object (even one without an initializer) specifies an initialization writing to that object.
The implementation should not introduce, to any object declared in the program, a writing that is not specified by any statement of the program.
Multi-threaded executions and data races (new version to n4860 C++20)
Rules 1-13 are the same as the new rules 1-13 for C_17 above. Rules 17-23 are the same as the new rules 17-23 for C_17 above.
- A is sync-ordered before B if:
an access A is sequenced before X and X relacq-synchronizes with B; or
A relacq-synchronizes with X and X is sequenced before an access B; or
a seqcst access A is sequenced before a seqcst access C and C strongly inter-thread happens before a seqcst access B; or
a seqcst access A strongly inter-thread happens before a seqcst access C and C is sequenced before a seqcst access B.
- A strongly inter-thread happens before B if:
A simply inter-thread happens before B; or
A is sync-ordered before B; or
A strongly inter-thread happens before X and X strongly inter-thread happens before B.
- A variously inter-thread happens before B if:
A strongly inter-thread happens before B; or
A relacq-synchronizes with B; or
A is consume-ordered before B; or
A variously inter-thread happens before X and X variously inter-thread happens before B.