Does C++ provide a complete set of memory barriers?

Question

Acquire-release memory ordering provides a way to create a partial ordering of surrounding operations between two points in the program. It inhibits reordering of instructions or memory operations (perhaps only partially) across the barrier, which guarantees some synchronization between the marked points. But this is only a pairwise relationship; there is no total ordering guarantee. There are four values: relaxed, acquire, release, acq-rel.
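To make the pairwise relationship concrete, here is a minimal message-passing sketch. The names `payload`, `ready` and `message_passing_demo` are hypothetical, chosen only for illustration. The release store in one thread pairs with the acquire load in the other, so the plain write to `payload` is visible once the flag is observed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration only.
int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag that publishes the payload

int message_passing_demo() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        payload = 42;                                  // ordinary write
        ready.store(true, std::memory_order_release);  // publish: earlier writes cannot sink below this store
    });
    std::thread consumer([&seen] {
        // Spin until the release store is observed by this acquire load.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // no data race: the acquire load synchronized with the release store
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling `message_passing_demo()` should return 42 on every run. The guarantee only covers this producer/consumer pair; it says nothing about how a third thread orders these operations relative to other atomics.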

The sequential-consistency property guarantees a total ordering of memory operations at various points in the program. Just plain sequential consistency property does not inhibit any kind of instruction reordering or memory reordering. It just ensures that marked instructions or memory operations happen in a global order. But this does not give any guarantees about surrounding instructions(?). There are two values: not-cst, seq-cst.

I'm not sure whether there are any other types of membars or other ways to categorize them. Wherever I look, I only see these barriers, so I listed them intuitively as above.

Now, by combining acquire-release ordering with sequential consistency property, we can create many kinds of barriers (at least on paper):

LOAD:          (C++ std::atomic_load)
relaxed                          (C++ std::memory_order_relaxed)
relaxed with seq-consist         (C++ ?)
acquire                          (C++ std::memory_order_acquire)
acquire with seq-consist         (C++ std::memory_order_seq_cst)

STORE:         (C++ std::atomic_store)
relaxed                          (C++ std::memory_order_relaxed)
relaxed with seq-consist         (C++ ?)
release                          (C++ std::memory_order_release)
release with seq-consist         (C++ std::memory_order_seq_cst)

EXCHANGE:      (C++ std::atomic_exchange)
relaxed                          (C++ std::memory_order_relaxed)
relaxed with seq-consist         (C++ ?)
acquire                          (C++ std::memory_order_acquire)
acquire with seq-consist         (C++ ?)
release                          (C++ std::memory_order_release)
release with seq-consist         (C++ ?)
acq-rel                          (C++ std::memory_order_acq_rel)
acq-rel with seq-consist         (C++ std::memory_order_seq_cst)

FENCE:         (C++ std::atomic_thread_fence)
acquire                          (C++ std::memory_order_acquire)
acquire with seq-consist         (C++ ?)
release                          (C++ std::memory_order_release)
release with seq-consist         (C++ ?)
acq-rel                          (C++ std::memory_order_acq_rel)
acq-rel with seq-consist         (C++ std::memory_order_seq_cst)
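For reference, the rows of the table that do have a C++ spelling can be written as in the sketch below. It uses the `std::atomic<int>` member functions rather than the free functions named in the table (the free-function forms that take an order are the `_explicit` variants, such as `std::atomic_load_explicit`). The function name `orders_demo` is hypothetical, and the code runs single-threaded, so it only shows which order arguments are accepted, not any cross-thread effect.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: exercises each memory_order that the table maps to C++.
int orders_demo() {
    std::atomic<int> x{0};

    // STORE: relaxed, release, seq_cst
    x.store(1, std::memory_order_relaxed);
    x.store(2, std::memory_order_release);
    x.store(3, std::memory_order_seq_cst);

    // LOAD: relaxed, acquire, seq_cst
    int a = x.load(std::memory_order_relaxed);
    int b = x.load(std::memory_order_acquire);
    int c = x.load(std::memory_order_seq_cst);
    assert(a == 3 && b == 3 && c == 3);

    // EXCHANGE (read-modify-write): relaxed, acquire, release, acq_rel, seq_cst
    x.exchange(4, std::memory_order_relaxed);
    x.exchange(5, std::memory_order_acquire);
    x.exchange(6, std::memory_order_release);
    x.exchange(7, std::memory_order_acq_rel);
    int old = x.exchange(8, std::memory_order_seq_cst);
    assert(old == 7);

    // FENCE: acquire, release, acq_rel, seq_cst
    std::atomic_thread_fence(std::memory_order_acquire);
    std::atomic_thread_fence(std::memory_order_release);
    std::atomic_thread_fence(std::memory_order_acq_rel);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    return x.load();  // default order is seq_cst
}
```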

In the table, some values are missing (relaxed_seq_cst loads and stores; acquire_seq_cst and release_seq_cst fences; etc.). Are they redundant? Are they harder to use correctly or debug? Or were they left out just to reduce complexity in the library?

You're talking about *barriers*, but only your last block of examples (`atomic_thread_fence`) is actually about barriers. The others seem to be about memory_order params for *operations* (like load, store, or RMW). They have a different meaning in that context, (partially) ordering just the operation wrt. surrounding code. "Barrier" normally means "fence", a 2-way thing. (https://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect has details on the fact that `atomic_thread_fence(release)` is still a 2-way fence; some early-C++11 experts had misconceptions.) — Peter Cordes, Jul 22 '22 at 21:21
A C++ program that uses `seq_cst` for all its atomics, and doesn't contain any data-race UB, will execute *as if* the entire thing ran on a sequentially-consistent parallel machine. This is called SC-DRF : Sequential Consistency for Data-Race-Free programs. i.e. the observable results of the program are all explainable by some interleaving of program order for *all* statements. Any reordering of non-atomic operations between (not across) atomics could only be observed with a data race, so C++ doesn't have to preserve that. — Peter Cordes, Jul 22 '22 at 21:37
(If your program includes *any* weaker atomic orders, reasoning about whether SC is recovered gets tricky. Often you don't need SC, though, so weaker orders are useful in practice.) — Peter Cordes, Jul 22 '22 at 21:38
*Just plain sequential consistency property does not inhibit any kind of instruction reordering or memory reordering.* - Incorrect. It does inhibit that. The global order of all SC operations has to be compatible with program order ("sequenced before"), so the only possible executions are ones that are some interleaving of source order. (Again for a data-race-free program). **seq_cst is stronger than acq_rel; it has acquire and/or release plus the additional total-order requirement.** I think this is the key misunderstanding behind your question? — Peter Cordes, Jul 23 '22 at 02:22
Related but not really a duplicate: [The strong-ness of x86 store instruction wrt. SC-DRF?](https://stackoverflow.com/q/70249647) . And I think [Memory Model in C++ : sequential consistency and atomicity](https://stackoverflow.com/q/38425920) is a potential duplicate; it somewhat explains what SC means. — Peter Cordes, Jul 23 '22 at 02:25
Sorry, I think I still don't have clear understanding on memory ordering correctly. I will edit the question once I dig into those questions. — Sourav Kannantha B, Jul 23 '22 at 11:28
@PeterCordes When I mentioned _plain sequential consistency_, I was not talking about the C++ `seq_cst` flag. I was talking about having a global order between marked operations (not surrounding operations). There is no equivalent to this in current C++ (maybe because it is not very useful). — Sourav Kannantha B, Jul 23 '22 at 11:35
C++ `seq_cst` *does* create/require a global total order that includes *all* `seq_cst` operations and fences, which is compatible with the program-order ("sequenced-before") order of those operations. Perhaps an example of what you're talking about would make it clearer what you mean when you use those words, like something you think C++ allows with `seq_cst` ops now, vs. what you want to allow. Terms like "relaxed with seq-consist" make zero sense to me; no idea what you might be expecting from that contradiction. I can only guess that you've misunderstood what C++ seq_cst actually does. — Peter Cordes, Jul 23 '22 at 14:45
@PeterCordes When we do a `seq_cst` load, it makes that operation an acquire load, so operations below that line can't move above it. By "relaxed with seq-consist", I meant that the tagged operation by itself would have a global order with other `seq_cst` operations, but would not inhibit reordering of instructions around it. I was just trying to imagine all the possibilities. I don't know whether this has any use cases, though. — Sourav Kannantha B, Jul 23 '22 at 18:25
Oh, I see, so your opening 2 paragraphs are your own terminology, not talking about C++ `seq_cst`. So a bit like how `volatile` operations can't reorder at compile time with other `volatile` operations, but no other guarantees. Except you're picturing that they couldn't reorder at run-time either. — Peter Cordes, Jul 23 '22 at 18:31
I don't know if any hardware supports that, at least not on cache-coherent hardware. If coherence required explicit flushing, then you could *only* flush the memory locations of atomic objects, and not even try to make other stores visible, if the program didn't use any release or acquire that made other objects visible to other threads, maybe. — Peter Cordes, Jul 23 '22 at 18:31
@PeterCordes Ohh yes, C++ `volatile` behaves almost the same as what I was trying to explain by "relaxed seq-consist". But does `volatile` provide a global order in C++? — Sourav Kannantha B, Jul 23 '22 at 18:46
No, like I said, C++ `volatile` only guarantees *compile time* ordering, not run time. So even on x86, StoreLoad reordering would still be possible between volatile operations by one thread. — Peter Cordes, Jul 23 '22 at 18:47
In the first two paragraphs, I was vaguely explaining what those two terms could mean in the general case, not in relation to C++. Then I tried to match those cases with what already exists in C++. — Sourav Kannantha B, Jul 23 '22 at 18:48
Yeah, I can see that now, but when a C++ question starts out by defining terms that C++ uses, readers naturally assume that it is trying to state an understanding of the C++ terms. Unless the question states otherwise *first*, which this doesn't. — Peter Cordes, Jul 23 '22 at 18:49
Glad that I was able to clarify what I was trying to convey :). May I edit the question to make it clearer? — Sourav Kannantha B, Jul 23 '22 at 18:52
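The point made in the comments, that C++ `seq_cst` operations form one total order compatible with each thread's program order, can be illustrated with the classic store-buffering litmus test. This is only a sketch; the helper name `count_both_zero` and the iteration count are hypothetical. With `seq_cst` on all four operations, the outcome where both loads read 0 is not allowed, whereas with only release stores and acquire loads it would be permitted (StoreLoad reordering).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-buffering test `iterations` times and
// returns how many runs ended with both loads observing 0.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();

        // In any single total order consistent with program order, one of the
        // stores comes first, so the other thread's load must see it.
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

A call such as `count_both_zero(200)` should return 0. Weakening the orders to `memory_order_release` and `memory_order_acquire` would make a nonzero count legal, although whether it actually shows up depends on the hardware and timing.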
