
Take a seq_cst fence for example. The main explanations I've seen are:

1. It gives you the synchronizes-with relationships of acquire and release (if you include the appropriate loads and stores) and also all the fences happen in the same order for all threads (which doesn't seem very useful).

2. Also this. It prevents all memory reads and writes on the current thread from being reordered with ones on the other side of the fence (which seems very useful).

3. Incomprehensible standardese, except for the line "in many cases, memory_order_seq_cst atomic operations are reorderable with respect to other atomic operations performed by the same thread" which seems to contradict number 2.

How can these definitions mean the same thing? I find synchronizes-with a useful way to think about acquire and release; is there a similarly elegant mental model for seq_cst?

Peter Cordes
Dan
  • Possibly the simplest model is the first half of the paragraph you cite in #3: if you use *only* seq_cst operations, then you can legitimately think of your program as if it was running on a single-core in-order machine with multitasking, which may switch between threads at any time. Each thread will see memory contents as if all other threads had stopped at some well-defined point in their execution, with all operations before that point visible, and none of the ones after. – Nate Eldredge Feb 08 '23 at 14:09
  • @NateEldredge It's true that that's the simplest model, but most programs don't use only seq_cst, and I don't see how to apply that intuition to understand the interaction of seq_cst and non-seq_cst operations. – Dan Feb 09 '23 at 08:09

1 Answer


Your (1) is the easy-to-understand explanation.

I don't see how your links relate to your statement in (2). The statement you wrote there doesn't appear anywhere in the Rust article you linked. You also linked an explanation of a #LoadStore fence, but it doesn't say anything about how that relates to sequentially consistent operations.

The C++11 standard does seem to support your statement, though:

There shall be a single total order S on all memory_order_seq_cst operations, consistent with the “happens before” order.

In C++20, your (2) definitely holds: memory_order_seq_cst operations and fences can't be reordered with respect to each other in any way.

Regarding (3), I don't know if I can help you comprehend the standardese. Reading more of Preshing's blog posts might help, for example The Synchronizes-With Relation. As for this statement:

in many cases, memory_order_seq_cst atomic operations are reorderable with respect to other atomic operations performed by the same thread

The "other" atomic operations are non-seq_cst operations. They can still be reordered in respect to the seq_cst operations. For example, these two statements are allowed to be reordered:

std::atomic<int> a{0}, b{0};

int r = b.load(std::memory_order_seq_cst);  // seq_cst load
a.store(1, std::memory_order_relaxed);      // relaxed store: may move before the load
Chronial
  • An SC fence generally needs to include a StoreLoad fence, blocking earlier stores from reordering with later loads. That's the hard / expensive part that acq_rel fences *don't* need to include; CPUs naturally want to load early and store late, so draining the store buffer is slow. ([Can a speculatively executed CPU branch contain opcodes that access RAM?](https://stackoverflow.com/q/64141366) / [How does memory reordering help processors and compilers?](https://stackoverflow.com/q/37725497)). – Peter Cordes Feb 07 '23 at 22:18
  • Perhaps they linked Preshing's article on acq/rel fences because it talks about what it *doesn't* need to do? Except it doesn't actually talk about that. Maybe they got the wrong article. – Peter Cordes Feb 07 '23 at 22:18
  • *in many cases, memory_order_seq_cst atomic operations are reorderable with respect to other atomic operations performed by the same thread* - That doesn't contradict (2); (2) was about seq_cst **fences**, not **operations**. An atomic operation on an object can reorder locally with other atomic operations if they're not both seq_cst. But barriers can create stronger ordering between relaxed operations. I think (2) is true *for fences*, which the OP initially said they were talking about, like `std::atomic_thread_fence(std::memory_order_seq_cst)`. – Peter Cordes Feb 07 '23 at 22:25
  • I had a closer look, and this seems to depend on the C++ version. In C++11, that is still just an acquire+release fence, which does not prevent writes from moving "downwards" through the fence or reads from moving "upwards" through it. But in C++20, sequentially consistent operations actually can't be reordered at all. I will edit my answer accordingly. – Chronial Feb 07 '23 at 23:23
  • Even before C++20, `atomic_thread_fence(seq_cst)` did need to be stronger than `atomic_thread_fence(acq_rel)`. For example on x86, only the SC fence has to compile to `mfence` or equivalent, but an `acq_rel` fence can compile to zero asm instructions, just compile-time ordering. https://godbolt.org/z/edYdGPozq shows current and really old GCC for x86 with `-std=gnu++11`. You can also look at other ISAs, but x86 is one where acq_rel fences don't need any asm. I haven't looked at the formal standard wording recently, so I'm not sure what actual difference there is. – Peter Cordes Feb 07 '23 at 23:48
  • I found https://stackoverflow.com/q/69773579 also helpful in reconciling the definitions: if one thread does load y, fence A, load x, and another thread does store x, fence B, store y (where the loads/stores are relaxed), and load x doesn't see the result of store x, then load y can't see the result of store y, since the first implies A before B and the second implies B before A. – Dan Feb 08 '23 at 13:31