Synchronization problem with std::atomic<>

Question

I have basically two questions that are closely related and they are both based on this SO question: Thread synchronization problem with c++ std::atomic variables

As cppreference.com explains:

For memory_order_acquire:

A load operation with this memory order performs the acquire operation on the affected memory location: no reads or writes in the current thread can be reordered before this load. All writes in other threads that release the same atomic variable are visible in the current thread

For memory_order_release:

A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable
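To make the two quoted guarantees concrete, here is a minimal sketch of the single-variable case they describe: one thread writes plain data and then does a release store to a flag, the other spins on an acquire load of that same flag. The names `payload`, `ready` and `run_message_passing` are made up for illustration and do not come from the linked question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns the value the consumer thread observed.
int run_message_passing() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // the one atomic both threads touch
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // (1) ordinary write
        ready.store(true, std::memory_order_release);  // (2) publish the flag
    });
    std::thread consumer([&] {
        // (3) spin until the acquire load reads the value stored by (2)
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // (4) must read 42, because (1) happens-before (4)
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling `run_message_passing()` should always return 42. Note that the guarantee only exists because both threads use the same atomic variable, and because the acquire load actually read the value written by the release store.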

Why do people say that memory_order_seq_cst MUST be used for that example to work properly? What's the purpose of memory_order_acquire if it doesn't work as the official documentation says? The documentation clearly says: All writes in other threads that release the same atomic variable are visible in the current thread.
Why should that example from the SO question never print "bad\n"? It just doesn't make any sense to me.

I did my homework by reading all the available documentation, SO questions/answers, googling, etc., but I'm still not able to understand some things.

Is that Dekker's algorithm or a variation? seq_cst prevents StoreLoad reordering between a seq_cst store and a later seq_cst load in the same thread, so you can actually observe global state after your store, not just store-forwarding. Most algorithms don't need that, Dekker's does, since it uses multiple stores instead of an atomic RMW to decide a winner between multiple threads. — Peter Cordes, Nov 14 '22 at 09:40
@fyou It is not just about releasing/acquiring data. It is also about "happens before" and "happens after" things. It is about instructions ordering. Compiler (so CPU) is free to reorder instructions. "release" and "acquire" on atomics do not affect the ordering, while "seq_cst" does. Another way to order instructions is to use barriers + relaxed atomic read/writes — Dmytro Ovdiienko, Nov 14 '22 at 10:27
BTW, *All writes in other threads that release the same atomic variable are visible* isn't accurate. ISO C++ doesn't guarantee that, although it's true on most CPUs. You only actually create a happens-before relationship with the one release operation that wrote the value you loaded, and anything that was part of a *release sequence* of atomic RMWs leading to that. Other pure stores earlier in the modification order are *not* part of the same release sequence. See [What does "release sequence" mean?](https://stackoverflow.com/q/38565650) (including my answer) — Peter Cordes, Nov 14 '22 at 10:54
I suspect that CPUs where you don't actually sync with earlier stores in the modification order might also be ones that do IRIW reordering, or maybe just that if anything's going to allow surprising memory ordering it'll be PowerPC: [Will two atomic writes to different locations in different threads always be seen in the same order by other threads?](https://stackoverflow.com/q/27807118) (Fun fact: seq_cst also guarantees all threads agree on a store order, which isn't the case for any weaker ordering.) — Peter Cordes, Nov 14 '22 at 10:57

score 0 · Accepted Answer · answered Nov 14 '22 at 13:31

Your linked question has two atomic variables, but your "cppreference" quote specifically mentions the "same atomic variable". That's why the reference text doesn't cover the linked question.

Quoting further from cppreference: memory_order_seq_cst : "...a single total order exists in which all threads observe all modifications in the same order".

So that does cover modifications to two atomic variables.
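As a hedged illustration of a two-variable case (this is the classic store-buffering pattern, not the code from the linked question), each thread below stores to one atomic and then loads the other. With `memory_order_seq_cst` on every operation, the single total order means at least one thread must see the other's store. The names `Observed` and `run_store_buffering` are made up for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical result type: what each thread read from the other's variable.
struct Observed {
    int r1;  // value thread A read from y
    int r2;  // value thread B read from x
};

Observed run_store_buffering() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    Observed out{-1, -1};

    std::thread a([&] {
        x.store(1, std::memory_order_seq_cst);
        out.r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);
        out.r2 = x.load(std::memory_order_seq_cst);
    });

    a.join();
    b.join();
    // With seq_cst, r1 == 0 && r2 == 0 is impossible: whichever store comes
    // first in the total order is visible to the other thread's later load.
    // With release stores and acquire loads, both reading 0 would be allowed.
    return out;
}
```

Here acquire/release would not be enough, because neither thread loads from the variable the other thread's release store wrote before doing its own store; there is no release/acquire pair on the same variable that orders the two stores relative to each other.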

Essentially, the design problem with memory_order_release is that it's a data equivalent of GOTO, which we have known to be a problem since Dijkstra. And memory_order_acquire is the equivalent of a COMEFROM, which is usually reserved for April Fools. I'm not yet convinced that they're good additions to C++.

MSalters

I strongly disagree that there's anything wrong with `release`/`acquire`. In many cases, especially simpler ones, there's nothing at all weird about them, and even if you do use `seq_cst`, the synchronization most code depends on is the acq/rel part that creates the happens-before. The semantics are pretty straightforward: https://preshing.com/20120913/acquire-and-release-semantics/. In fact most programs are fine with only those, including for spinlocks and mutexes. Given that performance is the goal, it would be insane not to include these in C++; I can see the argument in Java, though. – Peter Cordes Nov 14 '22 at 16:22
The "tricky" one is `memory_order_consume`; that's the one that created a mess with `std::kill_dependency` / `[[carries_dependency]]`. It turned out to be too hard to implement correctly, so compilers gave up and promoted it to acquire; ISO C++ has temporarily discouraged its use. `memory_order_relaxed` is even weaker, but you don't always need any synchronization, just atomicity, so it doesn't even try to let you build anything with just-barely-strong-enough synchronization. – Peter Cordes Nov 14 '22 at 16:25
Obviously you have to understand what they do before using them, to know when your case is simple and acq/rel gives you everything you need. `seq_cst` is safer if you're not sure. But using `std::mutex` is even safer than trying to do lock-free programming at all without understanding it, if we're talking about production use rather than attempts to learn lock-free / memory ordering stuff. – Peter Cordes Nov 14 '22 at 16:27
@PeterCordes: The argument against `GOTO` is not an argument against the `JMP` assembly instruction; `GOTO` is just a poor fit for structured programming. `release/acquire` have essentially the same issue IMO - a primitive concept that makes sense at a lower level, but not a good match for C++. – MSalters Nov 14 '22 at 23:55
IDK what you think the point of C++ is, if not to expose as much of what modern CPUs can do so programmers don't need to write in assembly to get good performance. If acq/rel wasn't supported, that wouldn't be the case. There are higher-level languages that are memory-safe and free from undefined behaviour if you want to walk a path not surrounded with land-mines and pitfalls. The existence of acq/rel doesn't mean you have to use it, if you don't care about performance enough to reason about it. And the existence of GOTO in the language doesn't make it a good idea for most programs. – Peter Cordes Nov 15 '22 at 01:58
I don't have much of a problem with folks (like Herb Sutter) who argue that most people should just stick to `seq_cst`, but I don't see any justification for saying the language shouldn't even support `acq`/`rel`. I don't find acq/rel hard to understand for simple use-cases, where it's the right tool for the job. Certainly it can be harder to verify that it's sufficient for your needs when things get complicated, and using `seq_cst` is a reasonable choice then, especially if it performs well enough on the machines you care about. – Peter Cordes Nov 15 '22 at 02:02
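For readers following the comments above, here is a minimal sketch of the spinlock case they mention, where acquire on lock and release on unlock is all the ordering needed. `SpinLock` and `count_with_spinlock` are hypothetical names for illustration; as the comments note, `std::mutex` is the safer choice in production code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Illustrative spinlock: acquire when taking the lock, release when dropping it.
class SpinLock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        // exchange is an atomic read-modify-write, so only one thread can
        // change the flag from false to true. Acquire keeps the critical
        // section from being reordered before the lock is taken.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // Release makes the critical section's writes visible to whichever
        // thread acquires the lock next.
        locked_.store(false, std::memory_order_release);
    }
};

// Hypothetical driver: increments a plain counter under the lock.
long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;  // non-atomic, protected only by the spinlock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

With 4 threads and 10000 iterations each, `count_with_spinlock(4, 10000)` should return 40000. Every unlock/lock pair acts on the same atomic, which is exactly the situation the acquire/release guarantees are written for.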
