
I've been reading this post by Jeff Preshing about The Synchronizes-With Relation, and also the "Release-Acquire Ordering" section of the std::memory_order page on cppreference, and there's something I don't really understand:

It seems that the standard makes some kind of promise here, and I don't understand why that promise is necessary. Let's take the example from cppreference:

#include <thread>
#include <atomic>
#include <cassert>
#include <string>
 
std::atomic<std::string*> ptr;
int data;
 
void producer()
{
    std::string* p  = new std::string("Hello");
    data = 42;
    ptr.store(p, std::memory_order_release);
}
 
void consumer()
{
    std::string* p2;
    while (!(p2 = ptr.load(std::memory_order_acquire)))
        ;
    assert(*p2 == "Hello"); // never fires
    assert(data == 42); // never fires
}
 
int main()
{
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join(); t2.join();
}

The reference explains that:

If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B. That is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory. This promise only holds if B actually returns the value that A stored, or a value from later in the release sequence.

As far as I understand, when we write

ptr.store(p, std::memory_order_release)

What we're actually doing is telling both the compiler and the CPU that, at run time, there must be no way for the stores to data and to the memory pointed to by std::string* p to become visible to thread t2 AFTER the new value of ptr becomes visible to it.

And similarly, when we write

ptr.load(std::memory_order_acquire)

We are telling the compiler and the CPU: make it so that the load of ptr happens no later than the loads of *p2 and data.

So I don't understand: what further promise do we have here?

YoavKlein
  • You've described an approximate understanding of release/acquire, but I have no idea what other "some kind of promise" or "further promise" you're alluding to. The article you linked just says that _Synchronizes-With_ is an ordering relationship that can be accomplished with release/acquire, or with a mutex, or with thread creation. It doesn't say it's a stronger guarantee, just a more general concept. – Useless Mar 29 '22 at 14:40
  • Yes, I don't understand the question either. But in case it helps - do you understand why, if you changed the loads and stores to `memory_order_relaxed`, the program would then have a data race and undefined behavior? – Nate Eldredge Mar 29 '22 at 14:50
  • Yes I understand this, it just seems from the referenced sources that there is some special promise. Take a look at the Preshing article, don't you get this impression? – YoavKlein Mar 29 '22 at 14:52
  • @YoavKlein: No, I don't know what you mean. There is the promise in the Standard that, if the load observes the new value, then the store *happened before* the load in the formal happens-before partial order. The only thing "special" about this promise is that it doesn't apply with relaxed ordering or non-atomic variables. I don't see anything that suggests a "special promise" other than that. – Nate Eldredge Mar 29 '22 at 14:54
  • Part of the issue might be that you are thinking of the behavior in terms of "after" and "later", as if every operation occurs at a definite instant in time. That implicitly assumes that "time" gives a total ordering on all operations, even though not necessarily consistent with program order (i.e. "reordering" is possible). But the C++ memory model deliberately avoids assuming even that much; loads and stores do not have to be totally orderable at all, and only weaker partial orderings like "happens before" are relevant to defining a program's observable behavior. – Nate Eldredge Mar 29 '22 at 17:41
  • So if you are thinking that the memory model description seems overcomplicated compared with your understanding, that may be part of the reason. – Nate Eldredge Mar 29 '22 at 17:42

1 Answer


This ptr.store(p, std::memory_order_release) (L1) guarantees that anything done prior to this line in this particular thread (T1) will be visible to other threads, as long as those other threads read ptr in a correct fashion (in this case, using std::memory_order_acquire). This guarantee only works as a pair; on its own, this line guarantees nothing.

Now you have ptr.load(std::memory_order_acquire) (L2) in the other thread (T2) which, working with its counterpart in T1, guarantees that as long as it reads the value written in T1, you can see the other values written prior to that line (in your case, data). So because L1 synchronizes with L2, data = 42; happens before assert(data == 42).

Also there is a guarantee that ptr is written and read atomically, because, well, it is atomic. No other guarantees or promises are in that code.
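
For contrast, here is a minimal sketch (the same shape as the code in the question, just with the orderings weakened to memory_order_relaxed; it is only an illustration, not code from the question): the accesses to ptr remain atomic, but nothing synchronizes with anything, so the non-atomic accesses to data and to the string are a data race and either assert is allowed to fire.

#include <atomic>
#include <cassert>
#include <string>
#include <thread>

std::atomic<std::string*> ptr{nullptr};
int data;

void producer()
{
    std::string* p = new std::string("Hello");
    data = 42;
    ptr.store(p, std::memory_order_relaxed);   // atomic, but orders nothing else
}

void consumer()
{
    std::string* p2;
    while (!(p2 = ptr.load(std::memory_order_relaxed)))
        ;
    // No synchronizes-with edge, hence no happens-before for `data` or `*p2`:
    // these reads race with the producer's writes, so the behavior is undefined
    // and either assert may fire.
    assert(*p2 == "Hello");
    assert(data == 42);
}

int main()
{
    std::thread t1(producer), t2(consumer);
    t1.join(); t2.join();
}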

ixSci
  • so basically those sources really contribute nothing - we already know that acquire barrier (i.e. `LoadLoad+LoadStore`) guarantees that all Loads before the barrier will happen BEFORE all operations (Loads + Stores) after the barrier. And we already know that release barrier (`LoadStore+StoreStore`) guarantees that all stores after the barrier will happen AFTER any operation before the barrier. – YoavKlein Mar 29 '22 at 15:08
  • And so, if you place a release fence in one thread after which an atomic guard variable is written to, and an acquire fence in another thread before which that same variable is read, it will happen that all operations done by the writing thread will be visible to the reading thread.. is that it? – YoavKlein Mar 29 '22 at 15:10
  • @YoavKlein, sorry, I don't understand what "sources" you are talking about. You also should not think about C++ code in terms of hardware barriers; it doesn't help. The Standard has a pretty clear synchronization model which might be hard to grasp at first, but once mastered it is really pretty easy to reason about. Mixing that model with how hardware works won't help until you have a good understanding of both. – ixSci Mar 29 '22 at 15:17
  • And to your second comment: Yes. You put the release fence first and then store the flag, and you put the acquire fence after you read the flag (see the fence sketch after these comments). @YoavKlein – ixSci Mar 29 '22 at 15:21
  • *will be visible to other threads as long as* - you left out the part where this is only guaranteed if the reading thread actually does load the value stored by that execution of the release-store. So in this case, `mo_acquire` is essential as you say, but also the spin-loop to wait until we see a non-NULL pointer. And that only works because the producer only writes it once. If the producer was also looping, it might have started another write of `data` while the reader was loading `ptr`. It seems a subtle point here, but it's key to generalizing to non-trivial programs. – Peter Cordes Mar 29 '22 at 21:19
  • @YoavKlein: The code in your question doesn't have any fences (2-way barriers), only release and acquire *operations* which are only ordered in 1 direction. https://preshing.com/20120913/acquire-and-release-semantics/ vs. https://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect/. Of course, ISO C++ doesn't define its memory model in terms of allowed reorderings at all, only in terms of a load seeing a value from a store and creating a "happens before" relationship between threads. – Peter Cordes Mar 29 '22 at 21:25
  • A model involving reordering also involves a shared state that exists whether any threads are looking or not, and ISO C++ doesn't guarantee / require the existence of that. (But all real-world CPUs do in practice have that.) – Peter Cordes Mar 29 '22 at 21:25
  • @PeterCordes, what does it mean that "This promise only holds if B actually returns the value that A stored, or a value from later in the release sequence."? – YoavKlein Mar 30 '22 at 05:00
  • @YoavKlein: If your load sees some earlier or later value that didn't come from that store by that thread, you don't synchronize-with that thread. (Unless it's part of a release-sequence, e.g. if a 3rd thread did an atomic RMW on it instead of just a pure store.) – Peter Cordes Mar 30 '22 at 05:04
  • @PeterCordes - you mean that if thread A did `ptr.store(p, std::memory_order_release);` then "This promise only holds if" thread B did `ptr.load(std::memory_order_acquire))` i.e. did the acquire load on the same variable? – YoavKlein Mar 30 '22 at 05:50
  • @YoavKlein: Yes, of course the same variable, but also that it actually loaded `p` from `ptr`, not the initial NULL, or some later value stored by some other thread. That's why the reader thread has to keep re-reading until it sees a non-NULL value. Just loading once might or might not have run after the store. (But if it did, then it would sync-with it.) – Peter Cordes Mar 30 '22 at 05:56
  • @PeterCordes _"you left out the part where this is only guaranteed <...>"_ no I didn't? I specifically mention the value written and given the code in the question has a loop I didn't feel a need to put more text there. The loop per se is not important, that's just a way to guarantee you won't go further if no proper value got loaded. If you can do it with some other means, that would be just fine. – ixSci Mar 30 '22 at 06:01
  • @PeterCordes, ahh, you mean that if Thread A stored `0xabc` then the promise of synchronize-with only holds when Thread B reads `0xabc`? So that if Thread B read `0xdef` written by some other thread, it won't hold...? – YoavKlein Mar 30 '22 at 06:01
  • @YoavKlein it will hold if `0xdef` was written later (in modification order) than `0xabc` and the original (non-flag) value wasn't changed anywhere else. – ixSci Mar 30 '22 at 06:03
  • @ixSci: Oh yes, you get to that in the 2nd paragraph. I think when I read it before, I was thinking that paragraph 1 was supposed to be a full definition, but clearly that wasn't the intent. On re-reading it today, your answer is fine, those paragraphs capture the essential parts, I think. (Omitting the idea of a "release sequence" is probably a *good* thing, since it's a bunch of extra complexity that's only understandable once you understand basic acq/rel sync in the first place.) – Peter Cordes Mar 30 '22 at 06:59
  • re: your last comment to @YoavKlein about some other thread having written `0xdef` later in the modification order of `ptr` than `0xabc` - actually no, in ISO C++ that *won't* sync with and create a happens-before relationship between the reader and thread A that wrote `0xabc`. There's no release-sequence connecting A and B anymore, unless the 3rd thread used an RMW like `ptr.exchange(0xdef)`, not `ptr.store(0xdef)` (see the release-sequence sketch after these comments). – Peter Cordes Mar 30 '22 at 07:07
  • On most real ISAs with their HW memory models, it would I think be guaranteed to work anyway, but machines that can do [IRIW reordering (like POWER)](https://stackoverflow.com/a/50679223/224132) may also be able to break this with store-forwarding between SMT threads. Perhaps by having the third thread see 0xabc early so its store can be after it in the modification order, before the non-atomic payload and the `ptr` value have become globally visible by committing to cache? Hmm, maybe not, maybe it's not plausible on a real machine with coherent shared cache, not point-to-point data movement. – Peter Cordes Mar 30 '22 at 07:11
  • @PeterCordes you are right, I misremembered what constitutes the release sequence. – ixSci Mar 30 '22 at 07:53
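
To make the fences-vs-operations point from the comments above concrete, here is a minimal sketch (same producer/consumer shape as the question; an illustration, not code from the thread) using standalone fences: the release fence goes before the store of the flag, the acquire fence goes after the load of the flag, and the flag accesses themselves can then be relaxed.

#include <atomic>
#include <cassert>
#include <string>
#include <thread>

std::atomic<std::string*> ptr{nullptr};
int data;

void producer()
{
    std::string* p = new std::string("Hello");
    data = 42;
    std::atomic_thread_fence(std::memory_order_release); // fence BEFORE the flag store
    ptr.store(p, std::memory_order_relaxed);
}

void consumer()
{
    std::string* p2;
    while (!(p2 = ptr.load(std::memory_order_relaxed)))
        ;
    std::atomic_thread_fence(std::memory_order_acquire); // fence AFTER the flag load
    assert(*p2 == "Hello"); // never fires
    assert(data == 42);     // never fires
}

int main()
{
    std::thread t1(producer), t2(consumer);
    t1.join(); t2.join();
}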
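
And a minimal sketch of the release-sequence point (the flag values and the use of a compare-exchange are chosen purely for illustration): a third thread modifies the flag with an atomic read-modify-write, which continues the release sequence headed by the writer's release store, so the reader synchronizes with the writer even when it loads the third thread's value; a plain store from the third thread would break that chain.

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> flag{0};
int data;

void writer()       // "thread A": its release store heads the release sequence
{
    data = 42;
    flag.store(1, std::memory_order_release);
}

void rmw_thread()   // third thread: an atomic RMW continues the release sequence
{
    int expected = 1;
    // A plain flag.store(2, std::memory_order_relaxed) here would break the
    // release sequence; the read-modify-write below does not.
    while (!flag.compare_exchange_weak(expected, 2, std::memory_order_relaxed))
        expected = 1;
}

void reader()       // "thread B"
{
    // Whether we load 1 (from the writer) or 2 (from the RMW), the value comes
    // from the release sequence headed by the writer's store, so this acquire
    // load synchronizes with that store.
    while (flag.load(std::memory_order_acquire) == 0)
        ;
    assert(data == 42); // never fires
}

int main()
{
    std::thread a(writer), c(rmw_thread), b(reader);
    a.join(); c.join(); b.join();
}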