Effect/Fulfillment of std::memory_order_* on x86(-64)

Question

I have the following code:

#include <cstdint>
#include <atomic>

void myAtomicStore(std::atomic<int32_t>& i, const int32_t v) {
    i.store(v, std::memory_order_release);
}

int myAtomicLoad(std::atomic<int32_t>& i, const int32_t v) {
    return i.load(std::memory_order_acquire);
}

And (according to this) GCC 8.1 translated it (for x86_64) into:

myAtomicStore(std::atomic<int>&, int):
        mov     DWORD PTR [rdi], esi
        ret
myAtomicLoad(std::atomic<int>&, int):
        mov     eax, DWORD PTR [rdi]
        ret

I wonder how plain `mov` instructions can make all the writes to memory made prior to myAtomicStore() visible to another thread when it calls myAtomicLoad() on the same variable (or memory location), as the C++ standard guarantees.

I skimmed Intel's manual and didn't see anything obvious.

Thanks!

You don't have *any other code*. Vacuously it is not running out-of-order — Caleth, Jul 09 '18 at 08:34
@Caleth: that's not the reason (except vs. compile-time reordering). release stores apply to stores in the calling function, too. The reason is that x86's strong memory model makes every store a release store, and every load an acquire load, so the compiler doesn't need any extra instructions. See http://preshing.com/20120930/weak-vs-strong-memory-models/ and also http://preshing.com/20120515/memory-reordering-caught-in-the-act/. Looking for a duplicate Q&A; IIRC some quote Intel's memory ordering rules from the manual. — Peter Cordes, Jul 09 '18 at 08:42
Possible duplicate of [Does it make any sense to use the LFENCE instruction on x86/x86\_64 processors?](https://stackoverflow.com/questions/20316124/does-it-make-any-sense-to-use-the-lfence-instruction-on-x86-x86-64-processors) — Peter Cordes, Jul 09 '18 at 09:33
The top answer [on that duplicate](https://stackoverflow.com/questions/20316124/) quotes Intel's SDM section 8.2.2, with the memory-ordering rules that disallow all reordering other than StoreLoad. See also [Atomic operations, std::atomic<> and ordering of writes](https://stackoverflow.com/a/32394427), and [Race condition on x86](https://stackoverflow.com/a/6623662), and [Does an x86 CPU reorder instructions?](https://stackoverflow.com/a/50310563), and the memory-ordering links in the x86 tag wiki: https://stackoverflow.com/tags/x86/info — Peter Cordes, Jul 09 '18 at 09:37
@PeterCordes Thanks for the links. I think I got it -- x86 doesn't allow CPU to reorder writes (with some exceptions mentioned in your last [link](https://stackoverflow.com/questions/20316124/does-it-make-any-sense-to-use-the-lfence-instruction-on-x86-x86-64-processors)); hence, a series of simple `mov` has already "fulfilled" the guarantee mentioned in C++ standard. Thanks! — HCSF, Jul 09 '18 at 09:39
Yes, normal loads/stores have `acquire` and `release` semantics, but not `seq_cst`. For that you need `mfence`, or preferably `xchg [mem], eax` to do a sequentially-consistent store. (And of course atomic RMW operations are always seq_cst on x86, at least vs. runtime reordering; compile-time reordering is still possible for mo_relaxed. See [Can num++ be atomic for 'int num'?](https://stackoverflow.com/q/39393850): you need `lock add` on a multicore system.) — Peter Cordes, Jul 09 '18 at 09:41
@PeterCordes While you've already answered this question, note that the top answer on that possible duplicate is mostly incorrect. — Hadi Brais, Jul 09 '18 at 10:07
@HadiBrais: hmm, yeah [Does x86-SSE-instructions have an automatic release-acquire order?](https://stackoverflow.com/a/27302931) which quotes the same stuff without making false claims about `lfence + sfence` might be a better dup target. — Peter Cordes, Jul 09 '18 at 10:12
@HadiBrais: I edited the accepted answer on the first question I linked, because it's maybe a better dup target for this question. — Peter Cordes, Jul 09 '18 at 10:24
@PeterCordes The link you added is pretty much the most useful thing in the answer (specifically your answer there). Regarding the other [one](https://stackoverflow.com/questions/19093137/does-x86-sse-instructions-have-an-automatic-release-acquire-order/27302931#27302931), only a little better, but still imprecise/incomplete because it does not talk about atomicity. Is there any answer on SO that clearly and precisely explains when and why x86 instructions have acquire and/or release semantics?... — Hadi Brais, Jul 09 '18 at 10:30
...Even Preshing's [article](http://preshing.com/20120913/acquire-and-release-semantics/) seems to only vaguely say `*usually*, every load on x86/64 already implies acquire semantics and every store implies release semantics. This is why x86/64 is *often* said to be strongly ordered`. (emphasis mine). — Hadi Brais, Jul 09 '18 at 10:30
@PeterCordes I have spent a few hours on some of your links and still can't finish them all. All I can say: nice answers! Thanks for sharing. (Continue reading) — HCSF, Jul 09 '18 at 14:30
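
To make the point from the comments concrete, here is a minimal sketch of the release/acquire pattern the question is about. All names (`payload`, `ready`, `producer`, `consumer`, `run_message_passing`, `store_seq_cst`) are hypothetical and do not come from the question. The notes about generated instructions are what compilers typically emit for x86-64, not something the C++ standard promises.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical names for illustration only.
int32_t payload = 0;              // ordinary, non-atomic data
std::atomic<int32_t> ready{0};    // flag published with release, read with acquire

void producer() {
    payload = 42;                               // plain store, ordered before the flag
    ready.store(1, std::memory_order_release);  // typically a plain mov on x86-64
}

int32_t consumer() {
    // Spin until the acquire load observes the value written by the release store.
    while (ready.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    // The acquire load synchronized with the release store,
    // so the earlier write to payload is visible here.
    return payload;
}

// Runs one writer and one reader and returns what the reader saw.
int32_t run_message_passing() {
    payload = 0;
    ready.store(0, std::memory_order_relaxed);
    int32_t seen = -1;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return seen;
}

// For contrast: a seq_cst store must also prevent StoreLoad reordering,
// so on x86-64 compilers typically emit xchg (or mov followed by mfence)
// rather than a plain mov.
void store_seq_cst(std::atomic<int32_t>& i, int32_t v) {
    i.store(v, std::memory_order_seq_cst);
}
```

Calling `run_message_passing()` from `main` (built with thread support, e.g. `-pthread` on GCC) should return 42: once the consumer's acquire load sees `ready == 1`, the earlier write to `payload` is guaranteed to be visible. On x86-64 the hardware already forbids the StoreStore and LoadLoad reordering that would break this, so the compiler only has to avoid reordering at compile time.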
