Do the semantics of `std::memory_order_acquire` require processor instructions on x86/x86_64?

Question

It is known that on x86, for the load() and store() operations, the memory orderings memory_order_consume, memory_order_acquire, memory_order_release and memory_order_acq_rel do not require any processor instructions for the cache or pipeline. The generated assembly always corresponds to std::memory_order_relaxed, and these orderings only restrict compiler optimizations: http://www.stdthread.co.uk/forum/index.php?topic=72.0

This disassembly confirms it for store() (MSVS2012 x86_64):

    std::atomic<int> a;
    a.store(0, std::memory_order_relaxed);
000000013F931A0D  mov         dword ptr [a],0  
    a.store(1, std::memory_order_release);
000000013F931A15  mov         dword ptr [a],1

But this disassembly does not confirm it for load() (MSVS2012 x86_64), which uses lock cmpxchg:

    int val = a.load(std::memory_order_acquire);
000000013F931A1D  prefetchw   [a]  
000000013F931A22  mov         eax,dword ptr [a]  
000000013F931A26  mov         edx,eax  
000000013F931A28  lock cmpxchg dword ptr [a],edx  
000000013F931A2E  jne         main+36h (013F931A26h)  

    std::cout << val << "\n";

But Anthony Williams said:

some_atomic.load(std::memory_order_acquire) does just drop through to a simple load instruction, and some_atomic.store(std::memory_order_release) drops through to a simple store instruction.

Where am I wrong? Do the semantics of std::memory_order_acquire on x86/x86_64 require lock cmpxchg, or only a simple mov load instruction, as Anthony Williams said?

ANSWER: It is the same as this bug report: http://connect.microsoft.com/VisualStudio/feedback/details/770885

I'm not sure looking at what the compiler generates is necessarily a good way to determine the requirements of a particular functionality - it's not unheard of that compilers do "more than they need to". — Mats Petersson, Sep 02 '13 at 16:04
@Mats Petersson Yes, but there is nothing easier than doing nothing. And that is all that was required of the compiler: nothing but `mov`. Have the Microsoft developers really failed at the simplest task, "do nothing"? :) — Alex, Sep 02 '13 at 16:14
I know MS VC (at least SOME versions) will generate extra "locking" on variables declared as `volatile` - not because the C++ standard requires it, but because some bits of code that USED to work on single core processors suddenly work poorly if you use SMP systems. This looks similar to one of those situations. — Mats Petersson, Sep 02 '13 at 16:16
@Mats Petersson All right. But `volatile` appeared a long time ago, when nothing was known about `std::memory_order`. To avoid unnecessary calls to the WinAPI or assembler code, they decided to use barriers (`lock`) for volatile; all three solutions are equally ugly. But now, with the new C++11 standard, everything is clearly defined and there is one elegant solution: `mov`. Or do older x86 processors perhaps require a lock for `load()`? — Alex, Sep 02 '13 at 16:26
My point is that the compiler doesn't HAVE to generate the most efficient code for any particular construct - if that was a strict requirement, -O3 wouldn't generate better code than -O0 to put it very simply. And of course, it's entirely possible that this is an artifact/bug from `volatile` being used inside `std::atomic` (I believe `volatile` is required by the standard). — Mats Petersson, Sep 02 '13 at 16:29
Is it the same as this bug report? http://connect.microsoft.com/VisualStudio/feedback/details/770885 — jcoder, Sep 02 '13 at 16:42
@Mats Petersson But `std::atomic` and `volatile` are very different things according to the standard, they must be used in different cases, and `std::atomic` must not use `volatile` in its implementation. http://www.drdobbs.com/parallel/volatile-vs-volatile/212701484?pgno=1 — Alex, Sep 02 '13 at 16:44
I think jcoder is onto something. In which case it's a compiler bug... — Mats Petersson, Sep 02 '13 at 16:53
@jcoder Yes. Thanks for this! You can write it as an answer and I will accept it. The Microsoft saboteurs do not want to solve this problem: "it's been resolved as **"Deferred" because we may not have time to fix it in VC12**" :) — Alex, Sep 02 '13 at 16:54
@jcoder And, if you know, maybe this is an MSVS2012 bug too? http://stackoverflow.com/questions/18577584/are-on-x86-64-and-arm-platforms-for-any-atomic-cas-operations-always-using-the-o — Alex, Sep 02 '13 at 16:57

score 8 · Accepted Answer · edited Jan 06 '22 at 15:18

No. The semantics of std::memory_order_acquire do not require any extra processor instructions on x86/x86_64; a plain mov load is enough.
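To make concrete what acquire has to guarantee regardless of the instruction used, here is a minimal message-passing sketch. The names `payload`, `ready` and `run_message_passing` are hypothetical and not from the question. On x86 the hardware memory model already gives ordinary loads acquire behavior and ordinary stores release behavior, so the two orderings here mainly stop the compiler from reordering.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration; not from the original post.
int payload = 0;                   // plain, non-atomic data
std::atomic<bool> ready{false};    // flag that publishes the payload

// Returns the payload value observed by the consumer thread.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread consumer([&seen] {
        // Acquire load: once it reads true, the earlier write to
        // payload by the producer is guaranteed to be visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    std::thread producer([] {
        payload = 42;                                  // ordinary store
        ready.store(true, std::memory_order_release);  // publishes payload
    });

    producer.join();
    consumer.join();
    return seen;  // join() synchronizes, so reading seen here is safe
}
```

Calling `run_message_passing()` from `main` should return 42 on any conforming implementation, whether the acquire load is compiled to a single `mov` or to something heavier.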

On x86_64, load() and store() need no extra processor instructions (lock prefix or fence) for any memory order, with one exception: atomic.store(val, std::memory_order_seq_cst), which requires (LOCK) XCHG or, alternatively, MOV (into memory) followed by MFENCE.
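If you want to check this on your own compiler, a small sketch like the following can help. The function names are hypothetical. Putting each operation in its own function makes the generated code easy to find, for example in the output of `g++ -std=c++11 -O2 -S`. The comments give the instructions expected from the x86 mapping described above; what a particular compiler version really emits may differ, as the MSVS 2012 listings show.

```cpp
#include <atomic>

// Hypothetical helper names; one function per operation so that each
// one is easy to locate in the generated assembly.
std::atomic<int> a{0};

// Loads: a plain mov is expected for every memory order on x86_64.
int load_relaxed() { return a.load(std::memory_order_relaxed); }
int load_acquire() { return a.load(std::memory_order_acquire); }
int load_seq_cst() { return a.load(std::memory_order_seq_cst); }

// Stores: a plain mov is expected for relaxed and release.
void store_relaxed(int v) { a.store(v, std::memory_order_relaxed); }
void store_release(int v) { a.store(v, std::memory_order_release); }

// The one exception: xchg, or mov followed by mfence, is expected here.
void store_seq_cst(int v) { a.store(v, std::memory_order_seq_cst); }
```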

Processor instructions used for memory barriers on x86 (except CAS), and also on ARM and PowerPC: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

Disassembly GCC 4.8.1 x86_64 - GDB - load():

    temp = a.load(std::memory_order_relaxed);
    temp = a.load(std::memory_order_acquire);
    temp = a.load(std::memory_order_seq_cst);
0x46140b  <+0x007b>         mov    0x38(%rsp),%ebx
0x46140f  <+0x007f>         mov    0x34(%rsp),%esi
0x461413  <+0x0083>         mov    0x30(%rsp),%edx

Disassembly GCC 4.8.1 x86_64 - GDB - store():

    a.store(temp, std::memory_order_relaxed);
    a.store(temp, std::memory_order_release);
    a.store(temp, std::memory_order_seq_cst);
0x4613dc  <+0x004c>         mov    %eax,0x20(%rsp)
0x4613e0  <+0x0050>         mov    0x38(%rsp),%eax
0x4613e4  <+0x0054>         mov    %eax,0x20(%rsp)
0x4613e8  <+0x0058>         mov    0x38(%rsp),%eax
0x4613ec  <+0x005c>         mov    %eax,0x20(%rsp)
0x4613f0  <+0x0060>         mfence
0x4613f3  <+0x0063>         mov    %ebx,0x20(%rsp)

Disassembly MSVS 2012 x86_64 - load(). The lock cmpxchg loop is the same issue as this bug report: http://connect.microsoft.com/VisualStudio/feedback/details/770885

    temp = a.load(std::memory_order_relaxed);
000000013FE51A1F  prefetchw   [a]  
000000013FE51A24  mov         eax,dword ptr [a]  
000000013FE51A28  nop         dword ptr [rax+rax]  
000000013FE51A30  mov         ecx,eax  
000000013FE51A32  lock cmpxchg dword ptr [a],ecx  
000000013FE51A38  jne         main+40h (013FE51A30h)  
000000013FE51A3A  mov         dword ptr [temp],eax  
    temp = a.load(std::memory_order_acquire);
000000013FE51A3E  prefetchw   [a]  
000000013FE51A43  mov         eax,dword ptr [a]  
000000013FE51A47  nop         word ptr [rax+rax]  
000000013FE51A50  mov         ecx,eax  
000000013FE51A52  lock cmpxchg dword ptr [a],ecx  
000000013FE51A58  jne         main+60h (013FE51A50h)  
000000013FE51A5A  mov         dword ptr [temp],eax  
    temp = a.load(std::memory_order_seq_cst);
000000013FE51A5E  prefetchw   [a]  
    temp = a.load(std::memory_order_seq_cst);
000000013FE51A63  mov         eax,dword ptr [a]  
000000013FE51A67  nop         word ptr [rax+rax]  
000000013FE51A70  mov         ecx,eax  
000000013FE51A72  lock cmpxchg dword ptr [a],ecx  
000000013FE51A78  jne         main+80h (013FE51A70h)  
000000013FE51A7A  mov         dword ptr [temp],eax

Disassembly MSVS 2012 x86_64 - store():

    a.store(temp, std::memory_order_relaxed);
000000013F8C1A58  mov         eax,dword ptr [temp]  
000000013F8C1A5C  mov         dword ptr [a],eax  

    a.store(temp, std::memory_order_release);
000000013F8C1A60  mov         eax,dword ptr [temp]  
000000013F8C1A64  mov         dword ptr [a],eax  

    a.store(temp, std::memory_order_seq_cst);
000000013F8C1A68  mov         eax,dword ptr [temp]  
000000013F8C1A6C  xchg        eax,dword ptr [a]
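The `xchg` in the last store is the one place where x86 needs more than a `mov`. x86 may let a store be delayed in the store buffer past a later load from a different location, and sequential consistency forbids the outcome this allows. A minimal sketch of the classic store-buffering test follows; the names are hypothetical and the test only illustrates the guarantee, it proves nothing about a given compiler.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration; not from the original post.
std::atomic<int> x{0};
std::atomic<int> y{0};

// One round of the store-buffering test.
// Returns true if both threads read 0, which seq_cst forbids.
bool both_read_zero() {
    x.store(0, std::memory_order_seq_cst);
    y.store(0, std::memory_order_seq_cst);
    int r1 = -1;
    int r2 = -1;

    std::thread t1([&r1] {
        x.store(1, std::memory_order_seq_cst);   // xchg or mov + mfence on x86
        r1 = y.load(std::memory_order_seq_cst);  // plain mov on x86
    });
    std::thread t2([&r2] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Counts how often the forbidden outcome shows up over several rounds.
int count_forbidden_outcomes(int rounds) {
    int count = 0;
    for (int i = 0; i < rounds; ++i) {
        if (both_read_zero()) {
            ++count;
        }
    }
    return count;
}
```

With `memory_order_seq_cst` the count should always be 0. If the stores were weakened to `memory_order_release` and the loads to `memory_order_acquire`, the standard would allow both threads to read 0, and x86 hardware may actually produce that result.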
