some_atomic.load(std::memory_order_acquire) does just drop through to a simple load instruction, and some_atomic.store(std::memory_order_release) drops through to a simple store instruction.
It is known that on x86 for the operations load()
and store()
memory barriers memory_order_consume, memory_order_acquire, memory_order_release, memory_order_acq_rel
does not require a processor instructions.
But on ARMv8 we known that here are memory barriers both for load()
and store()
:
http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2
http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-2-of-2
About different architectures of CPUs: http://g.oswego.edu/dl/jmm/cookbook.html
Next, but for the CAS-operation on x86, these two lines with different memory barriers are identical in Disassembly code (MSVS2012 x86_64):
a.compare_exchange_weak(temp, 4, std::memory_order_seq_cst, std::memory_order_seq_cst);
000000013FE71A2D mov ebx,dword ptr [temp]
000000013FE71A31 mov eax,ebx
000000013FE71A33 mov ecx,4
000000013FE71A38 lock cmpxchg dword ptr [temp],ecx
a.compare_exchange_weak(temp, 5, std::memory_order_relaxed, std::memory_order_relaxed);
000000013FE71A4D mov ecx,5
000000013FE71A52 mov eax,ebx
000000013FE71A54 lock cmpxchg dword ptr [temp],ecx
Disassembly code compiled by GCC 4.8.1 x86_64 - GDB:
a.compare_exchange_weak(temp, 4, std::memory_order_seq_cst, std::memory_order_seq_cst);
a.compare_exchange_weak(temp, 5, std::memory_order_relaxed, std::memory_order_relaxed);
0x4613b7 <+0x0027> mov 0x2c(%rsp),%eax
0x4613bb <+0x002b> mov $0x4,%edx
0x4613c0 <+0x0030> lock cmpxchg %edx,0x20(%rsp)
0x4613c6 <+0x0036> mov %eax,0x2c(%rsp)
0x4613ca <+0x003a> lock cmpxchg %edx,0x20(%rsp)
Is on x86/x86_64 platforms for any atomic CAS-operations, an example such like this atomic_val.compare_exchange_weak(temp, 1, std::memory_order_relaxed, std::memory_order_relaxed);
always satisfied with the ordering std::memory_order_seq_cst
?
And if the any CAS operation on the x86 always run with sequential consistency (std::memory_order_seq_cst
) regardless of barriers, then on the ARMv8 it is the same?
QUESTION: Should the order of std::memory_order_relaxed
for CAS
block memory bus on x86 or ARM?
ANSWER: On x86 any compare_exchange_weak()
operations with any std::memory_orders
(even std::memory_order_relaxed
) always translates to the LOCK CMPXCHG
with lock bus, to be really atomic, and have equal expensive to XCHG
- "the cmpxchg
is just as expensive as the xchg
instruction".
(An addition: XCHG
equal to LOCK XCHG
, but CMPXCHG
doesn't equal to LOCK CMPXCHG
(which is really atomic)
On ARM and PowerPC for any`compare_exchange_weak() for different std::memory_orders there are differents lock's processor instructions, through LL/SC.
Processor memory-barriers-instructions for x86(except CAS), ARM and PowerPC: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html