
I have a 64-bit value which I need to read extremely quickly before an event, and then after the event perform a compare-and-exchange on it.

I was thinking I could use load(std::memory_order_relaxed) before the event for the fast read, and then use the regular compare-and-exchange after the event.
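
Roughly this pattern (a minimal sketch; `flags`, `before_event` and `after_event` are just placeholder names):

#include <atomic>
#include <cstdint>

std::atomic<uint64_t> flags{0};

void before_event()
{
    // Fast read before the event -- no ordering needed here.
    uint64_t snapshot = flags.load(std::memory_order_relaxed);
    (void)snapshot;
}

void after_event(uint64_t expected, uint64_t desired)
{
    // Regular (sequentially consistent) compare-and-exchange after the event.
    flags.compare_exchange_strong(expected, desired);
}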

When I compared the assembly for a non-atomic 64-bit read, an atomic (relaxed) read and an atomic (acquire) read, I could not see any difference. This was the C++ test:

#include <atomic>
#include <cstdint>

int main(){
    volatile uint64_t var2;
    std::atomic<uint64_t> var;  // The variable I wish to read quickly
    var = 10;

    var2 = var.load(std::memory_order_relaxed);
    //var2 = var;                                  // when var is not atomic
    //var2 = var.load(std::memory_order_acquire);  // to see if the x86 assembly changes
}

Giving this assembly:

!int main(){
main()+0: sub    $0x48,%rsp
main()+4: callq  0x100401180 <__main>
!    volatile uint64_t var2;
!    volatile std::atomic<uint64_t> var;
!    var = 10;
!    
!    
!    var2 = var.load(std::memory_order_acquire);
main()()
main()+26: mov    %rax,0x38(%rsp)
!    
!    int x;
!    std::cin >> x;
main()+31: lea    0x2c(%rsp),%rdx
main()+36: mov    0x1f45(%rip),%rcx        # 0x100403050 <__fu0__ZSt3cin>
main()+43: callq  0x100401160 <_ZNSirsERi>
!}
main()+48: mov    $0x0,%eax
main()+53: add    $0x48,%rsp
main()+57: retq  

Surely the assembly for a load using std::memory_order_acquire should be different from a non-atomic variable read?

Is this because a 64-bit read is atomic anyway, so long as the data is aligned, and hence the assembly is no different? I would have thought a stronger memory ordering would have inserted a fence instruction or something?

The real question I want answered is: if I declare the 64 bits as atomic and read them using a relaxed memory barrier, will it have the same performance cost as reading a non-atomic 64-bit variable?

user997112
  • x86 processors use a fairly strong memory ordering. I believe normal loads and stores have acquire and release semantics, so no fence or other special instructions would be needed. Other CPUs might be quite different. In particular, some might have to use multiple instructions to read a 64-bit value (eg. 32-bit CPUs), and so would need additional synchronization instructions to read it atomically. – Ross Ridge Jul 03 '15 at 23:24
  • This question is very confused. There is no such thing as a "relaxed memory barrier". C++ defines *memory ordering on atomic operations* which is specific to one memory location, and separately (and almost uselessly) *memory fences*, which you never use nor should use. – Kerrek SB Jul 04 '15 at 11:23
  • It defeats some optimizations like auto-vectorization. Compilers probably handle it internally like `volatile` (for now). See [Why don't compilers merge redundant std::atomic writes?](https://stackoverflow.com/q/45960387) – Peter Cordes Jul 27 '22 at 00:50

1 Answer


if I declare the 64 bits as atomic and read them using a relaxed memory barrier, will it have the same performance cost as reading a non-atomic 64-bit variable?

Yes, they have the same cost in terms of memory barriers/fences.

On x86_64 every load has acquire semantics, so a load with acquire semantics is the same as one with relaxed semantics from the processor's point of view. But they differ from the compiler's point of view: relaxed operations can be reordered with other operations.
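
A minimal sketch of that compiler-level difference (the names here are just for illustration): both loads will typically compile to the same plain `mov` on x86_64, but with relaxed ordering the compiler is allowed to move the read of `data` before the atomic load, while acquire forbids that.

#include <atomic>
#include <cstdint>

std::atomic<uint64_t> flag{0};
uint64_t data = 0;

uint64_t read_relaxed()
{
    uint64_t f = flag.load(std::memory_order_relaxed);
    // The compiler may reorder the read of `data` to before the atomic load.
    return f + data;
}

uint64_t read_acquire()
{
    uint64_t f = flag.load(std::memory_order_acquire);
    // The read of `data` must stay after the atomic load.
    return f + data;
}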

Tsyvarev
  • Hi there, thanks for the answer! You mention stores, what about loads? (as my question relates to "read"ing 64-bits). Thanks :) – user997112 Jul 03 '15 at 23:34
  • @user997112, anything other than loading (storing it for use in an expression, changing it, even just incrementing it) -- **nothing else** is atomic! Be very careful with data you're intentionally leaving lock-free. – Blindy Jul 04 '15 at 00:35
  • 1
    This answer is not necessarily true for RISC 64b architectures. – Jeff Hammond Jul 04 '15 at 03:43