I have 64 bits which I need to read extremely quickly before an event and then after the event perform a compare-and-exchange.
I was thinking I could use load(std::memory_order_relaxed) before the event for the fast read, and then use a regular compare-and-exchange after the event.
When I compared the assembly generated for a non-atomic 64-bit read, an atomic relaxed load, and an atomic acquire load, I could not see any difference. This was the C++ test:
#include <atomic>
#include <cstdint>

int main(){
    volatile uint64_t var2;
    std::atomic<uint64_t> var; // The variable I wish to read quickly
    var = 10;
    var2 = var.load(std::memory_order_relaxed);
    //var2 = var; // when var is not atomic
    //var2 = var.load(std::memory_order_acquire); // to see if the x86 assembly changed
}
Giving this assembly:
!int main(){
main()+0: sub $0x48,%rsp
main()+4: callq 0x100401180 <__main>
! volatile uint64_t var2;
! volatile std::atomic<uint64_t> var;
! var = 10;
!
!
! var2 = var.load(std::memory_order_acquire);
main()()
main()+26: mov %rax,0x38(%rsp)
!
! int x;
! std::cin >> x;
main()+31: lea 0x2c(%rsp),%rdx
main()+36: mov 0x1f45(%rip),%rcx # 0x100403050 <__fu0__ZSt3cin>
main()+43: callq 0x100401160 <_ZNSirsERi>
!}main()+48: mov $0x0,%eax
main()+53: add $0x48,%rsp
main()+57: retq
Surely the assembly for std::memory_order_acquire should differ from a plain non-atomic read?
Is this because reading an aligned 64-bit value is atomic on x86 anyway, so the assembly comes out the same? I would have thought requesting a stronger memory order would insert a fence instruction or something.
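One way to check this would be to put each kind of read in its own function and compare the output at -O2, so the compiler can't fold the loads together (a sketch; the function names are mine):

```cpp
#include <atomic>
#include <cstdint>

// Three reads of the same 64-bit location; compiling each separately
// makes the per-memory-order codegen visible. On x86-64 all three are
// expected to be a single MOV: loads (even acquire) need no fence on
// this architecture, and an aligned 64-bit load is atomic in hardware.
// Fences typically only appear for seq_cst *stores*.
uint64_t plain_read(const volatile uint64_t* p) {
    return *p;
}

uint64_t relaxed_read(const std::atomic<uint64_t>* p) {
    return p->load(std::memory_order_relaxed);
}

uint64_t acquire_read(const std::atomic<uint64_t>* p) {
    return p->load(std::memory_order_acquire);
}
```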
The real question is: if I declare the variable as std::atomic<uint64_t> and read it with a relaxed memory order, will it have the same performance cost as reading a non-atomic 64-bit variable?