I wrote this simple C++ code to see how atomic variables are implemented.
#include <atomic>
using namespace std;
atomic<float> f(0);
int main() {
    f += 1.0;
}
It generates this assembly for main at -O3:
main:
        mov     eax, DWORD PTR f[rip]
        movss   xmm1, DWORD PTR .LC1[rip]
        movd    xmm0, eax
        mov     DWORD PTR [rsp-4], eax       ; this line is redundant
        addss   xmm0, xmm1
.L2:
        mov     eax, DWORD PTR [rsp-4]       ; this line is redundant
        movd    edx, xmm0
        lock cmpxchg DWORD PTR f[rip], edx
        je      .L5
        mov     DWORD PTR [rsp-4], eax       ; this line is redundant
        movss   xmm0, DWORD PTR [rsp-4]      ; this line can become movd xmm0, eax
        addss   xmm0, xmm1
        jmp     .L2
.L5:
        xor     eax, eax
        ret
f:
        .zero   4
.LC1:
        .long   1065353216
It uses an atomic compare-and-exchange loop to achieve atomicity. But the old value is being stored on the stack at [rsp-4], even though nothing modifies eax between each store and the following reload, so the old value is already preserved in eax. Why is the compiler allocating additional space for the old value, even at -O3? Is there a specific reason to keep that variable on the stack rather than in a register?
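For reference, the loop in the listing corresponds roughly to a hand-written compare-and-exchange loop like the one below. This is only a sketch: the helper name add_with_cas is made up for illustration, and I am assuming f += 1.0 is lowered to something equivalent, which the assembly suggests but which I have not confirmed from the library source.

```cpp
#include <atomic>

// Hypothetical helper (not a standard library function): adds delta to
// target using a compare-exchange retry loop and returns the previous value.
float add_with_cas(std::atomic<float>& target, float delta) {
    // The last value observed in target; this plays the role of the
    // "old value" kept in eax / [rsp-4] in the listing.
    float expected = target.load();

    // compare_exchange_weak takes expected by reference. If target no longer
    // holds expected, the call fails and writes the current value of target
    // into expected, so the sum is recomputed from the fresh value.
    while (!target.compare_exchange_weak(expected, expected + delta)) {
        // Nothing to do here: just retry with the updated expected.
    }
    return expected;
}
```

Calling add_with_cas(f, 1.0f) should then have the same effect on f as f += 1.0 in the code above.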
EDIT: Logical deduction:
There are 4 lines that use rsp-4:
mov   DWORD PTR [rsp-4], eax     --- 1
mov   eax, DWORD PTR [rsp-4]     --- 2 <--.
mov   DWORD PTR [rsp-4], eax     --- 3    | loop
movss xmm0, DWORD PTR [rsp-4]    --- 4 ---'
Lines 3 and 4 have nothing in between them, so line 4 can be rewritten in terms of line 3 as movd xmm0, eax.
Now, going from line 3 back to line 2 around the loop, nothing modifies [rsp-4] (or eax). So lines 3 and 2 in sequence collapse to mov eax, eax, which is a no-op.
That leaves only line 1, whose destination is then never read again, so it is redundant as well.