Compiling the following with gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0:
if (n >= m)
{
    n = 0;
}
The disassembled output looks like this:
23bd5: 48 3b 93 10 03 00 00 cmp 0x310(%rbx),%rdx
23bdc: 48 89 93 20 03 00 00 mov %rdx,0x320(%rbx)
23be3: 72 0d jb 23bf2
23be5: 48 c7 83 20 03 00 00 movq $0x0,0x320(%rbx)
23bec: 00 00 00 00
23bf0: 31 d2 xor %edx,%edx
23bf2: ....
I'm wondering why the compiler chooses what looks like a rather expensive sequence: it stores %rdx to 0x320(%rbx) unconditionally and then, when n >= m, stores 0 to the same location a second time. I would have used the following instead:
    cmp  0x310(%rbx), %rdx
    jb   .L1
    xor  %edx, %edx
.L1:
    mov  %rdx, 0x320(%rbx)
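At the C level, the single-store version corresponds roughly to the sketch below. It is only an illustration under assumptions: struct ring, its field names, and the functions wrap_local, wrap_select and check_wrapped are hypothetical (the listing only shows m at 0x310(%rbx) and n at 0x320(%rbx)), size_t is assumed because jb is an unsigned comparison, and whether GCC 7.5 or any other version actually emits a single store or a cmov for either variant is not guaranteed, so the disassembly would still need checking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: names are assumptions and the real offsets
   (0x310 for m, 0x320 for n) are not reproduced here. */
struct ring {
    size_t m;   /* bound; jb is an unsigned compare, so assumed unsigned */
    size_t n;   /* value that wraps back to 0 */
};

/* Variant 1: decide the final value in a local, then store it once.
   This is the source-level shape of the cmp / jb / xor / mov sequence. */
void wrap_local(struct ring *r, size_t n)
{
    if (n >= r->m)
        n = 0;
    r->n = n;   /* the only store to r->n in the source */
}

/* Variant 2: the same logic written as a select; compilers may (but are
   not required to) lower this to cmp + cmov with no branch at all. */
void wrap_select(struct ring *r, size_t n)
{
    r->n = (n >= r->m) ? 0 : n;
}

/* Both variants leave r->n < r->m whenever r->m > 0. */
void check_wrapped(const struct ring *r)
{
    assert(r->m == 0 || r->n < r->m);
}
```

For example, with m = 10, calling wrap_local with 3 leaves n at 3, and calling it with 10 resets n to 0.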
Could it be that the XOR takes long enough that the store would stall waiting for it? Or does the first store not really happen if the processor detects the second store to the same address? Or is the second store nearly free because the cache line is (presumably) already in L1?
(That said, in my version the store does not have to happen right away either, since later instructions could safely be scheduled ahead of it.)