Why is gcc using two stores (`MOV %reg, (mem)`) instead of just one?

Question

Compiling the following with gcc (version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0):

if(n >= m)
{
     n = 0;
}

The generated code looks like this:

23bd5:       48 3b 93 10 03 00 00    cmp    0x310(%rbx),%rdx
23bdc:       48 89 93 20 03 00 00    mov    %rdx,0x320(%rbx)
23be3:       72 0d                   jb     23bf2
23be5:       48 c7 83 20 03 00 00    movq   $0x0,0x320(%rbx)
23bec:       00 00 00 00 
23bf0:       31 d2                   xor    %edx,%edx
23bf2:       ....

I'm wondering why the compiler decides to use what looks like a rather expensive set of instructions. I would have used the following instead:

    cmp 0x310(%rbx), %rdx
    jb  .L1
    xor %edx, %edx
.L1:
    mov %rdx, 0x320(%rbx)

Could it be because the XOR takes so long that the store would stall waiting for it? Or is it that the first store never really happens if the processor detects the second store to the same address? Or is it that the second store is nearly free because it will (presumably) hit the L1 cache?

(That said, the single store could also be scheduled later, since the instructions that follow it could safely be moved ahead of it.)
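
As a starting point for a self-contained test case, here is a minimal sketch in C of the two code shapes being compared. Everything in it is an assumption made for illustration: the `struct state` layout (padded only so that the fields land at offsets 0x310 and 0x320, as in the disassembly), the field types, and the function names `wrap_two_stores` and `wrap_one_store` are hypothetical and not taken from the real code base. It only shows that both shapes leave memory and the register value in the same final state in single-threaded code; it is not claimed to reproduce the exact instructions above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the padding exists only so that m and n sit at the
   offsets seen in the disassembly (0x310 and 0x320 relative to %rbx). */
struct state {
    char     pad0[0x310];
    uint64_t m;            /* limit, read by the cmp at 0x310(%rbx)   */
    char     pad1[8];
    uint64_t n;            /* counter, written at 0x320(%rbx)         */
};

static_assert(offsetof(struct state, m) == 0x310, "m must be at 0x310");
static_assert(offsetof(struct state, n) == 0x320, "n must be at 0x320");

/* Shape of the gcc output: store n unconditionally, then, if n >= m,
   overwrite the stored value with 0 and zero the local copy as well. */
uint64_t wrap_two_stores(struct state *s, uint64_t n)
{
    s->n = n;
    if (n >= s->m) {
        s->n = 0;
        n = 0;
    }
    return n;
}

/* Shape of the hand-written alternative: conditionally zero the local
   copy first, then perform a single store. */
uint64_t wrap_one_store(struct state *s, uint64_t n)
{
    if (n >= s->m)
        n = 0;
    s->n = n;
    return n;
}
```

Compiling this with `gcc -O2 -S` and comparing the assembly for the two functions may show whether gcc picks the two-store form on its own here. The optimizer is free to turn either function into the other shape, so the result depends on the gcc version and options used.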

Can you give a complete test case, and the options you are passing to gcc? I don't see such code in a [simple example](https://godbolt.org/z/ajonYj). — Nate Eldredge, Nov 26 '20 at 05:34
seems `mov %rdx,0x320(%rbx)` is result of instruction reordering from other part of the function. — fukanchik, Nov 26 '20 at 05:35
xor-zeroing takes [literally zero cycles of latency](https://stackoverflow.com/questions/33666617/what-is-the-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and) from issue/rename to the result being ready on Sandybridge-family, so no that's not the reason. Or like 1 cycle on CPUs that don't eliminate it at register-rename. But it's still independent work that can be executed out-of-order (if branch prediction is correct). Your version is likely better, and would clearly be a missed optimization with `-march=skylake` or something. — Peter Cordes, Nov 26 '20 at 06:43
@NateEldredge I'll try to get that. There may be a register exhaustion happening as the preceding code is "heavy" already. — Alexis Wilke, Nov 26 '20 at 14:37
