GCC 4.4.3 generated the following x86_64 assembly. The part that confuses me is the mov %eax,%eax. Why move a register to itself?

   23b6c:       31 c9                   xor    %ecx,%ecx        ; the 0 value for shift
   23b6e:       80 7f 60 00             cmpb   $0x0,0x60(%rdi)  ; is it shifted?
   23b72:       74 03                   je     23b77
   23b74:       8b 4f 64                mov    0x64(%rdi),%ecx  ; is shifted so load shift value to ecx
   23b77:       48 8b 57 38             mov    0x38(%rdi),%rdx  ; map base
   23b7b:       48 03 57 58             add    0x58(%rdi),%rdx  ; plus offset to value
   23b7f:       8b 02                   mov    (%rdx),%eax      ; load map_used value to eax
   23b81:       89 c0                   mov    %eax,%eax        ; then what the heck is this? promotion from uint32 to 64-bit size_t?
   23b83:       48 d3 e0                shl    %cl,%rax         ; shift rax/eax by cl/ecx
   23b86:       c3                      retq   

The C++ code for this function is:

    uint32_t shift = used_is_shifted ? shift_ : 0;
    le_uint32_t le_map_used = *used_p();
    size_t map_used = le_map_used;
    return map_used << shift;

An le_uint32_t is a class which wraps byte-swap operations on big-endian machines. On x86 it does nothing. The used_p() function computes a pointer from the map base + offset and returns a pointer of the correct type.
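
The real le_uint32_t is not shown here, so the following is only a minimal sketch of what such a wrapper and the surrounding function might look like. The names le_uint32_sketch and shifted_used, and the parameters is_shifted and shift_value, are hypothetical stand-ins. The sketch also assembles the value byte by byte instead of byte-swapping, which is a different technique from the one described above but yields the same result on any host.

```cpp
#include <cstdint>

// Hypothetical stand-in for le_uint32_t: stores four bytes in
// little-endian order regardless of the host's byte order.
class le_uint32_sketch {
public:
    explicit le_uint32_sketch(std::uint32_t host_value) {
        for (int i = 0; i < 4; ++i)
            bytes_[i] = static_cast<unsigned char>(host_value >> (8 * i));
    }

    // Reassemble the host-order value from the little-endian bytes.
    operator std::uint32_t() const {
        return  static_cast<std::uint32_t>(bytes_[0])
             | (static_cast<std::uint32_t>(bytes_[1]) << 8)
             | (static_cast<std::uint32_t>(bytes_[2]) << 16)
             | (static_cast<std::uint32_t>(bytes_[3]) << 24);
    }

private:
    unsigned char bytes_[4];
};

// Hypothetical free-function version of the code in the question.
// Assumes shift_value < 64; larger shifts are undefined behaviour in C++.
std::uint64_t shifted_used(const le_uint32_sketch& stored,
                           bool is_shifted,
                           std::uint32_t shift_value) {
    std::uint32_t shift = is_shifted ? shift_value : 0;
    std::uint64_t map_used = stored;  // 32-bit value widened to 64 bits
    return map_used << shift;         // shift performed at 64-bit width
}
```

The widening on the map_used line is the step that corresponds to the promotion guessed at in the assembly comment.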

Zan Lynx
  • 2
    See http://stackoverflow.com/questions/2703394/whats-the-point-of-lea-eax-eax – nos Aug 10 '12 at 23:40
  • @nos: Possibly. But for what reason would GCC want a nop there? There's nothing to align. – Zan Lynx Aug 10 '12 at 23:43
  • Even if there was something to align (a jump somewhere we don't see needed to land on the next instruction), it isn't -- the address of the SHL instruction is only aligned to a byte. This just looks like an optimizer bug. Try different flags and more recent versions of gcc (4.4.3 is getting quite stale) and see what effect it has. – Andy Ross Aug 11 '12 at 04:47
  • 2
    Ah GCC and its infamous random code generator.. that `mov` is absolutely pointless. Even if it was meant for alignment (doesn't look like it) it would be a completely braindead way to accomplish that - it's not a real `nop` so it actually takes time. – harold Aug 11 '12 at 08:03
  • 2
    AFAIR this was a bug in gcc-4.3, it was fixed in 4.6 or so. – Gunther Piez Aug 11 '12 at 13:08
  • 1
    I don't have GCC 4.3.x or 4.4.x, but neither 4.5.4 nor 4.6.3 emit the spurious `mov` for me. – ephemient Aug 11 '12 at 21:14
  • @hirschhornsalz Reference? – Jonathon Reinhart Nov 12 '13 at 05:07
  • 2
    http://stackoverflow.com/questions/11177137/why-do-most-x64-instructions-zero-the-upper-part-of-a-32-bit-register http://stackoverflow.com/questions/6654098/is-mov-esi-esi-a-no-op-or-not-on-x86-64/6654495#6654495 – phuclv Jan 03 '14 at 14:33

1 Answer

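In x86-64, an instruction that writes a 32-bit register implicitly zero-extends the result into the full 64-bit register: bits 32-63 are cleared (to avoid false dependencies). That is sometimes why you'll see odd-looking instructions such as a 32-bit mov of a register to itself. (Is mov %esi, %esi a no-op or not on x86-64?)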

However, in this case the previous mov-load is also 32-bit, so the high half of %rax is already cleared. The mov %eax, %eax is therefore redundant, apparently just a GCC missed optimization.
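
To see why the compiler needs a zero-extension somewhere before the shift, here is a small C++ sketch comparing the two orders of operation. The function names are made up for illustration, and the shift count is assumed to be below 32 so that both versions are well defined.

```cpp
#include <cstdint>

// Widen first, then shift: this is what the question's code asks for.
// The widening is the zero-extension that a 32-bit load already
// provides for free on x86-64.
std::uint64_t widen_then_shift(std::uint32_t value, std::uint32_t shift) {
    std::uint64_t wide = value;  // upper 32 bits are zero
    return wide << shift;        // 64-bit shift keeps bits above bit 31
}

// Shift at 32-bit width, then widen: bits pushed past bit 31 are lost.
// Assumes shift < 32; larger shifts are undefined behaviour here.
std::uint64_t shift_then_widen(std::uint32_t value, std::uint32_t shift) {
    std::uint32_t narrow = value << shift;  // truncated to 32 bits
    return narrow;
}
```

With value 0x80000000 and shift 1, the first function returns 0x100000000 while the second returns 0, so the upper half of the 64-bit register must be known to be zero before the shl. The 32-bit load guarantees that, which is why the extra mov adds nothing.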

Peter Cordes
ephemient
  • this kind of NOP can also serve the purpose of pipeline optimization. – mathk Jul 21 '17 at 13:06
  • iirc MSVC used to add `mov eax,eax` to the start of all functions to allow in-ram hotpatching/`jmp short`, and another 5 wasted bytes right before every function, to allow that `jmp short` to jump to another `jmp long` - that's obviously not what happened here though. – hanshenrik Feb 11 '23 at 23:39