
I am learning assembly, and I found the following assembly code for a binary search.

bsearch:
        xorl    %r8d, %r8d
.L4:
        movl    %esi, %eax
        subl    %r8d, %eax
        testl   %eax, %eax
        jle     .L9
.L6:
        sarl    %eax
        addl    $1, %eax
        movslq  %eax, %rcx
        movl    (%rdi,%rcx,4), %ecx
        cmpl    %edx, %ecx
        jle     .L3
        movl    %eax, %esi
        movl    %esi, %eax
        subl    %r8d, %eax
        testl   %eax, %eax
        jg      .L6
.L9:
        movl    $-1, %eax
.L5:
        rep ret
.L3:
        cmpl    %ecx, %edx
        jle     .L5
        movl    %eax, %r8d
        jmp     .L4

The above code was generated on Godbolt. Under the label .L6, there are two lines:

        movl    %eax, %esi
        movl    %esi, %eax

I am confused about what is going on here. What is the point of moving %eax to %esi and then moving it straight back?

jeffma
  • Your godbolt link doesn't link to an actual project, just the godbolt home page. Get the link from the Share pulldown. – David Wohlferd Jul 20 '22 at 20:33
  • The code looks bizarre in several aspects (e.g. the first `test` looks superfluous). Are you sure it's compiled with optimizations enabled? – Matteo Italia Jul 20 '22 at 20:41
  • There is no point, that's a missed optimization (if this asm is for real). There's no branch target in between those instructions, so the compiler can be sure that the value is still in EAX and doesn't need to move it back. And that the following `sub` will zero-extend it to 64-bit, although it already was zero-extended at that point. – Peter Cordes Jul 20 '22 at 22:18
  • This does look like GCC output though, including stuff like `rep ret` (so GCC from a few years ago or `-mtune=k10`), so I wonder if it was compiled with `-Og` or some other minimal optimization. It doesn't spill/reload locals to the stack, so not `-O0`, unless all the locals were declared `register int *p` or whatever. But I'd have guessed `-O0` would be even dumber. Anyway, without a GCC version and [mcve] source, this is just idle guessing. But with that, you could report a missed-optimization bug on https://gcc.gnu.org/bugzilla/enter_bug.cgi if it still happens with gcc nightly `-O2`. – Peter Cordes Jul 20 '22 at 22:19
  • I noticed this when I was trying different optimizations to see the actual assembly code. I was using O3 specifically here, and I have updated the link. – jeffma Jul 20 '22 at 22:55
  • https://godbolt.org/z/5TshKEr38 shows GCC12 -O3 making more compact asm. Interestingly, it does have `movl %eax, %esi` / `.L3: movl %esi, %eax`, where `.L3` is the loop entry point. Perhaps GCC4.8 was going to do something like that but changed its mind at the last minute, peeling part of the loop. But then failing to optimize after there was no branch target inside the loop. (And GCC12 should have hoisted the `movl %esi, %eax` out of the loop since it's only useful for the first iteration.) – Peter Cordes Jul 20 '22 at 23:15
  • @MatteoItalia: The `test` isn't quite redundant. After `sub`, `js` / `jns` could jump on the result being `>=0` or `<0`, but this is jumping on `sub_result <= 0` or `>0`. Perhaps it could be re-arranged, but `test/jg` can macro-fuse on most modern x86, while `sub/js` can't even on Sandybridge ([x86_64 - Assembly - loop conditions and out of order](https://stackoverflow.com/q/31771526)). IIRC, AMD can fuse any test or cmp/jcc pair, but not `sub`. Only operations that don't have an integer register output. – Peter Cordes Jul 20 '22 at 23:16
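
To make the two points from the comments concrete, here is a minimal C sketch. It is not the OP's source (that isn't shown in the question); it only models the registers as plain struct fields, and the names `regs`, `tail_as_emitted`, `tail_without_copy_back`, `js_would_branch` and `jle_after_test_would_branch` are made up for illustration. The first pair of functions models the three instructions `movl %eax, %esi` / `movl %esi, %eax` / `subl %r8d, %eax`, with and without the copy back. The second pair contrasts a sign-only check (what `js` looks at) with the `<= 0` check that `testl` followed by `jle` performs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: each register used in the loop is a plain field. */
struct regs {
    int32_t eax;  /* midpoint index, then scratch for the difference */
    int32_t esi;  /* upper bound */
    int32_t r8d;  /* lower bound */
};

/* The sequence exactly as it appears in the question. */
struct regs tail_as_emitted(struct regs r)
{
    r.esi = r.eax;   /* movl %eax, %esi : does real work, updates the upper bound */
    r.eax = r.esi;   /* movl %esi, %eax : writes back the value EAX already holds */
    r.eax -= r.r8d;  /* subl %r8d, %eax */
    return r;
}

/* The same sequence with the copy back removed. */
struct regs tail_without_copy_back(struct regs r)
{
    r.esi = r.eax;   /* movl %eax, %esi */
    r.eax -= r.r8d;  /* subl %r8d, %eax */
    return r;
}

/* js branches on the sign flag alone: taken only for a negative result. */
int js_would_branch(int32_t diff)
{
    return diff < 0;
}

/* test clears OF, so a following jle is taken when the value is <= 0. */
int jle_after_test_would_branch(int32_t diff)
{
    return diff <= 0;
}
```

For small inputs that don't overflow the subtraction, both `tail_*` functions return identical register state, which matches the comment that the second `movl` is a missed optimization rather than something functional. The two branch predicates disagree only when the difference is exactly 0, which is the case the `js` / `jns` comment is pointing at.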

0 Answers