I don't know what cmpq %rsi, %rdi
mean
That's AT&T syntax for cmp rdi, rsi
. https://www.felixcloutier.com/x86/CMP.html
You can look up the details of what a single instruction does in an ISA manual.
More importantly, cmp
/jcc
like cmp %rsi,%rdi
/jl
is like jump if rdi<rsi
.
Assembly - JG/JNLE/JL/JNGE after CMP. If you go through all the details of how cmp
sets flags, and which flags each jcc
condition checks, you can verify that it's correct, but it's much easier to just use the semantic meaning of JL = Jump on Less-than (assuming flags were set by a cmp
) to remember what they do.
(It's reversed because of AT&T syntax; jcc
predicates have the right semantic meaning for Intel syntax. This is one of the major reasons I usually prefer Intel syntax, but you can get used to AT&T syntax.)
From the use of rdi
and rsi
as inputs (reading them without / before writing them), they're the arg-passing registers. So this is the x86-64 System V calling convention, where integer args are passed in RDI, RSI, RDX, RCX, R8, R9, then on the stack. (What are the calling conventions for UNIX & Linux system calls on i386 and x86-64 covers function calls as well as system calls). The other major x86-64 calling convention is Windows x64, which passes the first 2 args in RCX and RDX (if they're both integer types).
So yes, x=RDI and y=RSI. And yes, result=RAX. (writing to EAX zero-extends into RAX).
From the code structure (not storing/reloading every C variable to memory between statements), it's compiled with some level of optimization enabled, so the for()
loop turned into a normal asm loop with the conditional branch at the bottom. Why are loops always compiled into "do...while" style (tail jump)? (@BrianWalker's answer shows the asm loop transliterated back to C, with no attempt to form it back into an idiomatic for
loop.)
From the cmp/jcc ahead of the loop, we can tell that the compiler can't prove the loop runs a non-zero number of iterations. So whatever the for()
loop condition is, it might be false the first time. (That's unsurprising given signed integers.)
Since we don't see a separate register being used for i
, we can conclude that optimization reused another var's register for i
. Like probably for(i=x;
, and then with the original value of x
being unused for the rest of the function, it's "dead" and the compiler can just use RDI as i
, destroying the original value of x
.
I guessed i=x
instead of y
because RDI is the arg register that's modified inside the loop. We expect that the C source modifies i
and result
inside the loop, and presumably doesn't modify it's input variables x
and y
. It would make no sense to do i=y
and then do stuff like x--
, although that would be another valid way of decompiling.
cmp %rdi, %rsi
/ jl .L3
means the loop condition to (re)enter the loop is rsi-rdi < 0
(signed), or i<y
.
The cmp/jcc before the loop is checking the opposite condition; notice that the operands are reversed and it's checking jle
, i.e. jng
. So that makes sense, it really is same loop condition peeled out of the loop and implemented differently. Thus it's compatible with the C source being a plain for()
loop with one condition.
sub $1, %rdi
is obviously i--
or --i
. We can do that inside the for()
, or at the bottom of the loop body. The simplest and most idiomatic place to put it is in the 3rd section of the for(;;)
statement.
addq %rdi, %rax
is obviously adding i
to result
. We already know what RDI and RAX are in this function.
Putting the pieces together, we arrive at:
long foo(long x, long y)
{
long i, result = 0;
for (i= x ; i>y ; i-- ){
result += i;
}
return result;
}
Which compiler made this code?
From the .L3:
label names, this looks like output from gcc
. (Which somehow got corrupted, removing the :
from .L2
, and more importantly removing the %
from %rsi
in one cmp. Make sure you copy/paste code into SO questions to avoid this.)
So it may be possible with the right gcc version/options to get exactly this asm back out for some C input. It's probably gcc -O1
, because movl $0, %eax
rules out -O2
and higher (where GCC would look for the xor %eax,%eax
peephole optimization for zeroing a register efficiently). But it's not -O0
because that would be storing/reloading the loop counter to memory. And -Og
(optimize a bit, for debugging) likes to use a jmp
to the loop condition instead of a separate cmp/jcc
to skip the loop. This level of detail is basically irrelevant for simply decompiling to C that does the same thing.
The rep ret
is another sign of gcc; gcc7 and earlier used this in their default tune=generic
output for ret
that's reached as a branch target or a fall-through from a jcc
, because of AMD K8/K10 branch prediction. What does `rep ret` mean?
gcc8 and later will still use it with -mtune=k8
or -mtune=barcelona
. But we can rule that out because that tuning option would use dec %rdi
instead of subq $1, %rdi
. (Only a few modern CPUs have any problems with inc/dec
leaving CF unmodified, for register operands. INC instruction vs ADD 1: Does it matter?)
gcc4.8 and later put rep ret
on the same line. gcc4.7 and earlier print it as you've shown, with the rep
prefix on the line before.
gcc4.7 and later like to put the initial branch before the mov $0, %eax
, which looks like a missed optimization. It means they need a separate return 0
path out of the function, which contains another mov $0, %eax
.
gcc4.6.4 -O1
reproduces your output exactly, for the source shown above, on the Godbolt compiler explorer
# compiled with gcc4.6.4 -O1 -fverbose-asm
foo:
movl $0, %eax #, result
cmpq %rsi, %rdi # y, x
jle .L2 #,
.L3:
addq %rdi, %rax # i, result
subq $1, %rdi #, i
cmpq %rdi, %rsi # i, y
jl .L3 #,
.L2:
rep
ret
So does this other version which uses i=y
. Of course there are many things we could add that would optimize away, like maybe i=y+1
and then having a loop condition like x>--i
. (Signed overflow is undefined behaviour in C, so the compiler can assume it doesn't happen.)
// also the same asm output, using i=y but modifying x in the loop.
long foo2(long x, long y) {
long i, result = 0;
for (i= y ; x>i ; x-- ){
result += x;
}
return result;
}
In practice the way I actually reversed this:
- I copy/pasted the C template into Godbolt (https://godbolt.org/). I could see right away (from the
mov $0
instead of xor-zero, and from the label names) that it looked like gcc -O1
output, so I put in that command line option and picked an old-ish version of gcc like gcc6. (Turns out this asm was actually from a much older gcc).
I tried an initial guess like x<y
based on the cmp/jcc, and i++
(before I'd actually read the rest of the asm carefully at all), because for loops often use i++
. The trivial-looking infinite-loop asm output showed me that was obviously wrong :P
I guessed that i=x, but after taking a wrong turn with a version that did result += x
but i--
, I realized that i
was a distraction and at first simplified by not using i
at all. I just used x--
while first reversing it because obviously RDI=x. (I know the x86-64 System V calling convention well enough to see that instantly.)
After looking at the loop body, the result += x
and x--
were totally obvious from the add
and sub
instructions.
cmp/jl
was obviously a something < something
loop condition involving the 2 input vars.
I wasn't sure I if it was x<y
or y<x
, and newer gcc versions were using jne
as the loop condition. I think at that point I cheated and looked at Brian's answer to check it really was x > y
, instead of taking a minute to work through the actual logic. But once I had figured out it was x--
, only x>y
made sense. The other one would be true until wraparound if it entered the loop at all, but signed overflow is undefined behaviour in C.
Then I looked at some older gcc versions to see if any made asm more like in the question.
Then I went back and replaced x
with i
inside the loop.
If this seems kind of haphazard and slapdash, that's because this loop is so tiny that I didn't expect to have any trouble figuring it out, and I was more interested in finding source + gcc version that exactly reproduced it, rather than the original problem of just reversing it at all.
(I'm not saying beginners should find it that easy, I'm just documenting my thought process in case anyone's curious.)