What does cmpq compare?

Question

mystery has this function signature:

int mystery(char *, int);

This is the mystery function assembly code:

mystery:
        movl    $0, %eax                # set eax to 0
        leaq    (%rdi, %rsi), %rcx      # rcx = rdi + rsi

loop:
        cmpq    %rdi, %rcx
        jle     endl
        decq    %rcx
        cmpb    $0x65, (%rcx)
        jne     loop
        incl    %eax
        jmp     loop

endl:
        ret

What does the line cmpq %rdi, %rcx compare: the addresses or the character values? If it is comparing the addresses stored in the registers, what's the point? So what if one address is greater than the other?

It's comparing the addresses. `rcx` has been set to `start+length`, also known as `end`. — Jester, Sep 19 '19 at 00:33

acegs · Answer 1 · 2019-09-19T01:06:20.517


It seems to be doing something like this:

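const char *buff = "abcdef";  // this is rdi
int64_t len = strlen(buff);   // this is rsi
int count = 0;                // this is eax

for (const char *pRCX = buff + len; pRCX > buff /* this is the cmpq */; ) {
    pRCX--;                   // decq %rcx
    if (*pRCX == 'e')         // cmpb $0x65, (%rcx)
        count++;              // incl %eax
}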

The cmpq in the code checks whether rcx has reached the start of the array. rcx decreases on every iteration because it starts one past the last element of the array.

Yes, cmpq %rdi, %rcx compares the addresses. It looks like an optimized way of looping through an array of characters: instead of looping over an index, it loops directly over the addresses. It's faster this way, but a little hard to grasp, especially for beginners.
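To make the index-versus-address point concrete, here is a minimal C sketch of two loops that should give the same result: one walks a pointer down from the end, as the asm does with %rcx, and one uses an index instead. The function names are hypothetical, and both assume the loop is counting 0x65 ('e') bytes, which is what the `incl %eax` on a match appears to do.

```c
#include <stddef.h>

/* Pointer-walking version: mirrors the asm, which keeps only an end
   pointer (in %rcx) and compares it against the start pointer (in %rdi). */
int count_e_ptr(const char *start, size_t len)
{
    int count = 0;
    for (const char *p = start + len; p > start; ) {
        --p;                      /* decq %rcx */
        if (*p == 'e')            /* cmpb $0x65, (%rcx) */
            ++count;              /* incl %eax */
    }
    return count;
}

/* Index-based version: same result, but the loop variable is an index
   that gets added to start on every access. */
int count_e_idx(const char *start, size_t len)
{
    int count = 0;
    for (size_t i = len; i > 0; --i) {
        if (start[i - 1] == 'e')
            ++count;
    }
    return count;
}
```

With optimization enabled, a compiler may emit similar code for both, so the source form alone doesn't settle which is faster.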

Also, I think I read in Agner Fog's books that looping through a series of data starting from the last item and moving downward is faster than looping upward, which is the typical way to write a loop.

edited Sep 19 '19 at 01:06

answered Sep 19 '19 at 00:33

acegs



lol, this is *not* optimized. But yes, an indexed addressing mode would be slightly worse, I think. Looping downward can be slightly worse because not all the HW prefetchers work in both directions. (But the main L2 streamer does work in either direction on modern Intel CPUs). Anyway, upward isn't really a choice for this search loop; the only sane option is to start at the end since it stops on the first match. – Peter Cordes Sep 19 '19 at 01:29
Looping a counter downward is faster if you use `dec reg` / `jnz .top_of_loop`, but this loop is not doing that: it's using a braindead `jmp` at the bottom of the loop. [Why are loops always compiled into "do...while" style (tail jump)?](//stackoverflow.com/q/47783926). That's what you're remembering in the last paragraph. – Peter Cordes Sep 19 '19 at 01:29

Peter Cordes · Accepted Answer · 2019-09-19T01:19:30.613

Looks like memrchr, with the cmpq checking for the search position getting back to the start of the buffer, and the cmpb checking for a matching byte.
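For comparison, a memrchr-style search could be sketched in portable C as below (memrchr itself is a GNU extension). `find_last_byte` is a hypothetical name. Note one difference from the posted asm: this version returns at the first match from the end, whereas the asm jumps back to `loop` after `incl %eax` and keeps scanning.

```c
#include <stddef.h>

/* Hypothetical portable stand-in for GNU memrchr(): scan backwards from
   start + len and return a pointer to the last byte equal to c, or NULL. */
const char *find_last_byte(const char *start, size_t len, char c)
{
    const char *p = start + len;
    while (p > start) {           /* the pointer check (cmpq / jle) */
        --p;
        if (*p == c)              /* the byte check (cmpb) */
            return p;
    }
    return NULL;
}
```

If an index rather than a pointer is wanted, it can be recovered after the search by pointer subtraction, e.g. `p - s`.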

cmp just sets FLAGS according to dst - src, exactly like sub. So it compares its input operands, of course. In this case they're both qword registers holding pointers.
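Here is a small C model, offered as a sketch, of what `cmpq %rdi, %rcx` leaves in FLAGS. It computes rcx - rdi with 64-bit wraparound, derives ZF, SF, CF and OF, then applies the documented conditions for jle (ZF=1 or SF != OF) and jbe (CF=1 or ZF=1). The `Flags` struct and the function names are made up for illustration; this shows what the bits mean, not how the CPU computes them.

```c
#include <stdbool.h>
#include <stdint.h>

/* The FLAGS bits that `cmpq src, dst` sets: it computes dst - src exactly
   like subq, but discards the difference. */
typedef struct {
    bool zf;  /* result == 0 */
    bool sf;  /* sign bit of the result */
    bool cf;  /* unsigned borrow: dst < src as unsigned */
    bool of;  /* signed overflow of dst - src */
} Flags;

Flags cmpq_flags(uint64_t src, uint64_t dst)
{
    uint64_t res = dst - src;      /* wraps modulo 2^64, like the hardware */
    Flags f;
    f.zf = (res == 0);
    f.sf = (res >> 63) & 1;
    f.cf = dst < src;
    /* Overflow when the operands' signs differ and the result's sign
       differs from dst's sign. */
    f.of = (((dst ^ src) & (dst ^ res)) >> 63) & 1;
    return f;
}

/* jle: taken if ZF=1 or SF != OF (signed <=). */
bool jle_taken(Flags f) { return f.zf || (f.sf != f.of); }

/* jbe: taken if CF=1 or ZF=1 (unsigned <=). */
bool jbe_taken(Flags f) { return f.cf || f.zf; }
```

For example, with %rdi = 0x7ffffffffffffff0 and %rcx = 0x8000000000000010, jle would exit the loop (as signed values, rcx is negative) while jbe would not (as unsigned values, rcx is above rdi). Canonical x86-64 addresses can't straddle that boundary because of the non-canonical hole.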

I wouldn't recommend jle for address comparison; it's better to treat addresses as unsigned. Although for x86-64 it doesn't actually matter: you can't have an array that spans the signed-overflow boundary because the non-canonical "hole" is there. (See: Should pointer comparisons be signed or unsigned in 64-bit x86?)

Still, jbe would make more sense, unless you actually have arrays that span the boundary from the highest address to the lowest address, so the pointer wraps from 0xfff...fff to 0. In that case, you could avoid the problem by testing if (p == start) break instead of p <= start.
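The wraparound case can't be demonstrated safely with real C pointers (stepping or relationally comparing pointers outside an object is undefined behavior), so this sketch uses plain 64-bit integers as stand-in addresses. It counts how many bytes a backwards scan would visit with a `p <= start` exit test versus a `p == start` exit test; the function names are hypothetical.

```c
#include <stdint.h>

/* Simulated 64-bit addresses: plain integers, no memory is touched. */

/* Exit test p <= start (unsigned, as jbe would do it): fails to run at
   all if start + len wraps from 0xfff...fff to 0. */
uint64_t visits_le(uint64_t start, uint64_t len)
{
    uint64_t visits = 0;
    for (uint64_t p = start + len; !(p <= start); ) {
        --p;
        ++visits;
    }
    return visits;
}

/* Exit test p == start: only needs equality, so it still visits every
   byte when start + len wraps around. */
uint64_t visits_eq(uint64_t start, uint64_t len)
{
    uint64_t visits = 0;
    for (uint64_t p = start + len; p != start; ) {
        --p;
        ++visits;
    }
    return visits;
}
```

For a buffer at 0xfffffffffffffffe of length 4, start + len wraps to 0x2, so visits_le returns 0 while visits_eq returns 4.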

There is a bug in this function, though, assuming it's written for the x86-64 System V ABI: its signature takes an int size arg, but it assumes it's sign-extended to pointer width when it does char *endp = start + len.

The ABI allows narrow args to have garbage in the high bits of their register. (See: Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI?)
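Here is a sketch of why the missing extension matters, modelling the registers as uint64_t values. The posted `leaq (%rdi, %rsi), %rcx` adds all 64 bits of %rsi, while correct code would first sign-extend the int in %esi (for example with `movslq %esi, %rsi`). The function names and the garbage value in the usage example are hypothetical.

```c
#include <stdint.h>

/* Under the SysV ABI only the low 32 bits of %rsi are meaningful when the
   argument is an int; the high 32 bits may hold garbage. */

/* What the posted asm does: uses all 64 bits of %rsi as-is. */
uint64_t end_as_written(uint64_t rdi, uint64_t rsi)
{
    return rdi + rsi;
}

/* What correct code does: sign-extend the 32-bit int first
   (movslq %esi, %rsi), then add. */
uint64_t end_sign_extended(uint64_t rdi, uint64_t rsi)
{
    /* Low 32 bits reinterpreted as int32_t (implementation-defined for
       values above INT32_MAX, but two's complement on GCC/Clang). */
    int32_t len = (int32_t)(uint32_t)rsi;
    return rdi + (uint64_t)(int64_t)len;
}
```

With %rdi = 0x400000 and %rsi = 0xdeadbeef00000010 (len = 16 in %esi, garbage above it), end_sign_extended gives 0x400010, but end_as_written gives 0xdeadbeef00400010.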

There are also major performance problems with this: checking 1 byte at a time is total garbage vs. using SSE2 to check 16 bytes at a time. Also, it doesn't use either conditional branch as the loop branch, so it has 3 jumps per iteration instead of 2, i.e. an extra not-taken conditional branch.
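As a rough sketch of the 16-bytes-at-a-time idea, here is an SSE2 byte counter using the `<emmintrin.h>` intrinsics `_mm_cmpeq_epi8` and `_mm_movemask_epi8`. Assumptions: an x86/x86-64 target, GCC or Clang (for `__builtin_popcount`), and that the goal is counting matches, as the posted loop appears to do. Since a count doesn't depend on order, it scans forwards. It is not a tuned implementation (no alignment handling or unrolling).

```c
#include <emmintrin.h>   /* SSE2 intrinsics (x86/x86-64 only) */
#include <stddef.h>

/* Counts bytes equal to c, 16 at a time with SSE2, then finishes the
   remaining tail one byte at a time. */
size_t count_byte_sse2(const char *buf, size_t len, char c)
{
    const __m128i needle = _mm_set1_epi8(c);
    size_t count = 0;
    size_t i = 0;

    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i eq    = _mm_cmpeq_epi8(chunk, needle);   /* 0xFF where equal */
        unsigned mask = (unsigned)_mm_movemask_epi8(eq);  /* 1 bit per byte */
        count += (size_t)__builtin_popcount(mask);
    }
    for (; i < len; ++i)          /* scalar tail */
        count += (buf[i] == c);
    return count;
}
```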

Also, it could pointer-subtract after the loop instead of wasting an inc %eax inside the loop. If you're going to do inc %eax inside the loop, you might as well check the size against it instead of the pointer compare.

Anyway, the function is written to be easy to reverse engineer, not to be efficient. The jmp as well as 2 conditional branches makes it worse for that IMO, vs. an idiomatic loop with a condition at the bottom.
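For reference, the "condition at the bottom" shape can be sketched in C as a do...while with the empty case peeled off first, so each iteration has a single loop branch. Adding the comparison result (0 or 1) instead of branching on it lets a compiler avoid a second conditional branch (for example with a setcc), though what it actually emits is up to the compiler. The function name is hypothetical, and like the posted asm it counts 'e' bytes.

```c
#include <stddef.h>

/* Backward count of 'e' bytes with the loop test at the bottom. */
int count_e_dowhile(const char *start, size_t len)
{
    int count = 0;
    if (len == 0)                 /* peeled check so the body runs at least once */
        return 0;
    const char *p = start + len;
    do {
        --p;
        count += (*p == 'e');     /* add 0 or 1 rather than branching on a match */
    } while (p != start);         /* the only loop branch */
    return count;
}
```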

looks like `memrchr()` but always looking for 'e' – Tommylee2k Sep 19 '19 at 11:03

