Assembly Code - How does xor in a loop works?

Question

I have this assembly code that has a for loop in it which I am to change back into C code. However, I notice there's a xor in the loop.

.L3:
        movq    -8(%rbp), %rax      
        andl    $1, %eax    
        xorl    %eax, -12(%rbp)     
        sarq    -8(%rbp)        
.L2:
        cmpq    $0, -8(%rbp)        
        jg      .L3

So I know that a for loop will constantly loop as long as it is greater than 0 and divides by 2 every loop. But what I am having trouble is with the andl and xorl. I know it checks 1 and eax with and and returns 1 or 0 depending on their values, but how will the xor be changed by the loop?

[logic gate AND/OR/XOR/NOT/...](http://whatis.techtarget.com/definition/logic-gate-AND-OR-XOR-NOT-NAND-NOR-and-XNOR) ... so on x86 `mov al,0b1010` `xor al,0b0011` will result into `al` = `0b1001`. ... `and al,0b0011` instead of xor would produce `al` = `0b0010`. And `or al,0b0011` would produce `al` = `0b1011`. `not al` would end with `al` = `0b11110101` (I didn't write those higher 4 bit zeroes in other examples, but after `not` they are inverted too) (write the values above each other for better view and compare it with the "gate" article) — Ped7g, Sep 12 '16 at 10:37
Eww, why all the memory operands? I don't see a pointer-increment of RBP happening, so IDK why that code doesn't just use registers for everything. I assume that came from `gcc -O0` :( — Peter Cordes, Sep 12 '16 at 13:54

Crowman · Accepted Answer · 2016-09-12T02:52:58.060

3

Let's say local variable b is at -8(%rbp), and local variable c is at -12(%rbp).

.L3:    movq    -8(%rbp), %rax
        andl    $1, %eax

Set the value of eax to the value of the least significant bit of b.

xorl    %eax, -12(%rbp)

Perform an exclusive-or with c and the least significant bit of b, storing the result in c.

sarq    -8(%rbp)

Divide b by 2.

cmpq    $0, -8(%rbp)
jg      .L3

Go back to start of loop if b is greater than 0, otherwise continue.

So the corresponding C code is:

do {
    c ^= (b & 1);
    b /= 2;           //  Or: b >>= 1;
} while ( b > 0 );

although the existence of the .L2 label suggests there may be a jmp .L2 immediately before, which you're not showing us, in which case it would be a while loop:

while ( b > 0 ) {
    c ^= (b & 1);
    b /= 2;           //  Or: b >>= 1;
}

A working demonstration (using gas assembler on OS X):

asm_func.S:

.globl  _asm_func

.text

_asm_func:
    push    %rbp
    mov     %rsp, %rbp
    sub     $16, %rsp

    movq    %rdi, -8(%rbp)
    movl    %esi, -12(%rbp)

    jmp     .L2

.L3:
    movq    -8(%rbp), %rax
    andl    $1, %eax
    xorl    %eax, -12(%rbp)
    sarq    -8(%rbp)

.L2:
    cmpq    $0, -8(%rbp)
    jg      .L3

    movl    -12(%rbp), %eax

    leave
    ret

main.c:

#include <stdio.h>

int asm_func(int b, int c);

int c_func(int b, int c)
{
    while ( b > 0 ) {
        c ^= (b & 1);
        b >>= 1;
    }
    return c;
}

int main(void)
{
    for ( int i = 112; i < 127; i += 7 ) {
        for ( int j = 203; j > 182; j -= 9 ) {
            printf("C function  (%d, %d): %d\n", i, j, c_func(i, j));
            printf("Asm function(%d, %d): %d\n", i, j, asm_func(i, j));
        }
    }
    return 0;
}

Makefile:

prog: main.o asm_func.o
    cc -o prog main.o asm_func.o

main.o: main.c
    cc -o main.o main.c -c -std=c99 -pedantic -Wall -Wextra

asm_func.o: asm_func.S
    as -o asm_func.o asm_func.S

with output:

paul@horus:~/src/sandbox/so_asm$ ./prog
C function  (112, 203): 202
Asm function(112, 203): 202
C function  (112, 194): 195
Asm function(112, 194): 195
C function  (112, 185): 184
Asm function(112, 185): 184
C function  (119, 203): 203
Asm function(119, 203): 203
C function  (119, 194): 194
Asm function(119, 194): 194
C function  (119, 185): 185
Asm function(119, 185): 185
C function  (126, 203): 203
Asm function(126, 203): 203
C function  (126, 194): 194
Asm function(126, 194): 194
C function  (126, 185): 185
Asm function(126, 185): 185
paul@horus:~/src/sandbox/so_asm$

edited Sep 12 '16 at 02:52

answered Sep 12 '16 at 02:04

Crowman

25,242
5
48
56

I was told not to include unnecessary code so I didn't include the beginning. But yea there is a call to jump to L2 in main. how does the b%1 work? just compare the smallest bit? – Nukodi Sep 12 '16 at 02:35
`b & 1` is just a regular bitwise AND. It evaluates to `1` is `b` is odd, and to `0` if `b` is even. – Crowman Sep 12 '16 at 02:53
Thanks for the example! – Nukodi Sep 12 '16 at 02:58
@JMei: BTW, this looks like a parity calculation: horizontal XOR of all the bits. See [this Q&A](http://stackoverflow.com/questions/38886479/whats-the-purpose-of-looping-xorl-edx-eax-shrl-1-edx) for how to more efficiently calculate parity by XORing upper and lower halves down to one byte. Repeat down to one bit, or just read PF, since x86 already computes parity. – Peter Cordes Sep 12 '16 at 14:02
Wow, I just noticed that this function uses signed integer comparison. That's super-weird, and makes it return something other than the parity for integers with the high bit (sign bit) set. I wonder if that was intentional in the original C source, to make reverse-engineering this more tricky? – Peter Cordes Sep 12 '16 at 14:05
Also, this function probably makes more sense if `-12(%rbp)` is a local initialized to zero, not another function arg. The OP only shows the loop, not the initialization. Only the low bit is ever modified, and that would make it a normal parity calculation instead of an arg to XOR the parity into. @JMei: I'd say the beginning is necessary, to figure out which stack locations come from function args and which are locals. – Peter Cordes Sep 12 '16 at 14:10
@PeterCordes : you're probably right, I set up the function to pass two arguments just to be able to demonstrate that the C function produces the same output as the snippet of assembly in the question, with various inputs, although I note now I didn't include any negative numbers in the test inputs. – Crowman Sep 12 '16 at 15:53
@PaulGriffiths: negative inputs is a separate issue from whether the parity accumulator is initialized with a constant or a function arg. The original function is just silly to use an arithmetic shift and signed check, and you correctly decompiled that. You did get the type of `b` wrong, though: It's a 64-bit signed type, so either `long`, `long long`, or `int64_t`. – Peter Cordes Sep 12 '16 at 17:13
I tried and failed to reproduce the exact asm in the loop with various gcc versions on the Godbolt compiler explorer, in a function with one arg. It looks like 2 args is the key to getting -12 and -8 offsets, at least with the compilers I tried: https://godbolt.org/g/nHztC5. And this is definitely `gcc -O0 -S` output, because it uses local labels like `.L2`, but clang uses labels like `.LBB0_1:`. The BB almost certainly stands for Basic Block. icc13 also uses a different naming pattern for local labels. – Peter Cordes Sep 12 '16 at 17:14
Anyway, still a duplicate question, IMO. Even if there is an extra function arg, that's not really significant since the OP didn't ask about that part of the function by including the asm for it. This answer is probably a useful complement to my answer on the other question, but I did just notice another mistake: `b>>=1` doesn't quite do the same thing as `b /= 2` for signed `b`, with C semantics for division (which the x86 `idiv` instruction mirrors). – Peter Cordes Sep 12 '16 at 17:18
[SAR rounds towards -Infinity](http://www.felixcloutier.com/x86/SAL:SAR:SHL:SHR.html). e.g. `-3 SAR 1 = -2`. But IDIV (and C division) truncate towards 0: `-3 / 2 == -1`. I know you're just trying to explain what the right shift is doing, and you used `>>=1` because unoptimized compiler output for `/=2` actually uses `idiv`, but the OP's code (with SAR) wouldn't be legal compiler output for `/=2` so you should probably avoid showing that in code at least. – Peter Cordes Sep 12 '16 at 17:29
@PeterCordes : On the type, a 32 bit integer is going to get sign-extended and passed in a 64 bit register anyway, of course, and the function only ever makes the number smaller, so I don't think it necessarily implies an explicit 64 bit C type, but it certainly could be the case. I'll take a look at the shift/division issue when I'm at my computer, and update the answer if necessary, but I think you're correct about it. – Crowman Sep 12 '16 at 17:41
@PaulGriffiths: A function that accepts a 32-bit integer function arg has to assume the upper 32 bits of the register hold garbage. Writing a 32-bit reg in x86-64 zeros the upper 32 for free (not sign-extends), but the ABI doesn't even require that. The x86-64 SysV ABI doesn't require zero or sign extension to 64 bits. Also, if you look at the asm, gcc emits SARL and CMPL for `int`, not SARQ and CMPQ. The ABI unofficially does require sign-extension or zero extension to 32-bit for narrow types (like int16_t), and [clang even depends on this.](http://stackoverflow.com/a/36760539/224132) – Peter Cordes Sep 12 '16 at 18:00

Assembly Code - How does xor in a loop works?

1 Answers1