I'm having trouble understanding how the following assembly and C versions of the same function correspond to each other:

x86-64:

/* long loop(long x, int n) */
/* x in %rdi, n in %esi */
loop:
    movl %esi, %ecx
    movl $1, %edx
    movl $0, %eax
    jmp .L2
.L3:
    movq %rdi, %r8
    andq %rdx, %r8
    orq %r8, %rax
    salq %cl, %rdx
.L2:
    testq %rdx, %rdx
    jne .L3
    rep; ret

C:

long loop(long x, int n)
{
    long result = 0;
    long mask;
    for (mask = 1; mask != 0; mask = mask << n) {
        result |= (x & mask);
    }
    return result;
}

From what I see,

  1. n = %esi and is copied into %ecx.
  2. 1 is copied into mask.
  3. 0 is copied into result.

Why is 1 copied into mask first, when the first variable in the C code is result? Shouldn't it be result = 1 and mask = 0, since that follows the order of the variables in the C program? Furthermore, when I convert the C code to assembly language, I get:

loop:
    movl %esi, %ecx
    movl $1, %eax
    movl $0, %edx
    jmp .L2
    ...

So are the registers %eax and %edx interchangeable?
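
One way to work out which register holds which variable is to transliterate the compiler's assembly back into C with one variable per register. The following is only a rough sketch of that reading, not compiler output: the name `loop_by_register` and the unsigned fixed-width types are assumptions made here so that the shifts stay well defined in C.

```c
#include <stdint.h>

/* Hypothetical helper: each C variable is named after the register it models. */
uint64_t loop_by_register(uint64_t rdi /* x */, uint32_t esi /* n */)
{
    uint32_t ecx = esi;     /* movl %esi, %ecx : variable shift counts live in %cl */
    uint64_t rdx = 1;       /* movl $1, %edx   : later shifted and tested, so this is mask */
    uint64_t rax = 0;       /* movl $0, %eax   : later OR-ed into and returned, so this is result */
    goto L2;                /* jmp .L2         : check the loop condition before the first pass */
L3:
    {
        uint64_t r8 = rdi;  /* movq %rdi, %r8  : temporary copy of x */
        r8 &= rdx;          /* andq %rdx, %r8  : x & mask */
        rax |= r8;          /* orq %r8, %rax   : result |= (x & mask) */
    }
    rdx <<= (ecx & 63);     /* salq %cl, %rdx  : the CPU uses only the low 6 bits of %cl */
L2:
    if (rdx != 0)           /* testq %rdx, %rdx */
        goto L3;            /* jne .L3          : loop while mask != 0 */
    return rax;             /* the value left in %rax is the return value */
}
```

Read this way, the register that is OR-ed into and returned must be `result`, and the register that is shifted and tested against zero must be `mask`, regardless of the order of the two initialising `movl` instructions. Note that a count of 0 (or any multiple of 64) never changes the mask, so the sketch would loop forever for such `n`, just as the assembly would.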

  • Registers are interchangeable as far as using them for your own purposes within functions (except for a few cases like shift counts having to be in CL, if you're not using BMI2 `shlx` or whatever). But in the standard calling conventions, `long` is returned in RAX; RDX is only used as the upper half of a 128-bit return value, if at all. The order of independent instructions is not significant, just like `int x=1, y=2;` in C is not different from `int y=2, x=1;` – Peter Cordes Oct 02 '22 at 06:10
  • I assume you were compiling with `gcc -O1` or `-Og` to get that compiler output? Other compilers would either use `xor %eax,%eax` for zeroing a register, or would be full debug mode spilling everything to memory between statements. – Peter Cordes Oct 02 '22 at 06:15
  • @PeterCordes How would I know which variable is being assigned the value? – Michael Oct 02 '22 at 06:15
  • By looking at what value is assigned, and how that register is later used. In your example, only one variable is initialized to zero, and that's `result`. The other, `mask`, is assigned `1` before entering the loop. If you were actually reverse engineering without having the C source to look at, you'd have to make up those variable names yourself. – Peter Cordes Oct 02 '22 at 06:20
  • @PeterCordes But say I had the C code without the values being assigned to the variables. How would I tell which variable is being assigned 1 and which is being assigned 0? – Michael Oct 02 '22 at 06:26
  • It's obvious to me when I look at `testq %rdx, %rdx`, but looking at just the loop part of the code, it's not as intuitive. – Michael Oct 02 '22 at 06:44
  • Look at how the registers get used. RAX at the end of the function is the return value. Also, one register is the destination of an OR instruction, the other is modified by a left shift. Exactly like you've identified that `mask != 0` is done by the `test`/`jnz` at the bottom of an asm loop. ([Why are loops always compiled into "do...while" style (tail jump)?](https://stackoverflow.com/q/47783926)) – Peter Cordes Oct 02 '22 at 06:48
  • Not directly related to your question, but by left-shifting a signed value until all bits have fallen off the edge, I think you are veering into [undefined-behaviour territory](https://stackoverflow.com/questions/3784996/why-does-left-shift-operation-invoke-undefined-behaviour-when-the-left-side-oper). – Ture Pålsson Oct 02 '22 at 07:25
  • @TurePålsson How can I prevent that? – Michael Oct 02 '22 at 08:06
  • Declare `mask` as `unsigned long`, and make sure that `n` is non-negative and smaller than the number of bits in an unsigned long (a shift by the full width or more is also undefined). [This answer](https://stackoverflow.com/a/57618041/4177009) has more details. – Ture Pålsson Oct 02 '22 at 08:14
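
Following the suggestion to use an unsigned mask, a minimal sketch of a version without the signed-shift problem might look like this. The name `loop_unsigned`, the range check, and the choice to return 0 for rejected counts are all assumptions made for illustration, not part of the original function.

```c
#include <limits.h>

/* Hypothetical variant of loop(); the name and the range check are assumptions. */
long loop_unsigned(long x, int n)
{
    const int width = (int)(sizeof(unsigned long) * CHAR_BIT);

    /* Assumption: reject counts that would be undefined (n < 0, n >= width)
       or that would never terminate (n == 0) by returning 0. */
    if (n <= 0 || n >= width)
        return 0;

    unsigned long result = 0;
    unsigned long ux = (unsigned long)x;

    /* Unsigned shifts discard high bits, so mask reaches 0 without undefined behaviour. */
    for (unsigned long mask = 1; mask != 0; mask <<= n)
        result |= ux & mask;

    /* Converting back to long is implementation-defined when the top bit is set. */
    return (long)result;
}
```

For example, `loop_unsigned(0xFF, 4)` keeps only bits 0 and 4 of the input. Whether to return 0, abort, or report an error for an out-of-range `n` is a design choice that depends on the caller.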
