
I am having difficulty reverse engineering this assembly code to deduce the array dimensions A and B.

I am given

struct vec3 {
  long z;
  int x;
  unsigned short y;

};

struct vec3 array1[2][A];
struct vec3 array2[8][B];
int arrayfunc(int i1, int j1, int i2, int j2){
   return array1[i1][j1].x  + array1[i1][j1].y - array2[i2][j2].y;
}

This is the C code provided. The types of the members x, y and z were not given; the ones shown above are what I deduced them to be.

arrayfunc:
    leaq    array1(%rip), %rax
    movslq  %ecx, %rcx
    movslq  %edx, %r10
    movslq  %r9d, %r9
    leaq    (%rcx,%rcx,2), %rdx
    movslq  %r8d, %r8
    movq    %rax, %rcx
    addq    %r10, %rdx
    salq    $4, %rdx
    movzwl  12(%rax,%rdx), %eax
    addl    8(%rcx,%rdx), %eax
    leaq    (%r9,%r8,2), %rdx
    leaq    array2(%rip), %rcx
    salq    $4, %rdx
    movzwl  12(%rcx,%rdx), %edx
    subl    %edx, %eax
    ret    

The issue here is that I am not sure how I can find the values of A and B from the assembly code.

Any and all help is always appreciated :)

Thanks :))

Megan Darcy
  • Related: [Nested Arrays in Assembly, reach desired index](https://stackoverflow.com/q/72467662) / [Finding P and Q in assembly matrix](https://stackoverflow.com/q/74321151) – Peter Cordes Nov 04 '22 at 19:28

1 Answer

Indexing a 2D array has to scale the first index by sizeof(struct vec3[A]): array1 is an array of arrays, and each smaller array has A elements. So you look at the asm and see what it's multiplying by.

Given struct vec3 array1[2][A];, the access array1[i1][j1].x uses the same address math as a flat 1D array: array1[ (i1*A) + j1 ].x. In C we index by element, not by bytes, so the asm also has to scale by sizeof(struct vec3). That's clearly what the salq $4, %rdx instructions are doing: after padding for alignment the struct size is 16 bytes, and a left shift by 4 multiplies by 16.
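The address math described above can be checked in C itself. This is a minimal sketch, not the asker's actual build: the value 3 for A is a sample (the same one used in the Godbolt link), and the helper names x_offset_by_hand and x_offset_by_compiler are made up for illustration. The 16-byte struct size assumes a target where long is 8 bytes, such as GCC on x86-64 Linux.

```c
#include <assert.h>
#include <stddef.h>

/* Same member layout as in the question. With an 8-byte long this is
   8 + 4 + 2 bytes of data plus 2 bytes of tail padding = 16 bytes. */
struct vec3 {
    long z;
    int x;
    unsigned short y;
};

#define A 3  /* sample value for illustration only */

struct vec3 array1[2][A];

/* Byte offset of array1[i][j].x from the start of array1, done by hand:
   flatten the 2D index, scale by the element size, add the member offset. */
size_t x_offset_by_hand(size_t i, size_t j) {
    assert(i < 2 && j < A);  /* stay inside the array */
    return (i * A + j) * sizeof(struct vec3) + offsetof(struct vec3, x);
}

/* The same offset, as computed by the compiler's own indexing. */
size_t x_offset_by_compiler(size_t i, size_t j) {
    return (size_t)((char *)&array1[i][j].x - (char *)array1);
}
```

Both functions return the same value for every valid (i, j), which is the claim being made: the only place A appears is as the multiplier on the first index.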

Notice that the leading dimension [2] doesn't come into the calculation at all; that just tells you how much total space you have. It's the later dimensions that set the geometry: the stride between the same column in different rows.
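To see that the leading dimension has no effect on the stride, here is a small sketch that measures the distance between consecutive rows for two arrays that differ only in their first dimension. The array names, the helper row_stride, and the sample value 3 for A are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

struct vec3 {
    long z;
    int x;
    unsigned short y;
};

#define A 3  /* sample value for illustration only */

struct vec3 two_rows[2][A];    /* leading dimension 2 */
struct vec3 eight_rows[8][A];  /* leading dimension 8, same row length */

/* Distance in bytes from row 0 to row 1. The parameter type only records
   the row length A; the number of rows is not part of it. */
size_t row_stride(struct vec3 (*rows)[A]) {
    assert(rows != NULL);
    return (size_t)((char *)&rows[1][0] - (char *)&rows[0][0]);
}
```

row_stride(two_rows) and row_stride(eight_rows) give the same result, A * sizeof(struct vec3). Only the total size differs: sizeof(eight_rows) is four times sizeof(two_rows).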


If you don't already see how that C would compile for different A and B values, try it with some sample ones and see what changes when you increase A or B by 1. https://godbolt.org/ is ideal for playing around with stuff like that.

e.g. https://godbolt.org/z/zrecTcqMs uses prime numbers 3 and 7 for A and B, so even without changing the numbers, you can see which are multiples of which.

Except GCC is too clever for it to be that simple: it's multiplying using one or two LEA, e.g. RCX + RCX*2 = RCX*3, not using imul $3, %rcx, %rdx for example. If you use large non-simple numbers like 12345 for A and B, you'll see actual imul. https://godbolt.org/z/4G3qc5d5E.
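As a rough illustration of reading a multiply out of LEA, the sketch below models the array1 half of the asm from the question in C. The helper names lea, array1_byte_offset and first_index_multiplier are hypothetical, and the 16 is the struct size discussed earlier. The same reading applies to the array2 half, which is left as the exercise.

```c
#include <assert.h>
#include <stdint.h>

/* Models the arithmetic of `leaq (base,index,scale), dst`:
   dst = base + index*scale, where scale must be 1, 2, 4 or 8. */
int64_t lea(int64_t base, int64_t index, int64_t scale) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale;
}

/* Mirrors the array1 index computation from the question's asm.
   i1 arrives in RCX and j1 in RDX (Windows x64 convention). */
int64_t array1_byte_offset(int64_t i1, int64_t j1) {
    int64_t rdx = lea(i1, i1, 2);  /* leaq (%rcx,%rcx,2), %rdx */
    rdx += j1;                     /* addq %r10, %rdx          */
    rdx *= 16;                     /* salq $4, %rdx            */
    return rdx;
}

/* How many elements one step of the first index skips: the difference
   between adjacent rows, divided by the 16-byte element size. */
int64_t first_index_multiplier(void) {
    return (array1_byte_offset(1, 0) - array1_byte_offset(0, 0)) / 16;
}
```

Here first_index_multiplier() comes out as 3, because base + index*2 with the same register for both is index*3. That multiplier on the first index is the row length.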


I used gcc -fpie to make it use position-independent code: a RIP-relative LEA to get array addresses into registers, instead of addressing modes like array1(%rcx, %rdx, 2) which require the array address (in the .data or .bss section) to fit in a 32-bit sign-extended disp32 in the machine code.

I also used __attribute__((ms_abi)) to use the Windows x64 calling convention like your code does, since GCC on the Godbolt compiler explorer is targeting Linux. (MSVC is the only compiler on Godbolt that targets Windows by default, but it won't output in AT&T syntax.)

Peter Cordes
  • okay! what does this line do "leaq array1(%rip), %rax "? and where in the asm code is this part "array1[i1][j1].x" corresponding to? – Megan Darcy May 28 '21 at 06:44
  • @MeganDarcy: That puts the address of `array1` into RAX. [How to load address of function or label into register](https://stackoverflow.com/q/57212012). Like `mov $array1, %eax`, except it works in position-independent code (which is the gcc default on modern distros - that's why I used `-fpie` in the Godbolt links to match your asm. Without that, you would see GCC using addressing modes like `array1(%rcx, %rdx)`, taking advantage of the ability to use the symbol address as a 32-bit-sign-extended absolute address with other registers.) – Peter Cordes May 28 '21 at 07:11
  • Okay thanks! Am i correct to say that A is 3 and B is 12? Not very sure how to get the B value still – Megan Darcy May 28 '21 at 14:15
  • @MeganDarcy: It's not 12; that looks different from your asm, with more LEAs to multiply by 3, and then 4 as it adds to `j1`. BTW, I just updated the Godbolt links in my answer to use `__attribute__((ms_abi))` - sorry I didn't notice yesterday that your code was using the Windows calling convention, not x86-64 System V; your first arg is in RCX, not RDI. (I was trying to write a general answer to this kind of problem, not do your homework for you). Anyway, you find the B value by finding the multiplier for `i2`, the var that arrives in R8D. – Peter Cordes May 28 '21 at 19:20