I am thinking about how to compile arrays in x86-64 assembly.

I am reading "Computer Systems - A Programmer's Perspective" and the authors give the following formula:

E[i] ->  `movl (%rdx, %rcx, 4), %eax`

But I just used Compiler Explorer and GCC 12.2 to see the generated assembly for the following program:

int main() {
    int arr[] = {3,4,5};
    for (int i = 0; i < 3; i++) {
        arr[i];

    }
}

The generated code for this is:

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-16], 3
        mov     DWORD PTR [rbp-12], 4
        mov     DWORD PTR [rbp-8], 5
        mov     DWORD PTR [rbp-4], 0
        jmp     .L2
.L3:
        add     DWORD PTR [rbp-4], 1
.L2:
        cmp     DWORD PTR [rbp-4], 2
        jle     .L3
        mov     eax, 0
        pop     rbp
        ret

Here is a quote from a stackoverflow answer that explains what the registers RBP and RSP are doing:

rbp is the frame pointer on x86_64. In your generated code, it gets a snapshot of the stack pointer (rsp) so that when adjustments are made to rsp (i.e. reserving space for local variables or pushing values on to the stack), local variables and function parameters are still accessible from a constant offset from rbp.

Is RBP 16 here? For example rbp - 16 = 0, rbp - 12 = 4, rbp - 8 = 8 which kind of satisfies the formula.

Anyway, could you please give me an idea of how the formula from the textbook is related to the generated code from GCC?

Sep Roland
chez93
  • `is rbp 16 here?` It's some memory address. As the quote states, it's the frame's base address. – tkausl Mar 29 '23 at 05:03
  • @tkausl The quote "The amount of memory reserved for the local variables is always a multiple of 16 bytes, to keep the stack aligned to 16 bytes" then explains the 16. In reality it should be 0, but it needs to be a multiple of 16? – chez93 Mar 29 '23 at 05:04
  • Yeah, but `rbp` still points to the base, no matter how many bytes are reserved. Only `rsp` potentially changes throughout the function. `rbp` stays the same (usually). – tkausl Mar 29 '23 at 05:06
  • ok thanks. so rbp is a pointer to the beginning of the array right? – chez93 Mar 29 '23 at 05:09
  • No, rbp is a pointer to the top of the stack. – janneb Mar 29 '23 at 05:09
  • The first element of the array is at address [rbp-16], where it stores the value 3. – janneb Mar 29 '23 at 05:10
  • There is no corresponding asm to the C statement "arr[i]" in the loop; as that value isn't used for anything, it has been optimized away. – janneb Mar 29 '23 at 05:11
  • oh ok, I understand. thanks – chez93 Mar 29 '23 at 05:14
  • Be careful with assumptions about what a compiler will generate. With experience with a specific compiler (and sometimes a specific version and command-line options) you can get good at it, but it is often bad for a book to post generated code or anything else that implies a compiler will choose this instruction or generate this output. As pointed out, your array is local, so it is stack-based and is going to use stack-based accesses. – old_timer Mar 30 '23 at 00:18

1 Answer

arr[i]; as a C statement compiles to zero instructions, even in a debug build, since arr isn't volatile. All the asm you're seeing is just locals at fixed locations.


If you're used to looking at AT&T syntax, use the output dropdown on Godbolt to uncheck "Intel Syntax".

Also, you could get useful asm to look at by writing a function that takes a pointer arg and sums it for example, so you can enable light optimization without having the array optimize away to just returning a constant. You could use volatile to force it not to if you really want to see asm that stores array initializers to the stack instead of just taking a function arg. You're writing this to look at the asm, not run it, so don't write a main(). (In general see How to remove "noise" from GCC/clang assembly output?)

int sumarr(int *arr){
    register int sum = 0; // register keyword is useful in GCC if you're going to disable optimization
    for (int i=0 ; i<1024 ; i++){
        sum += arr[i];
    }
    return sum;
}

Godbolt - gcc 12 with -Og uses an indexed addressing mode like you expect; most other optimization levels do a pointer increment.

# gcc -Og
sumarr:
        movl    $0, %eax
        movl    $0, %edx
        jmp     .L2                   # jump to the loop condition, redundant because 0 <= 1023 is known true; could just fall through into the loop, but -Og maybe intentionally preserves that execution?  It doesn't in general give consistent debugging.
.L3:                                  # do {
        movslq  %eax, %rcx             # sign-extend int to intptr_t since -Og doesn't optimize much
        addl    (%rdi,%rcx,4), %edx    # edx += *(rdi + rcx*4) = array[i]
        addl    $1, %eax
.L2:                                   # loop entry point for first iteration
        cmpl    $1023, %eax
        jle     .L3                   # }while(i<=1023)
        movl    %edx, %eax            # at low optimization levels, GCC didn't sum into the return-value register in the first place.
        ret
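To connect this back to the textbook formula: the operand `(%rdi,%rcx,4)` is the same base + index*scale form as `(%rdx,%rcx,4)`, just with the array pointer in RDI. Here is a minimal C sketch of that address arithmetic. The names `elem_addr` and `load_elem` are hypothetical helpers made up for illustration, and the sketch assumes a 4-byte `int` as on x86-64.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(int) == 4, "sketch assumes 4-byte int, as on x86-64");

/* Hypothetical helper (not from the book or GCC): the byte address that a
   (base, index, 4) operand such as (%rdi,%rcx,4) refers to. */
uintptr_t elem_addr(const int *base, intptr_t index) {
    return (uintptr_t)base + (uintptr_t)index * 4;   /* base + 4*i */
}

/* Load through the computed address; same value as base[index]. */
int load_elem(const int *base, intptr_t index) {
    return *(const int *)elem_addr(base, index);
}
```

So the "4" in the addressing mode is the element size, and the index register holds `i`; the hardware does the multiply and add as part of the load.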

With a normal level of optimization like -O2, we get a pointer increment:

# GCC -O2 -fno-tree-vectorize        (-O2 by itself would vectorize with SIMD)
sumarr:
        leaq    4096(%rdi), %rdx    # endp = ptr + len
        xorl    %eax, %eax          # sum = 0
.L2:                                # do {
        addl    (%rdi), %eax
        addq    $4, %rdi
        cmpq    %rdx, %rdi
        jne     .L2                 # }while(p != endp)
        ret
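Written back as C, that pointer-increment loop looks roughly like the sketch below. This is my own transliteration, not compiler output: `sum_ptr` and the length parameter `n` are illustrative names, whereas the asm above hard-codes 1024 elements (4096 bytes).

```c
#include <assert.h>
#include <stddef.h>

/* Pointer-increment form of the summing loop, mirroring the -O2 asm. */
int sum_ptr(const int *p, size_t n) {
    assert(n > 0);                 /* a do-while body runs at least once */
    const int *endp = p + n;       /* leaq 4096(%rdi), %rdx   when n == 1024 */
    int sum = 0;                   /* xorl %eax, %eax */
    do {
        sum += *p;                 /* addl (%rdi), %eax */
        p++;                       /* addq $4, %rdi */
    } while (p != endp);           /* cmpq %rdx, %rdi ; jne .L2 */
    return sum;
}
```

There is no index register at all here: the scaling by 4 is folded into the `addq $4` on the pointer itself.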

"The amount of memory reserved for the local variables is always a multiple of 16 bytes, to keep the stack aligned to 16 bytes." In reality it should be 0 but it needs to be a multiple of 16?

That's an over-simplification, but yes in this case 0*16 = 0 is a multiple of 16 which maintains stack alignment for RSP after push rbp. The actual space it uses is in the red zone, 128 bytes below RSP.

The amount of stack space reserved for locals is always a multiple of 8, either odd or even depending on how many pushes it did, so the total movement of RSP is 16*n + 8.
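As a rough model of that arithmetic (not GCC's actual algorithm), the sketch below picks the smallest multiple of 8 that covers the locals and makes the total RSP movement 16*n + 8. It assumes RSP is 8 bytes past a 16-byte boundary on function entry, because the `call` pushed a return address. `reserve_bytes` is a hypothetical name, and the model ignores the red zone, which lets a leaf function reserve less than its locals need.

```c
#include <assert.h>

/* Hypothetical model: bytes to subtract from RSP after `pushes` 8-byte pushes
   so that RSP ends up 16-byte aligned and at least locals_bytes are reserved. */
unsigned reserve_bytes(unsigned pushes, unsigned locals_bytes) {
    unsigned reserve = (locals_bytes + 7u) & ~7u;    /* round up to a multiple of 8 */
    if ((8u * pushes + reserve) % 16u != 8u)
        reserve += 8u;                               /* flip odd/even multiple of 8 */
    assert((8u * pushes + reserve) % 16u == 8u);     /* total movement is 16*n + 8 */
    return reserve;
}
```

For example, one push and no reserved locals gives 0, which is the 0*16 case above; two pushes and no locals gives 8.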

In this case where it doesn't need to preserve any call-preserved registers like RBX, it only pushed RBP (because -fno-omit-frame-pointer is the default at -O0.)

It doesn't sub rsp, 16 because the x86-64 SysV ABI includes a red-zone; the 128 bytes below RSP are safe from signal handlers or anything stepping on them asynchronously, so can be used without "reserving". Use -mno-red-zone to stop GCC doing that, if you want to see it reserve space for an array.
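If you want something to paste into Godbolt for that experiment, a small function like the following would do. This is just a sketch: `sum_local` is a made-up name, and `volatile` is there so the array stores and the `arr[i]` loads have to happen even if you turn on optimization.

```c
/* Hypothetical test function for looking at a stack-allocated array. */
int sum_local(void) {
    volatile int arr[3] = {3, 4, 5};   /* volatile: accesses can't be optimized away */
    int sum = 0;
    for (int i = 0; i < 3; i++) {
        sum += arr[i];                 /* each read of arr[i] must actually be performed */
    }
    return sum;
}
```

Try it with and without -mno-red-zone and compare where the array elements are stored relative to RSP and RBP.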

Peter Cordes
  • Thank you so much Peter. I was indeed wondering how to represent looping over an array in assembly. Thanks for your answer! – chez93 Mar 29 '23 at 22:50