
I used https://godbolt.org/ with "x86-64 gcc 9.1" to compile the following C code to assembly, to understand why passing a pointer to a local variable as a function argument works. Now I have difficulty understanding some of the steps.

I added comments to the lines I have trouble with.

#include <stdio.h>

void printStr(char* cpStr) {
    printf("str: %s", cpStr);
}


int main(void) {
    char str[] = "abc";
    printStr(str);
    return 0;
}

Output of "x86-64 gcc 9.1" (no optimization flags):

.LC0:
        .string "str: %s"
printStr:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16                     ; why allocate 16 bytes when using it just for the pointer to str[0], which is only 8 bytes long?
        mov     QWORD PTR [rbp-8], rdi      ; why copy rdi to the stack...
        mov     rax, QWORD PTR [rbp-8]      ; ... just to copy it into rax again? Also rax seems to already contain the pointer to str[0] (see *)
        mov     rsi, rax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        nop
        leave
        ret
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16                     ; why allocate 16 bytes when "abc" is just 4 bytes long?
        mov     DWORD PTR [rbp-4], 6513249
        lea     rax, [rbp-4]                ; pointer to str[0] copied into rax (*)
        mov     rdi, rax                    ; why copy the pointer to str[0] to rdi?
        call    printStr
        mov     eax, 0
        leave
        ret
Fabian
  • 16 bytes is for alignment. You are looking at unoptimized code, don't be surprised to see nonsense. Add `-O3` to the compiler options. `rdi` is used to pass the first argument as per [standard calling convention](https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI). – Jester Aug 22 '19 at 18:30
  • @Jester Thank you. That's the answer for me. – Fabian Aug 22 '19 at 22:12
  • Generally speaking, unoptimized code is compiled in such a way that it behaves nicely when stepping through in a debugger – M.M Aug 22 '19 at 22:33
  • Possibly a duplicate of [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](//stackoverflow.com/q/53366394) which explains why `-O0` does what it does. – Peter Cordes Aug 23 '19 at 01:55

1 Answer


Thanks to Jester's help, I resolved my confusion. The following code was compiled with GCC's "-O1" flag (for me, the most readable optimization level for understanding what's going on):

.LC0:
    .string "str: %s"
printStr:
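    sub     rsp, 8                          ; re-align rsp to 16 bytes for the call to printf (the call to printStr pushed an 8-byte return address)
                                            ; now the call to printf gets prepared: rdi = first argument, rsi = second argument
    mov     rsi, rdi                        ; copy the pointer to str[0] (received in rdi) to rsi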
    mov     edi, OFFSET FLAT:.LC0           ; move address of static string literal "str: %s" to edi
    mov     eax, 0                          ; set eax to the number of vector registers used, because printf is a varargs function
    call    printf
    add     rsp, 8
    ret
main:
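    sub     rsp, 24                         ; reserve space for str and keep rsp 16-byte aligned for the call
    mov     DWORD PTR [rsp+12], 6513249     ; create string "abc" on the stack (0x00636261 = 'a', 'b', 'c', '\0' in little-endian byte order)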
    lea     rdi, [rsp+12]                   ; move address of str[0] (pointer to 'a') to rdi (first argument for printStr)
    call    printStr
    mov     eax, 0                          ; return value of main (return 0)
    add     rsp, 24
    ret

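As Jester said, the 16 bytes were allocated for alignment: the System V AMD64 ABI requires rsp to be a multiple of 16 at every call instruction, so at -O0 GCC rounds the space reserved for local variables up to a multiple of 16. There is a good post on Stack Overflow that explains this here.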

Edit:

There is a post on Stack Overflow which explains why al is zeroed before a call to a varargs function here.

Fabian
  • "set eax to 0 because printStr is of type void" - wrong. `eax` (actually `al`) contains the number of vector registers used to pass arguments to varargs functions, in this case `printf`. It has nothing to do with the caller `printStr` nor with the return type being `void`. – Jester Aug 22 '19 at 22:37
  • @Jester Thank you again - going to edit my answer – Fabian Aug 22 '19 at 22:39
  • PS: If you want to see that in action, print a floating point value. – Jester Aug 22 '19 at 22:48