Memory allocation and addressing in Assembly

Question

I am trying to learn assembly and there are a couple of instructions whose purpose I do not fully understand.

C code

#include <stdio.h>

int main(int argc, char* argv[])
{
    printf("Argument One - %s\n", argv[1]);
    return 0;
}

Assembly

    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 10, 14
    .intel_syntax noprefix
    .globl  _main                   ## -- Begin function main
    .p2align    4, 0x90
_main:                                  ## @main
## %bb.0:
    push    rbp
    mov rbp, rsp
    sub rsp, 32
    lea rax, [rip + L_.str]
    mov dword ptr [rbp - 4], 0
    mov dword ptr [rbp - 8], edi
    mov qword ptr [rbp - 16], rsi
    mov rsi, qword ptr [rbp - 16]
    mov rsi, qword ptr [rsi + 8]
    mov rdi, rax
    mov al, 0
    call    _printf
    xor ecx, ecx
    mov dword ptr [rbp - 20], eax ## 4-byte Spill
    mov eax, ecx
    add rsp, 32
    pop rbp
    ret
                                        ## -- End function
    .section    __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
    .asciz  "Argument One - %s\n"


.subsections_via_symbols

Q1. sub rsp, 32

Why is space allocated for 32 bytes when there are no local variables? I believe argc and argv are saved in the registers edi and rsi respectively. If it's so that they can be moved onto the stack, wouldn't that require only 12 bytes?

Q2. lea rax, [rip + L_.str] and mov rdi, rax

Am I correct in understanding that L_.str has the address of the string "Argument One - %s\n"? From what I've understood, printf gets access to this string through the register rdi. So, why doesn't the instruction mov rdi, L_.str work instead?

Q3. mov dword ptr [rbp - 4], 0

Why is zero being pushed onto the stack?

Q4. mov dword ptr [rbp - 8], edi and mov qword ptr [rbp - 16], rsi

I believe these instructions are to get argc and argv onto the stack. Is it pure convention to use edi and rsi?

Q5. mov dword ptr [rbp - 20], eax

I haven't a clue what this does.

Most of that is noise and overhead from unoptimized code, e.g. copying args from registers to the stack for no reason, and (Q5) spilling the unused printf return value to stack space. Compile with `-O3` or `-O2` to get just the interesting part. [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) — Peter Cordes, Jan 17 '19 at 06:03
And yes, there is a standard that specifies how args are passed to functions, so compilers can make code that can call code from other compilers. In your case it's the x86-64 System V ABI. See the function-calling part of [What are the calling conventions for UNIX & Linux system calls on i386 and x86-64](https://stackoverflow.com/q/2535989), and [What registers are preserved through a linux x86-64 function call](https://stackoverflow.com/q/18024672). See also https://stackoverflow.com/tags/x86/info for more links to docs. — Peter Cordes, Jan 17 '19 at 06:17
You are compiling without optimisations. This causes the compiler to generate a lot of useless instructions. Pass at least `-O1`, better `-O2` so the compiler generates reasonable code. — fuz, Jan 17 '19 at 11:03
@fuz Why would a compiler ever generate useless instructions to begin with? I really don't understand that. Just to adhere to calling conventions? — puppydrum64, Jul 01 '22 at 18:06
@puppydrum64 How is the compiler supposed to know if an instruction is useless to begin with, if you tell it not to check if instructions are useless? These seemingly useless instructions may be needed in some cases and if you tell the compiler not to check if they really are, it'll just generate them anyway. — fuz, Jul 01 '22 at 18:45

score 3 · Answer 1 · edited Jan 17 '19 at 06:43

Q1. sub rsp, 32

This allocates stack space that the function uses to store some data. Although it allocates 32 bytes, the code uses only the first 16 bytes for the arguments, a qword at [rbp-8] (0:edi) and a qword at [rbp-16] (rsi), plus a dword at [rbp-20] for the spilled printf return value (see Q5).
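A plausible reason for the 32, sketched in C: the lowest store in the listing is at [rbp - 20], and the x86-64 System V ABI requires rsp to be 16-byte aligned at a call instruction, so a 20-byte frame would be rounded up to the next multiple of 16. The helper names `round_up` and `frame_size` are made up for illustration. This only shows the arithmetic and is not a description of how clang lays out frames internally.

```c
#include <assert.h>
#include <stddef.h>

/* Round `bytes` up to the next multiple of `align` (must be a power of two). */
size_t round_up(size_t bytes, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (bytes + align - 1) & ~(align - 1);
}

/* Hypothetical helper: stack adjustment for locals that reach down to
 * [rbp - lowest_offset], keeping rsp 16-byte aligned for the next call.
 * Assumes rsp is already 16-byte aligned after `push rbp`. */
size_t frame_size(size_t lowest_offset)
{
    return round_up(lowest_offset, 16);
}
```

With the offsets from the listing, `frame_size(20)` gives 32, while the 12 bytes the question expected would only have needed 16.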

Q2. lea rax, [rip + L_.str] and mov rdi, rax

The lea computes the address of the string, which is stored in the __TEXT,__cstring section (part of the read-only "code" segment). That address is then copied to rdi, which holds the first argument to printf.
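As a rough model of why the RIP-relative form keeps working wherever the image is loaded, here is a small C sketch. The CPU adds a signed 32-bit displacement to the address of the next instruction, and that displacement is fixed when the code is linked. The function names and the two offsets are invented for illustration and are not taken from the real binary.

```c
#include <stdint.h>

/* Model of `lea reg, [rip + disp32]`: add a signed 32-bit displacement
 * to the address of the instruction that follows the lea. */
uint64_t lea_rip_relative(uint64_t next_instruction_addr, int32_t disp)
{
    return next_instruction_addr + (uint64_t)(int64_t)disp;
}

/* Hypothetical image layout, as offsets from wherever the loader
 * places the image. The values are illustrative only. */
enum {
    NEXT_INSN_OFFSET = 0x0f6b, /* instruction after the lea */
    STRING_OFFSET    = 0x0fa2  /* L_.str */
};

/* The displacement that would be encoded in the instruction.
 * It depends only on the distance between code and data. */
int32_t encoded_disp(void)
{
    return (int32_t)(STRING_OFFSET - NEXT_INSN_OFFSET);
}
```

Whatever base address is chosen, `lea_rip_relative(base + NEXT_INSN_OFFSET, encoded_disp())` yields `base + STRING_OFFSET`, so the instruction bytes never need patching. An absolute address baked into the instruction would only be right for one load address.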

Q3. mov dword ptr [rbp - 4], 0 ... mov dword ptr [rbp - 8], edi

This stores a 64-bit little endian value composed of 0:edi at [rbp - 8]. I'm not sure why it's doing this, since it never loads from that qword later on.

It's normal for un-optimized code to store its register arguments to memory, where debug info can tell debuggers where to look for and modify them, but it's unclear why clang would zero-extend argc in edi to 64 bits.

More likely that 0 dword is something separate, because if the compiler really wanted to store a zero-extended argc, it would zero-extend in a register with a 32-bit mov, like mov ecx, edi ; mov [rbp-8], rcx. Possibly this extra zero is a return-value temporary which it later decides not to use because of an explicit return 0; instead of the implicit one from falling off the end of main? (main is special, and I think clang does create an internal temporary variable for the return value.)
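To see why the two dword stores can be read as a single 0:edi qword, here is a small simulation of the little-endian memory layout in C. The function names are hypothetical, and the 8-byte array merely stands in for the stack slot that starts at [rbp - 8].

```c
#include <stdint.h>

/* Store a 32-bit value at buf[0..3] in little-endian byte order,
 * the layout x86 uses for `mov dword ptr [mem], r32`. */
void store_dword_le(uint8_t *buf, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        buf[i] = (uint8_t)(value >> (8 * i));
}

/* Load 8 bytes from buf as a little-endian 64-bit value,
 * like `mov r64, qword ptr [mem]`. */
uint64_t load_qword_le(const uint8_t *buf)
{
    uint64_t value = 0;
    for (int i = 0; i < 8; i++)
        value |= (uint64_t)buf[i] << (8 * i);
    return value;
}

/* Replay the two stores from the listing; slot[0] stands for [rbp - 8]. */
uint64_t qword_at_rbp_minus_8(uint32_t edi)
{
    uint8_t slot[8];
    store_dword_le(slot + 4, 0);   /* mov dword ptr [rbp - 4], 0   */
    store_dword_le(slot + 0, edi); /* mov dword ptr [rbp - 8], edi */
    return load_qword_le(slot);
}
```

Reading the slot back as a qword gives edi zero-extended to 64 bits. That is why the bytes look like a zero-extended argc, even if the compiler intended two unrelated 4-byte locals.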

Q4 mov qword ptr [rbp - 16], rsi ... mov rsi, qword ptr [rbp - 16]

Optimization off? It stores rsi then loads rsi from [rbp - 16]. rsi holds your argv function arg ( == &argv[0]). The x86-64 System V ABI passes integer/pointer args in RDI, RSI, RDX, RCX, R8, R9, then on the stack.

... mov rsi, qword ptr [rsi + 8]

This loads rsi with argv[1], the pointer stored 8 bytes past argv[0], as the 2nd arg for printf. (For the same reason that main's 2nd arg was in rsi.)

The x86-64 System V calling convention is also the reason for zeroing AL before calling a varargs function with no FP args.
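The `[rsi + 8]` load can be spelled out in C as plain byte arithmetic on argv. This is a sketch with a made-up function name. It uses `sizeof(char *)` rather than a literal 8 so that it stays correct on other targets, and on x86-64 that size is 8.

```c
/* Equivalent of `mov rsi, qword ptr [rsi + 8]`: step one pointer-sized
 * slot past the start of the argv array and load the pointer stored there. */
char *load_second_arg(char **argv)
{
    const unsigned char *base = (const unsigned char *)argv;
    return *(char *const *)(base + sizeof(char *));
}
```

For any valid argv with at least two entries, `load_second_arg(argv)` returns the same pointer as `argv[1]`. The index is scaled by the pointer size, which is where the 8 in the assembly comes from.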

Q5. mov dword ptr [rbp - 20], eax

Optimization off? It's storing the return value from printf, but never using it.
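For reference, the value left in eax is what printf returns in C: the number of characters written, or a negative value on error. The sketch below keeps that value in a local, which is roughly what the unoptimized code does with its spill slot. The function name is hypothetical.

```c
#include <stdio.h>

/* Call printf the same way the question's main does, but hand the
 * return value back to the caller instead of discarding it. */
int print_and_keep_result(const char *arg)
{
    int written = printf("Argument One - %s\n", arg);
    return written; /* character count, or negative on error */
}
```

The original main ignores this result, so an optimizing build has no reason to store it anywhere.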

edited Jan 17 '19 at 06:43

Peter Cordes

answered Jan 17 '19 at 05:30

rcgldr

This is MacOS, not Windows x86-64 ABI. No shadow space in the 64-bit ABI for Linux or BSD. – Michael Petch Jan 17 '19 at 05:59
I should point out that i assumed MacOS given this line in their output `.build_version macos, 10, 14` – Michael Petch Jan 17 '19 at 06:03
Yes, optimization has been turned off. Also, why not use **mov rdi, L_.str** instead to move the address of the string into rdi? – D Kar Jan 17 '19 at 06:04
@DKar : because `lea rax, [rip + L_.str]` makes the code position independent. – Michael Petch Jan 17 '19 at 06:05
@MichaelPetch I'm sorry but I am new to ASM. Could you elaborate on what you mean by position independent? – D Kar Jan 17 '19 at 06:07
@MichaelPetch - I updated my answer, It's not shadow space allocated by the caller, but instead space allocated by the callee. In this case only 16 of the 32 allocated bytes are used. – rcgldr Jan 17 '19 at 06:08
@rcgldr Why 32 bytes though? argc is 4 bytes and argv is 8 bytes. That's a total of 12 bytes. – D Kar Jan 17 '19 at 06:13
@DKar - I'm not sure why. I updated my answer to note that only 16 of the 32 bytes are used, the qword at [rbp-8] is never loaded, and the qword at [rbp-16] is stored from rsi and immediately loaded back into rsi, which is useless, but probably due to optimization being off. – rcgldr Jan 17 '19 at 06:17
@D Kar Essentially "position independent" means that it doesn't matter where in memory the code is run. For an example let's talk about CD-ROM games for your computer (remember those?) Your computer would copy the game program to temporary RAM and run it from there. The catch is that since the operating system is managing memory, the writers of the CD software had no clue where this temporary RAM actually is. So in order to make sure the game always worked, the data offsets would have to be relative to the `%rip` register, so that they'd always be correct. – puppydrum64 Nov 23 '22 at 17:26
