Understanding how the stack in assembly works

Question

Having never seen assembly before in my life, I'm having trouble understanding the concept of the stack and how it works with regard to moving registers around.

Here's some code I've been given for adding two numbers (using Intel syntax):

.intel_syntax noprefix

.data
    form1: .asciz "%d %d"       
    form2: .asciz "Sum: %d\n"

.text    
.globl main

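sum:
    enter 0, 0              # push rbp; mov rbp, rsp; sub rsp, 0
    mov eax, edi            # eax = first argument
    add eax, esi            # eax += second argument; eax is the return value
    leave                   # mov rsp, rbp; pop rbp
    ret

main:

    enter 8, 0              # push rbp; mov rbp, rsp; sub rsp, 8: room for two 4-byte ints at [rbp-4] and [rbp-8]

    push rcx                # 8 more bytes, which leaves rsp 16-byte aligned for the calls below

    lea rdi, [rip + form1]  # 1st argument: format string (RIP-relative address)
    mov rcx, rbp
    sub rcx, 4
    mov rsi, rcx            # 2nd argument: rbp-4, address of the first int
    sub rcx, 4
    mov rdx, rcx            # 3rd argument: rbp-8, address of the second int
    mov rax, 0              # al = 0: no vector registers used by this variadic call
    call scanf

    mov edi, [rbp-4]        # load the first number
    mov esi, [rbp-8]        # load the second number
    call sum

    lea rdi, [rip + form2]  # 1st argument: format string
    mov esi, eax            # 2nd argument: the value returned by sum
    mov rax, 0
    call printf

    mov rax, 0              # main returns 0

    leave                   # rsp = rbp (also discards the pushed rcx), then pop rbp
    ret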

What bugs me is that, while I understand the basic instructions, I find `enter n, 0` (which is `push rbp` / `mov rbp, rsp` / `sub rsp, n`) hard to grasp, and also `[rbp-4]`. I can't get the gist of how this memory is moved around or where rbp and rsp point at any given time, so rbp-4 seems arbitrary even though it isn't.

`enter 8, 0` sets up a stack frame with 8 bytes of locals that start at `rbp-8`. Since the code has two 32-bit locals, the other starts at `rbp-4`. — Jester, Jan 13 '22 at 10:48
Assembly is a difficult concept. You cannot expect to understand it intuitively "having never seen assembly in your life". I would suggest reading a book, taking a course, or using some other material. — cubuspl42, Jan 13 '22 at 10:52
"enter n, 0 function (...) is hard for me to get" Read the documentation of `enter` / `leave` instructions, it will be easier to get them when you know what they do. "and also the [rbp-4]" Read about x86 addressing modes. "where rbp and rsp registers are at all times?" They are inside the processor. Read about x86 registers. — cubuspl42, Jan 13 '22 at 10:58
I'm sorry if what I write feels harsh, but that's the best recommendation anyone could give you here. It's a difficult topic, and it requires at least a basic understanding of the architecture (here, x86) to get what the instructions do. A decent book will do it much better than strangers online. — cubuspl42, Jan 13 '22 at 11:01
As your question, in my opinion, could be summed up as "Please explain the basics of the x86 architecture and its assembly language using my assignment as an example", I'll vote to close it as too broad. — cubuspl42, Jan 13 '22 at 11:06

user123 · Answer 1 · 2022-01-14T18:57:16.887

Disclaimer

I'm not an expert, and some things I write here might be wrong or might vary depending on several factors. Feel free to comment about anything. Also, I'll write mostly about Linux because that's what I know best. The topic is really complex and the question is broad, so I can't cover everything; please be lenient.

In general, each process (more precisely, each thread) gets its own stack in a portion of the virtual address space. On most Linux distributions, the main thread's stack is limited to 8 MB by default (see `ulimit -s`). For example, you could have the following:

Virtual address range	Content
0xffff_8000_0000_0000 - 0xffff_ffff_ffff_ffff	Linux kernel (system calls, interrupts, drivers, etc.)
0x0000_8000_0000_0000 - 0xffff_7fff_ffff_ffff	Unavailable (non-canonical)
0x0000_0000_0000_0000 - 0x0000_7fff_ffff_ffff	Process address space
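The "Unavailable" row is the non-canonical hole. As a sketch, assuming 4-level paging with 48-bit virtual addresses (5-level paging widens this to 57 bits), the rule the CPU enforces could be written like this; `is_canonical48` is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* With 48-bit virtual addresses, an address is canonical when bits 63..47
   are all 0 (user half) or all 1 (kernel half), i.e. bits 63..48 copy bit 47.
   Everything else falls in the unusable hole between the two halves. */
bool is_canonical48(uint64_t addr) {
    uint64_t top = addr >> 47;          /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFF;
}
```

For example, 0x0000_7fff_ffff_ffff and 0xffff_8000_0000_0000 pass, while 0x0000_8000_0000_0000 does not.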

The Linux kernel maps its various parts into the upper half of the virtual address space (for more info read here: https://www.kernel.org/doc/html/latest/x86/x86_64/mm.html). The kernel's pages are also marked as global in the page tables so that their TLB entries aren't flushed when CR3 is reloaded. The unavailable section is due to the canonical address requirement of x86-64 processors (for more info read: Why does QEMU return the wrong addresses when filling the higher half of the PML4?). The process address space is split again. I don't remember every detail, but you'll have something like this (from the top downward):

Size	Content
8MB	Stack (growing from the top downwards)
Grows upward as needed	Heap
Few GB	User mode binary (data and code segment)
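On Linux you can observe this layout from a small C program. This is a sketch assuming Linux/POSIX (`getrlimit` from `<sys/resource.h>`); the helper names (`Layout`, `sample_layout`, `stack_soft_limit`, `print_layout`) are hypothetical, and the exact addresses change from run to run because of ASLR. On a typical x86-64 Linux system the stack address is the highest of the three and the global's address the lowest.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

int global_counter = 42;   /* initialized global: data segment */

/* Hypothetical helper type: one sample address from each region. */
typedef struct {
    uintptr_t data, heap, stack;
} Layout;

Layout sample_layout(void) {
    int local = 0;                            /* automatic variable: stack */
    int *dynamic = malloc(sizeof *dynamic);   /* small allocation: heap */
    Layout l = {
        .data  = (uintptr_t)&global_counter,
        .heap  = (uintptr_t)dynamic,
        .stack = (uintptr_t)&local,
    };
    free(dynamic);
    return l;
}

/* Soft limit on the main thread's stack in bytes (RLIM_INFINITY if
   unlimited), or 0 if the query fails. `ulimit -s` shows it in KiB. */
rlim_t stack_soft_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;
    return rl.rlim_cur;
}

void print_layout(void) {
    Layout l = sample_layout();
    printf("stack: 0x%" PRIxPTR "\n", l.stack);
    printf("heap:  0x%" PRIxPTR "\n", l.heap);
    printf("data:  0x%" PRIxPTR "\n", l.data);
    printf("stack limit: %llu bytes\n", (unsigned long long)stack_soft_limit());
}
```

Call `print_layout()` from a `main` function to see the values; on distributions that keep the default, the stack limit should print as 8388608 bytes (8 MiB).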

With paging, every process sees a full virtual address space, so its code can use any virtual address (though most of them will produce a page fault or protection fault when accessed). Each CPU core has a CR3 register that holds the physical address of the top-level page table, the PML4 with 4-level paging (for more info about paging see: What is paging exactly? OSDEV).
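To make "levels of page tables" more concrete, here is a sketch of how 4-level x86-64 paging splits a 48-bit virtual address: CR3 locates the PML4, each 9-bit index selects one of 512 entries at the next level down, and the last 12 bits are the offset inside a 4 KiB page. `PageIndices` and `split_va` are hypothetical names, and large pages and 5-level paging are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Table indices for one virtual address under 4-level paging. */
typedef struct {
    unsigned pml4, pdpt, pd, pt, offset;
} PageIndices;

PageIndices split_va(uint64_t va) {
    PageIndices idx = {
        .pml4   = (unsigned)((va >> 39) & 0x1FF),  /* bits 47..39: entry in the table CR3 points to */
        .pdpt   = (unsigned)((va >> 30) & 0x1FF),  /* bits 38..30 */
        .pd     = (unsigned)((va >> 21) & 0x1FF),  /* bits 29..21 */
        .pt     = (unsigned)((va >> 12) & 0x1FF),  /* bits 20..12: selects the 4 KiB page */
        .offset = (unsigned)(va & 0xFFF),          /* bits 11..0: byte inside that page */
    };
    return idx;
}
```

For example, `split_va(0x400000)` gives PML4 0, PDPT 0, PD 2, PT 0, offset 0, and kernel addresses from 0xffff_8000_0000_0000 upward use PML4 entries 256 to 511.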

The executable (the file you usually start from the command line or by double-clicking) has virtual addresses specified inside it. A non-PIE executable linked with GNU ld on x86-64 starts at virtual address 0x400000 by default, while a PIE executable is loaded at a randomized base. The operating system builds the page tables based on what physical memory is available at that moment, so that the virtual addresses specified in the executable land somewhere in RAM where they won't interfere with another process.

The data segment is mapped into memory at the virtual addresses specified in the executable (and so is the code segment). The code segment mostly accesses the data segment using RIP-relative addressing, that is, as an offset from RIP (the instruction pointer). Global variables (in C and C++) and static variables, including static class members in C++, end up in the data segment. The correct offset is determined by the linker after compilation (gcc calls ld for you): each object file records the symbol of the global/static variable along with its position within the data section, and the linker then works out which offset from RIP reaches that variable.
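As a small illustration of "an offset from RIP": the stored displacement is measured from the end of the instruction, because RIP already points at the next instruction when the operand is evaluated. This sketch (hypothetical function name, made-up addresses) computes the value the linker ends up placing in the instruction:

```c
#include <assert.h>
#include <stdint.h>

/* Displacement for a RIP-relative operand: target minus the address of the
   NEXT instruction. It must fit in a signed 32-bit field, which is why
   RIP-relative data has to be within +/-2 GiB of the code using it. */
int32_t rip_displacement(uint64_t insn_addr, unsigned insn_len, uint64_t target) {
    uint64_t next_insn = insn_addr + insn_len;
    int64_t disp = (int64_t)(target - next_insn);
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    return (int32_t)disp;
}
```

For instance, a 7-byte `lea rdi, [rip + form1]` at 0x401000 referring to data at 0x404010 would get a displacement of 0x3009.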

Meanwhile, local function variables (those declared inside the function's braces) are usually allocated on the stack. The stack space is set up by:

Pushing RBP.
Copying RSP into RBP.
Subtracting the space the function requires from RSP.

Pushing RBP saves part of the previous function's context, because RBP points to the top of the space allocated for that previous function (higher addresses are drawn higher in the diagrams below). Copying RSP into RBP also preserves the previous function's context, because it records RSP before it is decremented, so it can be restored later. Decrementing RSP finally allocates the space the function needs. From then on, local variables are accessed at negative offsets from RBP. For example, if sum used 8 bytes of stack you would have something like:

Before calling sum:

|<--RBP
|Previous function space
|<--RSP
|
|
|

After calling sum (call has pushed the return address):

Push RBP.

|<--RBP
|Previous function space
|Return address
|RBP
|<--RSP
|

Put RSP in RBP (which saves the RSP pointer before you decrement it).

|
|Previous function space
|Return address
|RBP
|<--RSP <--RBP
|

Subtract the space required by the function (8 bytes) from RSP.

|
|Previous function space
|Return address
|RBP
|<--RBP
|8 bytes
|<--RSP

leave, first half (mov rsp, rbp: copy RBP back into RSP)

|
|Previous function space
|Return address
|RBP
|<--RSP <--RBP
|

leave, second half (pop rbp: restore the previous function's RBP from the stack)

|<--RBP
|Previous function space
|Return address
|<--RSP
|
|

ret (pop the return address off the stack and jump to it)

|<--RBP
|Previous function space
|<--RSP
|
|
|

Now we have restored the context of the previous function.
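To replay the diagrams above step by step, here is a toy C model of the stack. It is only a sketch: `Machine`, `STACK_TOP` and the helper functions are hypothetical, memory is a byte array indexed by made-up addresses, and jumps (RIP) are not modelled, only how `call`, `enter`, `leave` and `ret` move RSP and RBP.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model, not real hardware: a 4 KiB stack whose highest address is
   STACK_TOP. Addresses are plain numbers mapped into the mem array. */
#define STACK_SIZE 4096u
#define STACK_TOP  0x10000u

typedef struct {
    uint64_t rsp, rbp;
    uint8_t mem[STACK_SIZE];
} Machine;

/* Translate a simulated address into the array (8-byte accesses only). */
static uint8_t *mem_at(Machine *m, uint64_t addr) {
    assert(addr >= STACK_TOP - STACK_SIZE && addr + 8 <= STACK_TOP);
    return &m->mem[addr - (STACK_TOP - STACK_SIZE)];
}

static void push(Machine *m, uint64_t value) {
    m->rsp -= 8;                                  /* the stack grows downward */
    memcpy(mem_at(m, m->rsp), &value, sizeof value);
}

static uint64_t pop(Machine *m) {
    uint64_t value;
    memcpy(&value, mem_at(m, m->rsp), sizeof value);
    m->rsp += 8;
    return value;
}

/* call: push the return address (the jump itself is not modelled). */
static void call(Machine *m, uint64_t return_address) { push(m, return_address); }

/* enter n, 0  ==  push rbp; mov rbp, rsp; sub rsp, n */
static void enter(Machine *m, uint64_t n) {
    push(m, m->rbp);      /* save the caller's frame pointer */
    m->rbp = m->rsp;      /* the new frame starts here */
    m->rsp -= n;          /* reserve n bytes for locals */
}

/* leave  ==  mov rsp, rbp; pop rbp */
static void leave(Machine *m) {
    m->rsp = m->rbp;      /* drop the locals */
    m->rbp = pop(m);      /* restore the caller's frame pointer */
}

/* ret: pop the return address; the CPU would then jump to it. */
static uint64_t ret(Machine *m) { return pop(m); }
```

After `call(&m, 0x401000)` and `enter(&m, 8)`, RSP is 24 bytes below where it started (return address, saved RBP, 8 bytes of locals) and RBP is 16 bytes below; `leave(&m)` and `ret(&m)` then put RSP and RBP back exactly as they were before the call.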

It is definitely easier to see this the other way around, by compiling higher-level code to assembly. Once you are in assembly, it is harder to see how much stack space you need. In sum, since you only use registers, you don't need the stack at all. In main, you do use the stack and put data on it. So it is not that you look at the function before writing it and determine that it will require 8 bytes; it requires 8 bytes simply because you decide to keep data on the stack within the function. Here, you subtract 8 from the stack pointer because scanf stores two 4-byte integers (8 bytes) there.
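For comparison, here is a rough C equivalent of the question's program. It is my own reconstruction, not code from the question: `sum_program` is a hypothetical wrapper that takes the streams as parameters so it can be tried without typing at a terminal, and unlike the assembly it checks scanf's return value. A compiler is free to place `a` and `b` differently from the hand-written `[rbp-4]`/`[rbp-8]`; `gcc -S` shows what it actually chose.

```c
#include <assert.h>
#include <stdio.h>

/* Like the assembly `sum`: arguments arrive in edi/esi and the result
   goes back in eax, so no stack space is needed. */
int sum(int a, int b) {
    return a + b;
}

/* Body of the assembly `main`. `a` and `b` play the role of the two 4-byte
   locals at [rbp-4] and [rbp-8]: scanf needs their addresses, so they must
   live in memory (here, the stack) rather than only in registers. */
int sum_program(FILE *in, FILE *out) {
    int a, b;
    if (fscanf(in, "%d %d", &a, &b) != 2)
        return 1;
    fprintf(out, "Sum: %d\n", sum(a, b));
    return 0;
}
```

A real program would simply use `int main(void) { return sum_program(stdin, stdout); }`.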

You copy RBP into RCX, subtract 4, and put RCX in RSI, the register for the second argument of scanf. RBP is a pointer, so it contains an address (the address of the top of main's part of the stack), and RBP-4 is the address of the first 4-byte local. The first number you enter is therefore written to this address on the stack. You do the same for the third argument (RDX, holding RBP-8); the first argument (RDI) is the format string. Because main keeps data on the stack, it has to reserve that space (by moving RSP below it) before calling other functions; otherwise, the functions it calls would push their return addresses and their own data on top of main's variables. A function that never uses the stack can skip this setup (unless I'm wrong, since I know close to nothing about assembly optimization).

The sum function doesn't put anything on the stack and doesn't access it, so there is no real need for enter 0, 0/leave. Basically, you push RBP, copy RSP into RBP, and subtract 0 from RSP. You don't do anything with the stack, and then you undo all of it with leave.

main uses the stack because you can't just hand a register to scanf: it needs an address to write the input to, and registers don't have addresses. The alternative would be to use global variables in the data segment. I think the code uses the stack to demonstrate how local function variables work.
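A sketch of the global-variable alternative, for comparison with the stack version. The names are hypothetical, and it reads from a string with `sscanf` instead of from stdin to keep it self-contained:

```c
#include <assert.h>
#include <stdio.h>

/* File-scope variables get addresses fixed (relative to the program image)
   at link time. Because they start at zero they land in .bss, the
   zero-initialized part of the data area, so no stack frame holds them. */
static int first_number, second_number;

/* Hypothetical input step: returns 1 if both numbers were read, 0 otherwise. */
int read_into_globals(const char *text) {
    return sscanf(text, "%d %d", &first_number, &second_number) == 2;
}

int sum_of_globals(void) {
    return first_number + second_number;
}
```

`read_into_globals("4 5")` followed by `sum_of_globals()` gives 9. The trade-off is that globals are shared by every call, while stack locals give each call its own copy, which is one reason functions normally keep their variables on the stack.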
