Linux minimal runnable examples with disassembly analysis
Since this is an implementation detail not specified by standards, let's just have a look at what the compiler is doing on a particular implementation.
In this answer, I will either link to specific answers that do the analysis, or provide the analysis directly here, and summarize all results here.
All of those are in various Ubuntu / GCC versions, and the outcomes are likely pretty stable across versions, but if we find any variations let's specify more precise versions.
Local variable inside a function
Be it main or any other function:
void f(void) {
    int my_local_var;
}
As shown at: What does <value optimized out> mean in gdb?
- -O0: stack
- -O3: registers if they don't spill, stack otherwise
For motivation on why the stack exists see: What is the function of the push / pop instructions used on registers in x86 assembly?
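As a rough illustration of the stack case, here is a minimal sketch that records the address of one local per recursion level. The names record_frames and locals_are_distinct are hypothetical helpers made up for this example, and volatile is used only to force the local into memory so that it has an address to look at.

```c
#include <stddef.h>
#include <stdint.h>

#define DEPTH 4

/* Hypothetical helper: stores the address of one local per recursion level. */
static void record_frames(uintptr_t *out, size_t depth) {
    /* volatile and address-taken, so the compiler must keep it in memory. */
    volatile int my_local_var = (int)depth;
    out[depth] = (uintptr_t)&my_local_var;
    if (depth + 1 < DEPTH)
        record_frames(out, depth + 1);
    my_local_var++; /* Touch it after the call so every copy stays live. */
}

/* Returns 1 if all DEPTH activations had their own copy of the local. */
int locals_are_distinct(void) {
    uintptr_t addr[DEPTH];
    record_frames(addr, 0);
    for (size_t i = 0; i < DEPTH; i++)
        for (size_t j = i + 1; j < DEPTH; j++)
            if (addr[i] == addr[j])
                return 0;
    return 1;
}
```

Printing the recorded addresses from main on x86-64 Linux would typically show them decreasing with depth, since the stack grows downwards there, but that ordering is an implementation detail and the sketch only checks that the copies are distinct.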
Global variables and static function variables
/* BSS */
int my_global_implicit;
int my_global_implicit_explicit_0 = 0;
/* DATA */
int my_global_implicit_explicit_1 = 1;
void f(void) {
    /* BSS */
    static int my_static_local_var_implicit;
    static int my_static_local_var_explicit_0 = 0;
    /* DATA */
    static int my_static_local_var_explicit_1 = 1;
}
- if initialized to 0 or not initialized (and therefore implicitly initialized to 0): .bss section, see also: Why is the .bss segment required?
- otherwise: .data section
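One way to check this from inside the program, without a disassembler, is to compare addresses against the etext, edata and end symbols that the linker provides on Linux (see man 3 end). This is only a sketch: the variable and function names below are hypothetical, and it assumes the default GNU ld layout in which .bss sits between edata and end.

```c
#include <stdint.h>

/* Linker-provided symbols, see end(3); only their addresses are meaningful. */
extern char etext, edata, end;

int demo_bss_global;        /* hypothetical, not initialized: expected in .bss  */
int demo_zero_global = 0;   /* hypothetical, explicit 0:      expected in .bss  */
int demo_data_global = 1;   /* hypothetical, non-zero:        expected in .data */

/* 1 if p lies in [edata, end), the range the linker reserves for .bss. */
int in_bss(const void *p) {
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)&edata && a < (uintptr_t)&end;
}

/* 1 if p lies in [etext, edata): initialized data (this range also
 * covers .rodata, so it is a coarser check than in_bss). */
int in_initialized_data(const void *p) {
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)&etext && a < (uintptr_t)&edata;
}

/* Static locals behave like globals as far as placement goes. */
int *static_implicit(void) {
    static int v;
    return &v;
}

int *static_explicit_1(void) {
    static int v = 1;
    return &v;
}
```

For a definitive answer, readelf -S or objdump -t on the compiled object is still the better tool. The runtime check just makes it easy to assert on.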
char * and char c[]
As shown at: Where are static variables stored in C and C++?
void f(void) {
    /* RODATA / TEXT */
    char *a = "abc";
    /* Stack. */
    char b[] = "abc";
    char c[] = {'a', 'b', 'c', '\0'};
}
TODO will very large string literals also be put on the stack? Or .data? Or does compilation fail?
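The difference between the two declarations can be observed at runtime with a small sketch like the following. The function names are hypothetical, the edata and end symbols are the linker-provided ones from man 3 end, and the address comparisons assume the usual Linux layout where the stack sits far above the program image.

```c
#include <stdint.h>

/* Linker-provided symbols, see end(3). */
extern char edata, end;

/* Returns the address held by a char * initialized from a literal. */
const char *literal_address(void) {
    const char *a = "abc";
    return a;
}

/* Copies the literal into a local array, modifies the copy, and reports
 * through *addr where the array lived. Returns its new first character. */
char modify_array_copy(uintptr_t *addr) {
    char b[] = "abc";
    volatile char *p = b;   /* volatile access keeps the array in memory */
    p[0] = 'X';             /* fine: b is a private, writable copy */
    *addr = (uintptr_t)b;   /* only the number is kept, never dereferenced */
    return p[0];
}
```

Calling literal_address twice gives the same pointer, because the literal has static storage, while each call of modify_array_copy builds a fresh array in its own stack frame. Writing through the char * instead would be undefined behavior and typically segfaults on Linux.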
Function arguments
void f(int i, int j);
Arguments are passed according to the relevant calling convention, e.g.: https://en.wikipedia.org/wiki/X86_calling_conventions for x86, which specifies either specific registers or stack locations for each argument.
Then as shown at What does <value optimized out> mean in gdb?, -O0 then slurps everything into the stack, while -O3 tries to use registers as much as possible.
If the function gets inlined however, they are treated just like regular locals.
const
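For automatic (local) variables I believe that it makes no difference, because you can typecast it away (although writing to an object that was defined const is undefined behavior).
For globals and statics, however, GCC typically places const objects with constant initializers in .rodata rather than .data.
Conversely, if the compiler is able to determine that some data is never written to, it could in theory place it in .rodata even if not const.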
TODO analysis.
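As a partial stand-in for that analysis, here is a sketch that asks the kernel whether the page holding a given object is writable, by parsing /proc/self/maps. It is Linux-only, the names demo_const_global, demo_plain_global and is_writable_mapping are hypothetical, and it only tells you about page permissions, not the exact section name.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

const int demo_const_global = 42; /* hypothetical; expected in .rodata */
int demo_plain_global = 42;       /* hypothetical; expected in .data   */

/* Looks up the mapping that contains p in /proc/self/maps (Linux only).
 * Returns 1 if that mapping is writable, 0 if it is not, and -1 if the
 * file cannot be read or no mapping contains p. */
int is_writable_mapping(const void *p) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    uintptr_t addr = (uintptr_t)p;
    char line[512];
    int result = -1;
    while (fgets(line, sizeof line, f)) {
        uint64_t start, stop;
        char perms[5];
        /* Each line starts with "start-end perms", e.g. "...-... r--p". */
        if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s",
                   &start, &stop, perms) != 3)
            continue;
        if (addr >= start && addr < stop) {
            result = (perms[1] == 'w');
            break;
        }
    }
    fclose(f);
    return result;
}
```

On a typical GCC / Linux build, is_writable_mapping(&demo_const_global) returns 0 and is_writable_mapping(&demo_plain_global) returns 1, which is consistent with the const global having been moved to a read-only section.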
Pointers
They are variables (that contain addresses, which are numbers), so same as all the rest :-)
malloc
The question does not make much sense for malloc, since malloc is a function, and in:
int *i = malloc(sizeof(int));
i is a variable that contains an address, so it falls under the above case. The int that *i refers to lives in the memory that malloc returned.
As for how malloc works internally: when you call it, the Linux kernel marks certain addresses as writable in its internal data structures, and when the program first touches them, a page fault happens and the kernel sets up the page table entries, which lets the access happen without a segfault: How does x86 paging work?
Note however that this is basically exactly what the exec syscall does under the hood when you try to run an executable: it marks the pages it wants to load to, and writes the program there, see also: How does kernel get an executable binary file running under linux? Except that exec has some extra limitations on where to load to (e.g. if the code is not relocatable).
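The exact syscall used depends on the malloc implementation: glibc, for example, grows the heap with brk for small allocations and uses mmap for large ones, while some other allocators rely on mmap alone: Does malloc() use brk() or mmap()?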
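To see which of the two a given allocation came from, one rough approach is to check whether the returned pointer lies between the end of the program image and the current program break. The sketch below assumes Linux with a glibc-like allocator. The function names are hypothetical, end is the linker symbol from man 3 end, and sbrk(0) is used only to read the current break.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* for sbrk() on glibc with strict -std= flags */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Linker-provided symbol marking the end of .bss, see end(3). */
extern char end;

/* 1 if p lies between the end of the program image and the current
 * program break, i.e. inside the heap that is managed with brk. */
int in_brk_heap(const void *p) {
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)&end && a < (uintptr_t)sbrk(0);
}

/* Allocates size bytes (size must be at least 1), touches the first and
 * last byte, and reports whether the block lives in the brk heap.
 * Returns -1 if the allocation fails. */
int malloc_uses_brk_heap(size_t size) {
    volatile unsigned char *p = malloc(size);
    if (!p)
        return -1;
    p[0] = 1;        /* first touch: the kernel maps the page on the fault */
    p[size - 1] = 1;
    int result = in_brk_heap((const void *)p);
    free((void *)p);
    return result;
}
```

With glibc defaults, a small request such as malloc_uses_brk_heap(16) usually reports 1, while a very large one such as 64 MiB reports 0 because it is served by mmap. Other allocators, or a glibc tuned through mallopt, may behave differently.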
Dynamic libraries
Basically get mmaped to memory: https://unix.stackexchange.com/questions/226524/what-system-call-is-used-to-load-libraries-in-linux/462710#462710
Environment variables and main's argv
Above initial stack: https://unix.stackexchange.com/questions/75939/where-is-the-environment-string-actual-stored TODO why not in .data?
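A quick way to sanity check this is to compare the address of an environment string with the address of a local variable. The sketch below assumes Linux, a stack that grows downwards, and that the environment has not been modified with setenv or putenv (which can move the strings elsewhere). env_is_above_locals is a hypothetical name.

```c
#include <stdint.h>

extern char **environ; /* POSIX: the environment of the current process */

/* Returns 1 if the first environment string lies at a higher address
 * than a local variable of this function, 0 if it lies lower, and -1
 * if the environment is empty. */
int env_is_above_locals(void) {
    volatile int my_local_var = 0; /* volatile: forced to live in memory */
    if (!environ || !environ[0])
        return -1;
    return (uintptr_t)environ[0] > (uintptr_t)&my_local_var;
}
```

The same comparison can be made with argv[0] from inside main. On x86-64 Linux both kinds of strings are expected to sit above every stack frame of the main thread.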