C main
is called (indirectly) from CRT startup code, not directly from the kernel.
After main
returns, that code calls atexit
functions to do stuff like flushing stdio buffers, then passes main's return value to a raw _exit system call (or exit_group, which exits all threads).
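To make that exit path concrete, here is a minimal sketch, assuming a POSIX system. Returning from main is equivalent to calling exit(), so the child process below calls exit() or _exit() directly instead of returning from main. The names notify_fd, on_exit_handler and atexit_handler_ran are made up for this example.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper state: write end of a pipe used by the handler. */
static int notify_fd = -1;

/* atexit handler: tells the parent it ran by writing one byte. */
static void on_exit_handler(void) {
    const char byte = 'x';
    ssize_t written = write(notify_fd, &byte, 1);
    (void)written;
}

/* Forks a child that registers an atexit handler and then leaves either
 * through exit() (the library path, which runs atexit handlers first) or
 * through _exit() (straight to the kernel).
 * Returns 1 if the handler ran, 0 if it did not, -1 on setup failure. */
int atexit_handler_ran(int use_raw_exit) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                 /* child */
        close(fds[0]);
        notify_fd = fds[1];
        atexit(on_exit_handler);
        if (use_raw_exit)
            _exit(7);               /* skips atexit handlers */
        exit(7);                    /* runs atexit handlers, then exits */
    }

    close(fds[1]);                  /* parent keeps only the read end */
    char byte;
    ssize_t got = read(fds[0], &byte, 1);   /* 0 bytes means EOF: no handler */
    close(fds[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    return got == 1;
}
```

Calling atexit_handler_ran(0) should report that the handler ran, and atexit_handler_ran(1) that it did not, which is the difference between the library's exit path and the raw system call.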
You make several wrong assumptions, all, I think, based on a misunderstanding of how kernels work.
The kernel runs at a different privilege level from user-space (ring 0 vs. ring 3 on x86). Even if user-space knew the right address to jump to, it can't jump into kernel code. (And even if it could, it wouldn't be running with kernel privilege level).
ret
isn't magic, it's basically just pop %rip
and doesn't let you jump anywhere you couldn't jump to with other instructions. It also doesn't change privilege level (see footnote 1).
Kernel addresses aren't mapped / accessible when user-space code is running; those page-table entries are marked as supervisor-only. (Or they're not mapped at all in kernels that mitigate the Meltdown vulnerability, so entering the kernel goes through a "wrapper" block of code that changes CR3.)
Virtual memory is how the kernel protects itself from user-space. User-space can't modify page tables directly, only by asking the kernel to do it via mmap
and mprotect
system calls. (And user-space can't execute privileged instructions like mov cr3, rax
to install new page tables. That's the purpose of having ring 0 (kernel mode) vs. ring 3 (user mode).)
The kernel stack is separate from the user-space stack for a process. (In the kernel, there's also a small kernel stack for each task (aka thread) that's used during system calls / interrupts while that user-space thread is running. At least that's how Linux does it, IDK about others.)
The kernel doesn't literally call user-space code; the user-space stack doesn't hold any return address back into the kernel. A kernel->user transition involves swapping stack pointers as well as changing privilege levels, e.g. with an instruction like iret (interrupt-return).
Plus, leaving a kernel code address anywhere user-space can see it would defeat kernel ASLR.
Footnote 1: (The compiler-generated ret
will always be a normal near ret
, not a retf
that could return through a call gate or something to a privileged cs
value. x86 handles privilege levels via the low 2 bits of CS, but never mind that. MacOS / Linux don't set up call gates that user-space can use to call into the kernel; that's done with syscall
or int 0x80
instructions.)
In a fresh process (after an execve
system call replaced the previous process with this PID with a new one), execution begins at the process entry point (usually labeled _start
), not at the C main
function directly.
C implementations come with CRT (C RunTime) startup code that has (among other things) a hand-written asm implementation of _start
which (indirectly) calls main
, passing args to main according to the calling convention.
_start
itself is not a function. On process entry, RSP points at argc
, and above that on the user-space stack is argv[0]
, argv[1]
, etc. (i.e. the char *argv[]
array is right there by value, and above that the envp
array.) _start
loads argc
into a register and puts pointers to the argv and envp into registers. (The x86-64 System V ABI that MacOS and Linux both use documents all this, including the process-startup environment and the calling convention.)
If you try to ret
from _start
, you're just going to pop argc
into RIP, and then code-fetch from absolute address 1
or 2
(or other small number) will segfault. For example, the question "Nasm segmentation fault on RET in _start" shows an attempt to ret
from the process entry point (linked without CRT startup code). It has a hand-written _start
that just falls through into main
.
When you run gcc main.c
, the gcc
front-end runs multiple other programs (use gcc -v
to show details). This is how the CRT startup code gets linked into your process:
- gcc preprocesses (CPP) and compiles+assembles
main.c
to main.o
(or a temporary file). On MacOS, the gcc
command is actually clang which has a built-in assembler, but real gcc
really does compile to asm and then run as
on that. (The C preprocessor is built-in to the compiler, though.)
- gcc runs something like
ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie /usr/lib/Scrt1.o /usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/crtbeginS.o main.o -lc -lgcc /usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/crtendS.o
. That's actually simplified a lot, with some of the CRT files left out, and paths canonicalized to remove ../../lib
parts. Also, it doesn't run ld
directly, it runs collect2
which is a wrapper for ld
. But anyway, that statically links in those .o
CRT files that contain _start
and some other stuff, and dynamically links libc (-lc
) and libgcc (for GCC helper functions like implementing __int128
multiply and divide with 64-bit registers, in case your program uses those).
.intel_syntax
.text:
.global _rbp
_rbp:
mov rax, rbp
ret;
This is not allowed, ...
The only reason that doesn't assemble is that you tried to declare .text:
as a label, instead of using the .text
directive. If you remove the trailing :
it does assemble with clang (which treats .intel_syntax
the same as .intel_syntax noprefix
).
For GCC / GAS to assemble it, you'd also need the noprefix
to tell it that register names aren't prefixed by %
. (Yes you can have Intel op dst, src order but still with %rsp
register names. No you shouldn't do this!) And of course GNU/Linux doesn't use leading underscores.
Not that it would always do what you want if you called it, though! If you compiled main
without optimization (so -fno-omit-frame-pointer
was in effect), then yes you'd get a pointer to the stack slot below the return address.
And you definitely use the value incorrectly. (*p)-4;
loads the saved RBP value (*p
) and then offsets by four 8-byte void-pointers. (Because that's how C pointer math works; *p
has type void*
because p
has type void **
).
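The scaling rule for pointer subtraction can be checked directly. This is a minimal sketch with made-up function names. It uses void ** and char * so the element size is well-defined in ISO C.

```c
#include <stddef.h>

/* Byte distance covered by "p - 4" when p points at void* elements. */
size_t bytes_back_for_void_ptr_elements(void) {
    void *slots[8] = {0};
    void **p = &slots[4];
    void **q = p - 4;               /* four elements of type void* */
    return (size_t)((char *)p - (char *)q);
}

/* Byte distance covered by "p - 4" when p points at char elements. */
size_t bytes_back_for_char_elements(void) {
    char bytes[8] = {0};
    char *p = &bytes[4];
    char *q = p - 4;                /* four elements of type char */
    return (size_t)(p - q);
}
```

The first function returns 4 * sizeof(void *) (32 on x86-64) and the second returns 4, so the same "- 4" moves a different number of bytes depending on the pointee type.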
I think you're trying to get your own return address and re-run the call
instruction (in main's caller) that reached main, eventually leading to a stack overflow from pushing more return addresses. In GNU C, use void * __builtin_return_address (0)
to get your own return address.
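A minimal sketch of that builtin, assuming GCC or clang (it is a GNU C extension, not ISO C). The function name my_return_address is made up for this example.

```c
/* noinline keeps this a real call, so there is a return address to report.
 * If it were inlined, the builtin would describe the caller's caller. */
__attribute__((noinline))
void *my_return_address(void) {
    /* Level 0 means "the address this function will return to". */
    return __builtin_return_address(0);
}
```

The returned pointer is the address of the instruction after the call, so the call instruction itself sits some unknown number of bytes before it.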
x86 call rel32
instructions are 5 bytes, but the call
that called main was probably an indirect call, using a pointer in a register. So it might be a 2-byte call *%rax
or a 3-byte call *%r12
, you don't know unless you disassemble your caller. I'd suggest single-stepping by instructions (GDB / LLDB stepi) off the end of main using a debugger in disassembly mode. If it has any symbol info for main's caller, you'll be able to scroll backward and see what the previous instruction was.
If not, you might have to try and see what looks sane; x86 machine code can't be unambiguously decoded backwards because it's variable-length. You can't tell the difference between a byte within an instruction (like an immediate or ModRM) vs. the start of an instruction. It all depends on where you start disassembling from. If you try a few byte offsets, usually only one will produce anything that looks sane.
asm("movq %rax, 0"); //Exit code is 11, so now it should be 0
This is a store of RAX to absolute address 0
, in AT&T syntax. This of course segfaults. Exit code 11 is from SIGSEGV, which is signal 11. (Use kill -l
to see signal numbers).
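To see the signal number from inside a program rather than from the shell, here is a sketch assuming a POSIX system. The function name signal_from_null_store is made up. It uses a volatile C store to address 0 in place of the inline asm.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that stores to address 0 and returns the number of the
 * signal that killed it (0 if it was not killed by a signal, -1 on error). */
int signal_from_null_store(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                         /* child */
        volatile long *null_ptr = (volatile long *)0;
        *null_ptr = 0;                      /* store to an unmapped page */
        _exit(0);                           /* not reached if the store faults */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux the returned value should equal SIGSEGV, which is 11 there.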
Perhaps you wanted mov $0, %eax
. That's still pointless here, though, because you're about to call through your function pointer; in debug mode, the compiler might load the pointer into RAX and step on your value.
Also, writing a register in an asm
statement is never safe when you don't tell the compiler which registers you're modifying (using constraints).
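For comparison, here is a sketch of what telling the compiler looks like in GNU C extended asm (GCC or clang). The function names are made up, and the asm is guarded so that on non-x86 targets the functions fall back to plain C.

```c
/* Let the compiler pick the register: %0 is an output operand ("=r"),
 * so the compiler knows exactly which register gets written. */
int zero_via_output_operand(void) {
    int value = 1;
#if defined(__x86_64__) || defined(__i386__)
    __asm__("movl $0, %0" : "=r"(value));
#else
    value = 0;                      /* fallback for other architectures */
#endif
    return value;
}

/* Name a specific register in the template and declare it as clobbered,
 * so the compiler won't keep anything live in EAX across the statement. */
int zero_via_declared_clobber(void) {
    int value = 1;
#if defined(__x86_64__) || defined(__i386__)
    __asm__("movl $0, %%eax\n\t"
            "movl %%eax, %0"
            : "=r"(value)           /* output: any register except EAX */
            :                       /* no inputs */
            : "eax");               /* clobber list */
#else
    value = 0;                      /* fallback for other architectures */
#endif
    return value;
}
```

Both functions return 0. The point is only that every register the template writes appears either as an operand or in the clobber list.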
printf("Main: %p\n", main);
printf("&Main: %p\n", &main); //WTF
main
and &main
are the same thing because main
is a function. That's just how C syntax works for function names. main
isn't an object that can have its address taken. See also: & operator optional in function pointer assignment.
It's similar for arrays: the bare name of an array can be assigned to a pointer or passed to functions as a pointer arg. But &array
is also the same address (though with a different pointer type), same as &array[0]
. This is true only for arrays like int array[10]
, not for pointers like int *ptr
; in the latter case the pointer object itself has storage space and can have its own address taken.
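These address equalities are easy to check. This is a minimal sketch with made-up names (helper and the three checking functions are not from the question).

```c
/* Hypothetical function whose name and address are compared. */
static int helper(void) { return 42; }

/* A function name and & applied to it give the same function pointer. */
int function_name_equals_its_address(void) {
    int (*plain)(void) = helper;
    int (*with_amp)(void) = &helper;
    return plain == with_amp;
}

/* array, &array and &array[0] all refer to the same address
 * (the types differ, hence the casts to void* for comparison). */
int array_addresses_match(void) {
    int array[10] = {0};
    return (void *)array == (void *)&array
        && (void *)array == (void *)&array[0];
}

/* A pointer variable is its own object, stored somewhere else
 * than the thing it points to. */
int pointer_object_has_own_address(void) {
    int array[10] = {0};
    int *ptr = array;
    return (void *)ptr != (void *)&ptr;
}
```

All three functions return 1.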