
I am having an issue with some inline assembly. I am writing a compiler that compiles to assembly, and for portability I made it emit the main function in C and just use inline assembly. However, even the simplest inline assembly gives me a segfault. Thanks for your help.

int main(int argc, char** argv) {
  __asm__(
    "push $1\n"
  );
  return 0;
}

  • For assembly language questions we need to know the CPU architecture you're using. – zwol Dec 13 '21 at 02:11
  • Inline assembly MUST NOT[rfc2119] modify the stack pointer. This is true for all CPU architectures, and for all C compilers that use the inline-assembly syntax you're using. – zwol Dec 13 '21 at 02:12
  • The simplest inline assembly would be a `nop` (or an empty one). – Jester Dec 13 '21 at 02:14
  • While it's easy to think of 'main' as being the top of a c program, there's usually some code above it (to set up argc & argv for example). Which means that it needs to be able to return to the caller. But your code is adjusting the stack, and putting $1 where the caller's address would be. So when it tries to return, it's going to a very bad place. – David Wohlferd Dec 13 '21 at 02:33
  • @DavidWohlferd If you would like to post this as an answer it would be cool. I didn't really realize that. Thank you :) – ANTHONY STERLING-PALMARI Dec 13 '21 at 02:45
  • Glad you found that useful. msimonelli seems to say much the same thing in his answer, perhaps you could accept his. – David Wohlferd Dec 13 '21 at 02:56
  • Instead of emitting inline assembly, why don't you generate assembly in an assembly source file? – fuz Dec 13 '21 at 10:12

1 Answer


TLDR at the bottom. Note: everything here assumes x86-64.

The issue here is that compilers will effectively never use push or pop inside a function body (outside of the prologue and epilogue).

Consider this example.
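
For concreteness, a small program along these lines produces the kind of output discussed below. This is my reconstruction rather than the exact source: the offsets, and whether rsp actually gets moved, depend on the compiler version and flags (the output shown here looks like gcc -O0 with the red zone disabled).

int main(int argc, char** argv) {
  int a = 2;   /* stored at [rbp-4] in the output below */
  int b = 5;   /* stored at [rbp-8] in the output below */
  return 0;    /* argc and argv also get spilled to the frame at -O0 */
}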

When the function begins, room is made on the stack in the prologue with:

push rbp        ; save the caller's frame pointer
mov rbp, rsp    ; rbp now marks the top of main's frame
sub rsp, 32     ; reserve 32 bytes for locals

This creates 32 bytes of room for main's locals. Then notice how, throughout the function, instead of being pushed onto the stack, values are mov'd into that space at fixed offsets from rbp:

        mov     DWORD PTR [rbp-20], edi   ; spill argc
        mov     QWORD PTR [rbp-32], rsi   ; spill argv
        mov     DWORD PTR [rbp-4], 2      ; first local
        mov     DWORD PTR [rbp-8], 5      ; second local

The reason for this is that a variable can be stored or loaded at any point in the function, without requiring a huge number of pushes and pops.

Consider the case where variables are stored using push and pop instead. Say a variable, call it foo, is stored early in the function. Eight stack slots later you need foo again; how do you access it?

Well, you can pop everything until foo, and then push everything back, but that's costly.

It also doesn't work when you have conditional statements. Say a variable is only ever pushed if foo has a certain value. Now the stack pointer could be at either of two locations after the conditional!

For this reason, compilers always prefer to use rbp - N to store variables, as at any point in the function, the variable will still live at rbp - N.
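
To make the conditional case concrete, here is a hypothetical C sketch (the function name `branches` is just illustrative). Even though `b` is only assigned inside the if, the compiler still reserves a fixed slot for it, so the stack layout is identical no matter which path runs:

int branches(int foo) {
  int a = 1;         /* fixed slot, e.g. [rbp-4] at -O0 */
  int b = 0;         /* fixed slot, e.g. [rbp-8] at -O0 */
  if (foo == 42) {
    b = a + foo;     /* same slot regardless of the branch taken */
  }
  return b;
}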

NB: On other ABIs (such as i386 System V), arguments may be passed on the stack, but this isn't too much of an issue, as the ABI specifies exactly how that is handled. Again using i386 System V as an example, a function call goes something like:

push edi ; 2nd argument to the function.
push eax ; 1st argument to the function.
call my_func
add esp, 8 ; afterwards, the caller removes the arguments it pushed
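
In C terms, that sequence corresponds to an ordinary call like the following (hypothetical names). The point is that the compiler itself emits the pushes and the cleanup, so the stack pointer is back where it started once the statement finishes:

int my_func(int a, int b);   /* hypothetical prototype */

int caller(void) {
  /* On i386 System V the arguments go on the stack right to left and the
     caller removes them after the call; whether the compiler uses push or
     mov to place them is an implementation detail. */
  return my_func(1, 2);
}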

So why does push actually cause an issue? Well, I'll add a small asm snippet to the code.
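
Something like this (again my reconstruction; basic `__asm__` statements use AT&T syntax, so the constant is written `$64`, even though the compiler output below is shown in Intel syntax):

int main(int argc, char** argv) {
  int a = 2;
  int b = 5;
  __asm__("push $64\n");   /* unbalanced push: nothing ever pops this */
  return 0;
}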

At the end of the function, we now have the following:

        push 64             ; from the inline asm: rsp is now 8 bytes lower than expected

        mov     eax, 0      ; return value
        leave               ; mov rsp, rbp  then  pop rbp
        ret                 ; pop the return address and jump to it

There are two things that can now go wrong because of that push.

The first is restoring rbp (see this thread for what `leave` does).

The epilogue has to pop the value of rbp that was stored at the beginning of the function, so that the caller's stack frame is preserved after main returns (notice that the only push the compiler itself generates is the `push rbp` at the start). In the example above this is done with `leave`, which is shorthand for `mov rsp, rbp` followed by `pop rbp`; because `leave` resets rsp from rbp first, it would actually happen to discard our stray push. But when the compiler never moves rsp after the prologue, there is no `sub rsp, N` and no `leave`, and the epilogue is just `pop rbp` followed by `ret`. That is exactly what happens in the question's build: main's spilled arguments fit in the x86-64 red zone, so rsp is never adjusted, and the value sitting on top of the stack at the epilogue is the value we pushed, not the saved rbp.

So `pop rbp` loads 64 into rbp (in the question's code the pushed value is 1, but the failure mode is the same). When the caller of main resumes its execution and tries to access a value at, say, rbp - 8, a crash will occur, as rbp - 8 would be 0x38, which is not a valid address.

But that assumes the caller even gets execution back!

That brings us to the second problem. After rbp has been loaded with the bogus value, the next thing on the stack will be the original saved value of rbp.

The ret instruction will pop a value from the stack and return to that address...

Notice how this might be slightly problematic?

The CPU is going to try to jump to the rbp value that was saved at the start of the function, and that value points into the stack!

In nearly every modern program, the stack is a "no execute" zone (see here), and attempting to execute code from there will immediately cause a crash.

So, TLDR: pushing to the stack from inline asm violates assumptions made by the compiler, most importantly about where the saved rbp and the return address live. That violation (generally) sends program execution off into the stack, which causes a crash.
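
If the end goal is a Forth-style data stack (as discussed in the comments below), one way to sidestep all of this is to keep that stack in ordinary memory that your generated code manages itself, instead of borrowing the CPU's call stack. A minimal sketch in C, with hypothetical names and no overflow checking:

#include <stdint.h>
#include <stdio.h>

#define DATA_STACK_SIZE 1024

static int64_t data_stack[DATA_STACK_SIZE];  /* separate from the CPU stack */
static size_t  data_sp = 0;                  /* index of the next free slot */

static void ds_push(int64_t v) { data_stack[data_sp++] = v; }
static int64_t ds_pop(void)    { return data_stack[--data_sp]; }

int main(void) {
  ds_push(1);                                /* what "push $1" was trying to do */
  printf("%lld\n", (long long)ds_pop());
  return 0;                                  /* rsp was never touched, so returning is safe */
}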

msimonelli
  • Ah ok, Thank you. The thing about my compiler is that it is a "copy" of forth called corth. It will be like forth with more features. That's why it is using the stack. Just for ease. – ANTHONY STERLING-PALMARI Dec 13 '21 at 03:01
  • @ANTHONYSTERLING-PALMARI: Compiling a stack-based language into x86 code that uses the stack the same way is pretty garbage for performance, but can be done as baby steps for a toy compiler. Regardless, I don't see how *executing* push/pop instructions *in the compiler* is going to help anything. Is it actually an interpreter? (I assume the code you showed is supposed to be part of your compiler, not the program your compiler compiles.) – Peter Cordes Dec 13 '21 at 03:08
  • If you want to use the asm stack as a stack data-structure, you can't also mix in call/return because return addresses and local vars will be mixed with your data. There's nothing easy about doing this in C, or even possible. It's something you could do if your compiler (or interpreter?) was written in asm, and would also make the problems obvious, because then `push` / `ret` would be right there in your own code. – Peter Cordes Dec 13 '21 at 03:08
  • @msimonelli: GCC will only use `leave` if it moved RSP on function entry, other than push. When compiling for x86-64 SysV, it can use the red-zone below RSP for locals (including spilling the register args in a -O0 debug build). That's why this inline asm actually breaks things: if it had used `leave`, that would undo the push. https://godbolt.org/z/61vsoqf5M shows that it happens to not crash (despite still being super broken) if you build with `-O0 -mno-red-zone`, since this used `main(int, char**)` instead of `main(void)`. The latter will still crash. – Peter Cordes Dec 13 '21 at 03:17
  • Speaking of the red-zone, it's not safe to do *balanced* push/pop inside one `asm()` statement, because there's no way to tell the compiler that you're going to overwrite that space. You'd have to move RSP down by 128 bytes on entry, then move it back, unless you compile this function / file with `-mno-red-zone`. [Inline assembly that clobbers the red zone](https://stackoverflow.com/a/47402504) @ANTHONYSTERLING-PALMARI. (Although I think the whole idea of doing push/pop inside inline asm to interpret Forth is doomed, even with `-mno-red-zone`.) – Peter Cordes Dec 13 '21 at 03:19
  • @PeterCordes Thank you for this information. May I ask, what would be a good way to compile "forth" and not use the stack in ASM? – ANTHONY STERLING-PALMARI Dec 13 '21 at 03:22
  • @ANTHONYSTERLING-PALMARI: like a JVM JIT does. Java bytecode is also virtual stack-machine, but compilers turn that stack logic into the same kind of internal representation of the program logic that C compilers build directly from expressions like `x = a + b * c - 2`, and do register allocation for local vars, ideally producing normal-looking asm like you'd expect for an efficient implementation. Obviously writing a good *optimizing* compiler is extremely hard, a huge amount of work, at least to compete with modern LLVM and GCC. Usually people write a front-end to feed LLVM's optimizer. – Peter Cordes Dec 13 '21 at 03:30
  • A good optimizing compiler usually involves transforming the program logic into an [SSA](https://en.wikipedia.org/wiki/Static_single_assignment_form) form, optimizing in that domain, and then transforming again to a more target-specific representation as you generate asm that runs as-if the program logic was executing on the abstract machine. I'm just explaining this because you asked, not because you shouldn't play around with a toy compiler. (See also [Why are there so few C compilers?](https://softwareengineering.stackexchange.com/q/273698) re: optimizing is the hard part.) – Peter Cordes Dec 13 '21 at 03:32
  • (But make sure you understand the difference between a compiler and interpreter! Executing inline asm in the compiler doesn't put those insns in a file or in memory where they can be executed later, once you're done compiling.) – Peter Cordes Dec 13 '21 at 03:33
  • Ah ok I understand. Thank you for the help @PeterCordes. Really appreciate it. – ANTHONY STERLING-PALMARI Dec 13 '21 at 16:11