Why doesn't a segmentation fault occur with a smaller stack boundary?

Question

I'm trying to understand the difference in behavior between code compiled with the GCC option -mpreferred-stack-boundary=2 and code compiled with the default value, which is -mpreferred-stack-boundary=4.

I have already read a lot of Q&As about this option, but I am not able to understand the case I describe below.

Let's consider this code:

#include <stdio.h>
#include <string.h>

void dumb_function() {}

int main(int argc, char** argv) {
    dumb_function();

    char buffer[24];
    strcpy(buffer, argv[1]);

    return 0;
}

On my 64-bit machine, I want to compile it as 32-bit code, so I use the -m32 option. I create two binaries, one with -mpreferred-stack-boundary=2 and one with the default value:

sysctl -w kernel.randomize_va_space=0
gcc -m32 -g3 -fno-stack-protector -z execstack -o default vuln.c
gcc -mpreferred-stack-boundary=2 -m32 -g3 -fno-stack-protector -z execstack -o align_2 vuln.c

Now, if I run them with a two-byte overflow, I get a segmentation fault with the default alignment but not with the other one:

$ ./default 1234567890123456789012345
Segmentation fault (core dumped)
$ ./align_2 1234567890123456789012345
$

I tried to dig into why the default build behaves this way. Here is the disassembly of the main function:

08048411 <main>:
 8048411:   8d 4c 24 04             lea    0x4(%esp),%ecx
 8048415:   83 e4 f0                and    $0xfffffff0,%esp
 8048418:   ff 71 fc                pushl  -0x4(%ecx)
 804841b:   55                      push   %ebp
 804841c:   89 e5                   mov    %esp,%ebp
 804841e:   53                      push   %ebx
 804841f:   51                      push   %ecx
 8048420:   83 ec 20                sub    $0x20,%esp
 8048423:   89 cb                   mov    %ecx,%ebx
 8048425:   e8 e1 ff ff ff          call   804840b <dumb_function>
 804842a:   8b 43 04                mov    0x4(%ebx),%eax
 804842d:   83 c0 04                add    $0x4,%eax
 8048430:   8b 00                   mov    (%eax),%eax
 8048432:   83 ec 08                sub    $0x8,%esp
 8048435:   50                      push   %eax
 8048436:   8d 45 e0                lea    -0x20(%ebp),%eax
 8048439:   50                      push   %eax
 804843a:   e8 a1 fe ff ff          call   80482e0 <strcpy@plt>
 804843f:   83 c4 10                add    $0x10,%esp
 8048442:   b8 00 00 00 00          mov    $0x0,%eax
 8048447:   8d 65 f8                lea    -0x8(%ebp),%esp
 804844a:   59                      pop    %ecx
 804844b:   5b                      pop    %ebx
 804844c:   5d                      pop    %ebp
 804844d:   8d 61 fc                lea    -0x4(%ecx),%esp
 8048450:   c3                      ret    
 8048451:   66 90                   xchg   %ax,%ax
 8048453:   66 90                   xchg   %ax,%ax
 8048455:   66 90                   xchg   %ax,%ax
 8048457:   66 90                   xchg   %ax,%ax
 8048459:   66 90                   xchg   %ax,%ax
 804845b:   66 90                   xchg   %ax,%ax
 804845d:   66 90                   xchg   %ax,%ax
 804845f:   90                      nop

Thanks to the sub $0x20,%esp instruction, we can see that the compiler allocates 32 bytes on the stack, which is consistent with the -mpreferred-stack-boundary=4 option: 32 is a multiple of 16.

First question: if I have 32 bytes of stack space (24 bytes for the buffer and the rest unused), why do I get a segmentation fault with an overflow of just one character (two bytes, counting the terminating NUL)?

Let's look at what's happening with gdb:

$ gdb default
(gdb) b 10
Breakpoint 1 at 0x804842a: file vuln.c, line 10.

(gdb) b 12
Breakpoint 2 at 0x8048442: file vuln.c, line 12.

(gdb) r 1234567890123456789012345
Starting program: /home/pierre/example/default 1234567890123456789012345

Breakpoint 1, main (argc=2, argv=0xffffce94) at vuln.c:10
10      strcpy(buffer, argv[1]);

(gdb) i f
Stack level 0, frame at 0xffffce00:
 eip = 0x804842a in main (vuln.c:10); saved eip = 0xf7e07647
 source language c.
 Arglist at 0xffffcde8, args: argc=2, argv=0xffffce94
 Locals at 0xffffcde8, Previous frame's sp is 0xffffce00
 Saved registers:
  ebx at 0xffffcde4, ebp at 0xffffcde8, eip at 0xffffcdfc

(gdb) x/6x buffer
0xffffcdc8: 0xf7e1da60  0x080484ab  0x00000002  0xffffce94
0xffffcdd8: 0xffffcea0  0x08048481

(gdb) x/x buffer+36
0xffffcdec: 0xf7e07647

Just before the call to strcpy, we can see the saved eip is 0xf7e07647. We can also find this value by starting from the buffer address (32 bytes of stack space + 4 bytes for the esp = 36 bytes).

Let's continue:

(gdb) c
Continuing.

Breakpoint 2, main (argc=0, argv=0x0) at vuln.c:12
12      return 0;

(gdb) i f
Stack level 0, frame at 0xffff0035:
 eip = 0x8048442 in main (vuln.c:12); saved eip = 0x0
 source language c.
 Arglist at 0xffffcde8, args: argc=0, argv=0x0
 Locals at 0xffffcde8, Previous frame's sp is 0xffff0035
 Saved registers:
  ebx at 0xffffcde4, ebp at 0xffffcde8, eip at 0xffff0031

(gdb) x/7x buffer
0xffffcdc8: 0x34333231  0x38373635  0x32313039  0x36353433
0xffffcdd8: 0x30393837  0x34333231  0xffff0035

(gdb) x/x buffer+36
0xffffcdec: 0xf7e07647

We can see the overflow in the bytes right after the buffer: 0xffff0035. Also, nothing changed where the eip was stored (0xffffcdec: 0xf7e07647), because the overflow is only two bytes. However, the saved eip reported by info frame did change (saved eip = 0x0), and the segmentation fault occurs if I continue:

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()

What's happening? Why did my saved eip change when the overflow is only two bytes?

Now, let's compare this with the binary compiled with another alignment:

$ objdump -d align_2
...
08048411 <main>:
...
 8048414:   83 ec 18                sub    $0x18,%esp
...

The stack is exactly 24 bytes. That means an overflow of 2 bytes will overwrite the esp (but still not the eip). Let's check that with gdb:

(gdb) b 10
Breakpoint 1 at 0x804841c: file vuln.c, line 10.

(gdb) b 12
Breakpoint 2 at 0x8048431: file vuln.c, line 12.

(gdb) r 1234567890123456789012345
Starting program: /home/pierre/example/align_2 1234567890123456789012345

Breakpoint 1, main (argc=2, argv=0xffffce94) at vuln.c:10
10      strcpy(buffer, argv[1]);

(gdb) i f
Stack level 0, frame at 0xffffce00:
 eip = 0x804841c in main (vuln.c:10); saved eip = 0xf7e07647
 source language c.
 Arglist at 0xffffcdf8, args: argc=2, argv=0xffffce94
 Locals at 0xffffcdf8, Previous frame's sp is 0xffffce00
 Saved registers:
  ebp at 0xffffcdf8, eip at 0xffffcdfc

(gdb) x/6x buffer
0xffffcde0: 0xf7fa23dc  0x080481fc  0x08048449  0x00000000
0xffffcdf0: 0xf7fa2000  0xf7fa2000

(gdb) x/x buffer+28
0xffffcdfc: 0xf7e07647

(gdb) c
Continuing.

Breakpoint 2, main (argc=2, argv=0xffffce94) at vuln.c:12
12      return 0;

(gdb) i f
Stack level 0, frame at 0xffffce00:
 eip = 0x8048431 in main (vuln.c:12); saved eip = 0xf7e07647
 source language c.
 Arglist at 0xffffcdf8, args: argc=2, argv=0xffffce94
 Locals at 0xffffcdf8, Previous frame's sp is 0xffffce00
 Saved registers:
  ebp at 0xffffcdf8, eip at 0xffffcdfc

(gdb) x/7x buffer
0xffffcde0: 0x34333231  0x38373635  0x32313039  0x36353433
0xffffcdf0: 0x30393837  0x34333231  0x00000035

(gdb) x/x buffer+28
0xffffcdfc: 0xf7e07647

(gdb) c
Continuing.
[Inferior 1 (process 6118) exited normally]

As expected, there is no segmentation fault here, because I don't overwrite the eip.

I don't understand this difference in behavior. In both cases, the eip is not overwritten; the only difference is the size of the stack. What's happening?

Additional information:

This behavior doesn't occur if dumb_function is not present.
I'm using the following version of GCC:

$ gcc -v
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)

Some information about my system:

$ uname -a
Linux pierre-Inspiron-5567 4.15.0-107-generic #108~16.04.1-Ubuntu SMP Fri Jun 12 02:57:13 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Stack overflow and buffer out-of-bounds access results in *undefined behavior*. It doesn't have to end in a crash. — Some programmer dude, Jul 12 '20 at 09:20
Why not single-step the code (`display/i $pc` and `si`) and see exactly what causes the segfault? It seems like you're trying all the usual debugging tools except the one that could actually answer your question :-) — Nate Eldredge, Jul 12 '20 at 13:07
I added the [tag:x86] tag to help emphasize that this is a question about system-specific "under the hood" behavior and not a C language question per se. — Nate Eldredge, Jul 12 '20 at 13:09
Another good tip is to draw yourself a picture of everything that's on the stack, with arrows indicating where the stack pointer and any other relevant pointers are pointing. Step through the code one instruction at a time and update your picture as you go. Then you'll hopefully be able to see what it is that you're overwriting and can then figure out how it's used later. — Nate Eldredge, Jul 12 '20 at 13:16
I think I see what's happening. I can't take time to write a full explanation right now, and maybe you'd rather figure it out for yourself anyway, but a hint would be to pay attention to where `buffer` is on the stack, and what exactly is right above it. It's not quite as simple as "it's eip" or "it's esp". Where did that value come from, how does it get used, and what is the effect of corrupting it? — Nate Eldredge, Jul 12 '20 at 13:55
Just for the record, `-mpreferred-stack-boundary=2` changes the ABI (reducing the guaranteed stack alignment). Calling libc functions in a normal glibc (not compiled this way) with the stack not aligned by 16 can segfault. This will happen in practice on x86-64 with functions like `printf("%f", 1.23)` from a misaligned SSE store, or with any scanf in modern glibc, both of which depend on the 16-byte alignment guarantee: [glibc scanf Segmentation faults when called from a function that doesn't align RSP](https://stackoverflow.com/q/51070716). — Peter Cordes, Jul 12 '20 at 20:55
This isn't what's happening here; you're not crashing in libc, you're buffer overflowing and the option just changes stack layout. (Most 32-bit builds of glibc don't actually depend on the 16-byte alignment guarantee, AFAIK, so changing that part of the ABI with compiler options is probably safe for calling into glibc. But not for calling code compiled with `-msse2` where GCC might have used SSE2 to copy / init local structs or arrays.) — Peter Cordes, Jul 12 '20 at 20:56

Nate Eldredge · Accepted Answer · 2020-07-13T03:10:50.060

You're not overwriting the saved eip, it's true. But you are overwriting a pointer that the function is using to find the saved eip. You can actually see this in your i f output; look at "Previous frame's sp" and notice how the two low bytes are 00 35; ASCII 0x35 is 5 and 00 is the terminating null. So although the saved eip is perfectly intact, the machine is fetching its return address from somewhere else, thus the crash.

In more detail:

GCC apparently doesn't trust the startup code to align the stack to 16 bytes, so it takes matters into its own hands (and $0xfffffff0,%esp). But it needs to keep track of the previous stack pointer value, so that it can find its parameters and the return address when needed. This is the lea 0x4(%esp),%ecx, which loads ecx with the address of the dword just above the saved eip on the stack. gdb calls this address "Previous frame's sp", I guess because it was the value of the stack pointer immediately before the caller executed its call main instruction. I will call it P for short.

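After aligning the stack, the compiler pushes -0x4(%ecx), which is a copy of the return address (the saved eip), presumably so that the realigned frame has the usual layout with the return address just above the saved ebp. This copy is the 0xf7e07647 you found at buffer+36. (argv is at 0x4(%ecx); the code reaches it later through %ebx.) Then it sets up its stack frame with push %ebp; mov %esp, %ebp. We can keep track of all addresses relative to %ebp from now on, in the way compilers usually do when not optimizing.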

The push %ecx a couple of lines down stores the address P on the stack at offset -0x8(%ebp). The sub $0x20, %esp makes 32 more bytes of space on the stack (ending at -0x28(%ebp)), but the question is, where in that space does buffer end up being placed? We see it happen after the call to dumb_function, with lea -0x20(%ebp), %eax; push %eax; this is the first argument to strcpy being pushed, which is buffer, so indeed buffer is at -0x20(%ebp), not at -0x28 as you might have guessed. The buffer is 24 (=0x18) bytes long, so it ends right at -0x8(%ebp). When you write 26 bytes there (25 characters plus the terminating null), the last two overwrite the two low bytes at -0x8(%ebp), which is our stored P pointer.

It's all downhill from here. The corrupted value of P (call it Px) is popped into ecx, and just before the return, we do lea -0x4(%ecx), %esp. Now %esp is garbage and points somewhere bad, so the following ret is sure to lead to trouble. Maybe Px points to unmapped memory and just attempting to fetch the return address from there causes the fault. Maybe it points to readable memory, but the address fetched from that location does not point to executable memory, so the control transfer faults. Maybe the latter does point to executable memory, but the instructions located there are not the ones we want to be executing.

If you take out the call to dumb_function(), the stack layout changes slightly. It's no longer necessary to push ebx around the call to dumb_function(), so the P pointer from ecx now winds up at -4(%ebp), there are 4 bytes of unused space (to maintain alignment), and then buffer is at -0x20(%ebp). So your two-byte overrun goes into space that's not used at all, hence no crash.

And here is the generated assembly with -mpreferred-stack-boundary=2. Now there is no need to re-align the stack, because the compiler does trust the startup code to align the stack to at least 4 bytes (it would be unthinkable for this not to be the case). The stack layout is simpler: push ebp, and subtract 24 more bytes for buffer. Thus your overrun overwrites two bytes of the saved ebp. This is eventually popped from the stack back into ebp, and so main returns to its caller with a value in ebp that is not the same as on entry. That's naughty, but it so happens that the system startup code doesn't use the value in ebp for anything (indeed in my tests it is set to 0 on entry to main, likely to mark the top of the stack for backtraces), and so nothing bad happens afterwards.

The over-complicated sequence GCC uses to align the stack pointer is simplified when possible in GCC8 and later ([Why is gcc generating an extra return address?](https://stackoverflow.com/q/38781118)). It's also probably unnecessary on modern Linux; modern versions of the i386 System V ABI guarantee that `_start` has a 16-byte aligned stack pointer, like the x86-64 SysV ABI always has. (Maybe GCC just always does that in 32-bit `main` because there are some non-Linux systems where extra alignment is an optional bonus, not mandatory, in 32-bit mode. e.g. Windows.) — Peter Cordes, Jul 13 '20 at 03:47

