The crash is caused by stack misalignment. See "Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?"
The jump-by-ret results in entering runme with the stack misaligned, which violates the ABI, and some libc functions do in fact break when called with a misaligned stack. It doesn't happen on my system, but apparently your malloc implementation (which printf calls) requires stack alignment.
Disassembling the code bytes shows that the faulting instruction is movaps [rsp+0x10], xmm1, whose memory operand must be aligned to 16 bytes. However, rsp has a hex value ending in 8, so rsp+0x10 is not aligned.
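To make the arithmetic concrete, here is a small sketch of the alignment condition that movaps imposes on its memory operand (on real hardware a misaligned operand raises a general-protection fault, which Linux delivers as SIGSEGV). The function names are hypothetical and the rsp values used with them are made up for illustration, not taken from your crash.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of 16, i.e. acceptable as a movaps memory operand. */
bool is_16_aligned(uint64_t addr) {
    return addr % 16 == 0;
}

/* Models the operand [rsp + disp], e.g. [rsp+0x10] in the faulting instruction. */
bool movaps_operand_ok(uint64_t rsp, uint64_t disp) {
    return is_16_aligned(rsp + disp);
}
```

For example, with a hypothetical rsp of 0x7fffffffe3c8 (ending in 8), rsp+0x10 is 0x7fffffffe3d8, so movaps_operand_ok returns false; with 0x7fffffffe3c0 it returns true.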
Off the top of my head, I don't see a simple way to make the exploit work around this misalignment.
Here is a brief explanation of the principle of stack alignment and how it leads to the crash.
Misalignment here simply means that when the movaps instruction is executed, the value in rsp is not a multiple of 16 (which is mathematically equivalent to saying that its last hex digit is not 0). The compiler takes care to generate code that always adjusts the stack pointer by multiples of 16 (counting the 8-byte return address pushed by call), so that if the stack was properly aligned by the caller of a function, the calls made by that function will also occur with proper alignment.
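A quick sketch of the two facts in that paragraph, using hypothetical helper names: "multiple of 16" and "last hex digit is 0" are the same test, and moving the stack pointer by a multiple of 16 leaves that last hex digit unchanged, so an aligned stack stays aligned and a misaligned one stays misaligned.

```c
#include <stdbool.h>
#include <stdint.h>

/* The last hex digit of x, i.e. its low 4 bits. */
unsigned low_hex_digit(uint64_t x) {
    return (unsigned)(x & 0xF);
}

/* Two equivalent ways of asking whether x is 16-byte aligned. */
bool aligned_by_mod(uint64_t x)   { return x % 16 == 0; }
bool aligned_by_digit(uint64_t x) { return low_hex_digit(x) == 0; }

/* Models "sub rsp, n": when n is a multiple of 16, the low hex digit of the
 * result equals the low hex digit of rsp. */
uint64_t sub_rsp(uint64_t rsp, uint64_t n) {
    return rsp - n;
}
```

For instance, subtracting 0x30 from an rsp ending in 0 gives a value ending in 0, while subtracting it from one ending in 8 gives a value still ending in 8.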
The rule set out by the x86-64 SysV ABI, which Linux compilers conform to, is that rsp must be a multiple of 16 (i.e. must end in 0) when a call instruction is issued. This means that when the called function begins to execute, rsp is 8 less than a multiple of 16 (i.e. ends in 8), because of the 8-byte return address that was pushed by call. So when main reaches its ret instruction, with your modified return address on the stack, rsp likewise ends in 8 (all stack modification done within main has been undone at this point). The ret pops the stack once, so you end up at runme with rsp ending in 0, which is wrong: every function expects to be entered with rsp ending in 8.
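The bookkeeping above can be modelled with a toy stack pointer. This is only a sketch: it tracks nothing but rsp, the function names are hypothetical, and the starting value is assumed to be a properly aligned call site. It contrasts a normal call, which enters the callee with rsp ending in 8, with the jump-by-ret into runme, which enters with rsp ending in 0.

```c
#include <stdint.h>

/* call pushes an 8-byte return address. */
uint64_t do_call(uint64_t rsp) { return rsp - 8; }

/* ret pops an 8-byte return address. */
uint64_t do_ret(uint64_t rsp) { return rsp + 8; }

/* rsp mod 16 on entry to a function reached by an ordinary call. */
unsigned normal_entry_residue(uint64_t rsp_at_call_site) {
    return (unsigned)(do_call(rsp_at_call_site) % 16);
}

/* rsp mod 16 on entry to runme, reached via main's ret with the overwritten
 * return address. */
unsigned runme_entry_residue(uint64_t rsp_at_call_to_main) {
    uint64_t rsp = do_call(rsp_at_call_to_main); /* main entered: ends in 8 */
    /* main's own stack adjustments are all undone before its ret */
    rsp = do_ret(rsp);                           /* ret "returns" into runme */
    return (unsigned)(rsp % 16);
}
```

With an aligned call site such as 0x7fffffffe3c0, normal_entry_residue gives 8 while runme_entry_residue gives 0, which is exactly the reversed parity described above.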
This "parity error" propagates down through printf and into malloc. The _int_malloc function expects to be entered with rsp ending in 8, so it presumably subtracts an additional 8 bytes (possibly just by pushing) somewhere before executing movaps. Normally, then, rsp would end in 0 at that point and all would be well. But since the situation was reversed on entry to runme, it stays reversed all the way down: _int_malloc was entered with rsp ending in 0 instead, so its subtraction of 8 bytes left rsp ending in 8 when movaps executed.
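The propagation argument can be sketched the same way. Both functions below are hypothetical models, not glibc code: enter_callee stands for any correctly compiled intermediate function (such as printf or malloc) that moves rsp by 16k + 8 before its own call, so that together with its already-pushed return address the net change is a whole number of 16-byte units; movaps_would_succeed follows the guess above that _int_malloc subtracts 8 more bytes before its movaps [rsp+0x10], xmm1.

```c
#include <stdbool.h>
#include <stdint.h>

/* A well-behaved function entered at rsp_on_entry subtracts 16k + 8 before
 * calling its callee; the call then pushes 8 more bytes. The callee is
 * therefore entered with the same residue mod 16 as this function was. */
uint64_t enter_callee(uint64_t rsp_on_entry, uint64_t k) {
    uint64_t at_call_site = rsp_on_entry - (16 * k + 8);
    return at_call_site - 8; /* return address pushed by call */
}

/* Hypothetical model of _int_malloc: one extra 8-byte adjustment, then a
 * movaps whose operand [rsp+0x10] must be 16-byte aligned. */
bool movaps_would_succeed(uint64_t rsp_on_entry) {
    uint64_t rsp = rsp_on_entry - 8;
    return (rsp + 0x10) % 16 == 0;
}
```

Entering with an rsp ending in 8 (the normal case) makes movaps_would_succeed return true; entering with one ending in 0, as happens after the jump into runme, makes it return false, no matter how many well-behaved frames lie in between.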
Regarding your comment: at the level of C, stack alignment is the job of the compiler, not the programmer. So a C program can freely define a local array of 17 chars, and the compiler then knows to actually adjust the stack pointer by 32 bytes, leaving the other 15 bytes unused (or using them for other local variables). It isn't something that a C programmer normally has to worry about, but it becomes relevant when you are hacking internals like this.
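The rounding the compiler does for that 17-byte array can be sketched as rounding a size up to the next multiple of 16. This is a simplification under stated assumptions: round_up_16 is a hypothetical helper, and a real compiler also accounts for saved registers, other locals and the return address, so actual frame sizes can differ.

```c
#include <stddef.h>

/* Round n up to the next multiple of 16 (n itself if already a multiple). */
size_t round_up_16(size_t n) {
    return (n + 15) & ~(size_t)15;
}
```

round_up_16(17) is 32, matching the 32-byte adjustment described above, while round_up_16(16) stays 16.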