I'm trying to do a rather niche thing, which is essentially breaking the CFI (Call Frame Information in DWARF EH info) and rbp & rsp links between frames. The main reason is that, past a certain point in the thread's control flow, I want to do a call continuation: basically a one-way tail call combined with a yield, which should clean up the stack and then return to the top of the stack, ready to be executed again at the continuation point.
Here is the idea in principle, which works as long as I keep the lines that mess with the stack commented out:
/*
* x86_64 SysV:
* rdi, rsi, rdx, rcx, r8, r9, xmm0-xmm7
*/
__asm {
mov rax, TCB
mov rax, qword ptr [rax] OSThreadControlBlock.StartFn;
call rax;
mov rax, 0;
// end of stack
//push rax;
//push rax;
//push rbx;
// last "real" frame
//push rbp;
//mov rbp, rsp;
//push rbx;
// make the call
mov rdi, RL;
lea rax, qword ptr __OS_RUNLOOP_START__;
call rax;
// trap if it returns
//int 3;
}
I'm aware of the general principles behind SP/BP registers, and I'm specifically using -fno-omit-frame-pointer. My question is: after having spent hours trying to get it to work, what am I missing? It seems that any alteration to the stack layout, even something as simple as a push before a call while keeping it aligned, will cause a snowball crash starting with something like this (custom signal handler):
Received fatal signal: Segmentation fault (11) [thread: 10298 ctl-thrd]
* Unknown error at address 0x0 Regs:
%rip=0x00000000003E2D91 %rbp=0x00007F820A547EA8 %rsp=0x00007F820A547DE8 %rax=0x00007F820A547DE8 %rbx=0x00007F820A547F38
%rdi=0x00000000002121E1 %rsi=0x000000000000007B %rcx=0x000000000000000A %r8=0x0000000000000900 %r9=0x00007F820A5490C0
The ABI in question is libc++/libc++abi on x86_64 Linux, with an LLVM/Clang 6.0.X based toolchain. I tried practically everything. I know the above looks weird, but it's an MS extension for inline assembly, and I checked multiple times in disassemblies that it generates perfectly sane code. As far as I understand, this is some weird conflict between CFI and frame-pointer-based stuff, but I'm not that amazingly good at x86_64, so I'm not really sure what I'm missing. I know the unwinding process is meant to be terminated by a sentinel (null SP/FP on the last frame), but at this point I'm honestly lost because even the debugger gets completely thrown off by this.
If anyone has any suggestions, that would be really appreciated. I tried various things, but the core problem is the same: as soon as I touch the stack, even if I return it to normal, everything goes haywire. Clobbers beyond the asm block don't matter, since the last call is not meant to return conventionally. One thing I did notice is that this seems to be somehow related to TLVs, but I'm not sure how, since NPTL is meant to configure that.
Any help or suggestions would be immensely appreciated.
Edit:
Looks like this comment from Valgrind may explain what is happening:
/* NB 9 Sept 07. There is a nasty kludge here in all these CALL_FN_
macros. In order not to trash the stack redzone, we need to drop
%rsp by 128 before the hidden call, and restore afterwards. The
nastyness is that it is only by luck that the stack still appears
to be unwindable during the hidden call - since then the behaviour
of any routine using this macro does not match what the CFI data
says. Sigh.
Why is this important? Imagine that a wrapper has a stack
allocated local, and passes to the hidden call, a pointer to it.
Because gcc does not know about the hidden call, it may allocate
that local in the redzone. Unfortunately the hidden call may then
trash it before it comes to use it. So we must step clear of the
redzone, for the duration of the hidden call, to make it safe.
Probably the same problem afflicts the other redzone-style ABIs too
(ppc64-linux, ppc32-aix5, ppc64-aix5); but for those, the stack is
self describing (none of this CFI nonsense) so at least messing
with the stack pointer doesn't give a danger of non-unwindable
stack. */
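To make the Valgrind comment concrete, here is a minimal sketch of that kind of kludge. It is written with GNU-style extended asm (AT&T syntax) rather than the MS-style block above, and hidden_target and call_hidden are hypothetical names used only for illustration. It steps below the 128-byte red zone, realigns rsp to 16 bytes for the call, and restores rsp afterwards. It assumes x86_64 SysV and a callee that takes and returns a single integer.

```c
/* Hypothetical stand-in for the "hidden" callee the compiler cannot see. */
long hidden_target(long x) { return x + 1; }

/*
 * Calls fn(arg) from inside inline asm. The compiler does not know a call
 * happens here, so it may have placed locals in the red zone below rsp.
 */
long call_hidden(long (*fn)(long), long arg)
{
    long ret;
    __asm__ volatile(
        "movq %%rsp, %%rbx\n\t"  /* remember rsp in a callee-saved register */
        "subq $128, %%rsp\n\t"   /* step clear of the 128-byte red zone */
        "andq $-16, %%rsp\n\t"   /* SysV: rsp must be 16-byte aligned at a call */
        "callq *%[fn]\n\t"       /* the hidden call; result comes back in rax */
        "movq %%rbx, %%rsp\n\t"  /* restore rsp exactly as it was */
        : "=a"(ret), "+D"(arg)   /* rax = return value, rdi = first argument */
        : [fn] "r"(fn)
        : "rbx", "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11",
          "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14",
          "xmm15", "memory", "cc");
    return ret;
}
```

Calling call_hidden(hidden_target, 41) should yield 42. As the comment says, rsp no longer matches what the CFI describes while the hidden call is running. With -fno-omit-frame-pointer the CFA is normally defined relative to rbp, so unwinding may still work in that window, but that is a property of how the function was compiled and not something this sketch guarantees.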