I have a small programming language I've been designing, and I want a relatively easy backend option for it. C stands out as a relatively decent option for all the usual reasons (such as "I can't follow good advice" and "I don't know assembly"), plus more importantly the fact that there's not very much in my language that can't easily be expressed in C (it's more about the type checker, no GC or heavy-duty runtime to accommodate) - except one thing: optimised tail calls.
It's not a deal breaker: I don't really need it, as the language has looping constructs, but it's supposed to be functional too, so TCO seems like a valuable thing to put in if possible. Trampoline or lazy-thunk solutions would work, but appear (from the literature, I have not profiled yet, and anyway don't have any usage information) to have a noticeable effect on performance. Don't want the implementation to discourage users from expressing a problem in a valid way.
This question leaves me thinking it ought to be possible to create a relatively simple platform-specific "tailcall" block, to replace "return" and force the behaviour that I want. But, not being familiar enough with x86, I don't know how to control C's automatic stack manipulation, where I ought to be aiming the jump, etc. (It looks to me like it extends the stack after the start of a function... so I need to either shrink the stack before the tailcall, which would be a bit unnecessary, or somehow aim a jump at the start of the target's body after the stack manipulation, which sounds pretty much impossible edit: for a statically-unknown target ..?)
So:
- can this be done?
- can it be done safely?
- is it at all sensible?
I should repeat that I am aware that it is possible to implement tail call elimination that works in C (that is, of my high-level language, not of C), I just want to add a compiler-and-platform-specific optimisation, to help it be be more or less indistinguishable from the "real thing". And I mainly want to use C because I'm very confident with it, whereas I've never used LLVM and have only the barest reading-knowledge of x86, so I was hoping to skip that step and get output working. I know I will have to learn both properly in the future at some point, but ...
(i.e. I'm asking this because the idea is stuck in my head and it bugs me that there might be a Better Way To Do It(tm), not because I need the solution to get a working backend.)