Only enable tail-call-optimisation in GCC

Question

In a previous question, I was looking at a simple code example, looking at Tail Call Optimisation.

For that question, I ran into the issue that GCC applied too much optimisation to demonstrate the principle that was meant to be shown by this example code, and eliminated the loop/recursion.

How can I get GCC to compile with tail-call-optimisation enabled, but with as little other optimisation as possible?

I am currently attempting to compile with the following options:

gcc -O0 -foptimize-sibling-calls -ftree-tail-merge tail_recursive.c -o tail_recursive.c.exe
gcc -S -O0 -foptimize-sibling-calls -ftree-tail-merge tail_recursive.c

The recursive function in question is:

long recursive_tco(long loop, long current) {
  return (loop>0) ? recursive_tco(loop-1, current+1): current;
}

and the assembler produced by this is:

    .globl  recursive_tco
    .def    recursive_tco;  .scl    2;  .type   32; .endef
    .seh_proc   recursive_tco
recursive_tco:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    subq    $32, %rsp
    .seh_stackalloc 32
    .seh_endprologue
    movq    %rcx, 16(%rbp)
    movq    %rdx, 24(%rbp)
    cmpq    $0, 16(%rbp)
    jle .L2
    movq    24(%rbp), %rax
    leaq    1(%rax), %rdx
    movq    16(%rbp), %rax
    subq    $1, %rax
    movq    %rax, %rcx
    call    recursive_tco
    jmp .L3

(note that it is calling itself, instead of a conditional jmp, which is what I would expect)

What options am I missing to get gcc to perform tail call optimisation without eliminating the example code?

Obviously on real code, I would just enable -O2, and I prefer writing loops instead of recursion in C.

Compilation on gcc version 4.9.3 (GCC)

Edit:

changing the function to:

__attribute__ ((noinline)) long recursive_tco(long loop, long current) {
  return (loop>0) ? recursive_tco(loop-1, current+1): current;
}

results in the following at -O2:

recursive_tco:
    .seh_endprologue
    movq    %rdx, %rax
    leaq    (%rcx,%rdx), %rdx
    testq   %rcx, %rcx
    cmovg   %rdx, %rax
    ret

If I am reading the assembler right, it seems to do the job for the tail-recursive function, when compiling with gcc -O1 -foptimize-sibling-calls -ftree-tail-merge:

recursive_tco:
    .seh_endprologue
    movq    %rdx, %rax
    testq   %rcx, %rcx
    jle .L2
    movq    %rcx, %r8
.L3:
    subq    $1, %r8
    jne .L3
    leaq    (%rcx,%rax), %rax
.L2:
    rep ret

Unfortunately, it is also managing to optimise the non-tail recursive function in a similar way, which is not what I was expecting:

recursive:
    .seh_endprologue
    testq   %rcx, %rcx
    jle .L4
    movq    %rcx, %rdx
.L3:
    subq    $1, %rdx
    jne .L3
    movq    %rcx, %rax
    ret
.L4:
    movl    $0, %eax
    ret

I expect there are some further optimisations that are being done, and I need to provide some extra flags to disable them. Any suggestions?

Possibly a duplicate: [How do I check if gcc is performing tail-recursion optimization?](http://stackoverflow.com/questions/490324/how-do-i-check-if-gcc-is-performing-tail-recursion-optimization) — Lundin, Feb 13 '17 at 14:31
@Lundin not in this case. I _know_ it isn't in this case, and it 'over optimises' with `-O2`. I'm looking for a way to get it to apply just TCO if possible. That question wants to know if it _was_ applied or not, and the answers are using `-O2` (where specified) which unfortunately doesn't match up with the needs here. I did find that question, and unfortunately could not get an answer from it. (Thanks for your answer on the other question though) — Baldrickk, Feb 13 '17 at 14:39
Hmm well that question suggests tail optimizations are performed at -O2 and -O3. I suppose you could try `__attribute__ ((noinline)) long recursive_tco(long loop, long current)` to prevent the inlining, then try out different optimzation levels. — Lundin, Feb 13 '17 at 14:42
@Lundin I'll try that, thanks. - the `-O0 -f...` was an attempt to only enable the tco, but if this works, then that is also a good solution. — Baldrickk, Feb 13 '17 at 14:44
I have seen this come up in C a few times recently, is this an artifact of some book, or some real world problem that is coming up more recently? It isn't a language level feature in C like it is in LISP... just curious, will take my answer off the air. — Grady Player, Feb 13 '17 at 15:11
As documented in many places, most specific optimization flags have no effect at -O0, you need at least -O1 or -Og to enable optimizations at all (and then you can disable specific stuff). — Marc Glisse, Feb 13 '17 at 15:57
@GradyPlayer in this case, it was a result of a team meeting at work where little things like this are discussed. I brought up the point that `return val ? recursive(val)+1 : 0` which was being used as an example wasn't actually tail-recursive, and it led to this. As noted, in C, I prefer loops to recursion. I had fun with recursion when I did work with Scheme though. — Baldrickk, Feb 13 '17 at 16:50
@MarcGlisse yes, `-O1 -foptimize-sibling-calls` does seem to get me closest to what I'm after, I just need to work out what I need to disable, or if it isn't possible at all to get this to compile as I want it. — Baldrickk, Feb 13 '17 at 16:53

Only enable tail-call-optimisation in GCC

0 Answers0