In a previous question, I was looking at a simple code example, looking at Tail Call Optimisation.
For that question, I ran into the issue that GCC applied too much optimisation to demonstrate the principle that was meant to be shown by this example code, and eliminated the loop/recursion.
How can I get GCC to compile with tail-call-optimisation enabled, but with as little other optimisation as possible?
I am currently attempting to compile with the following options:
gcc -O0 -foptimize-sibling-calls -ftree-tail-merge tail_recursive.c -o tail_recursive.c.exe
gcc -S -O0 -foptimize-sibling-calls -ftree-tail-merge tail_recursive.c
The recursive function in question is:
long recursive_tco(long loop, long current) {
return (loop>0) ? recursive_tco(loop-1, current+1): current;
}
and the assembler produced by this is:
.globl recursive_tco
.def recursive_tco; .scl 2; .type 32; .endef
.seh_proc recursive_tco
recursive_tco:
pushq %rbp
.seh_pushreg %rbp
movq %rsp, %rbp
.seh_setframe %rbp, 0
subq $32, %rsp
.seh_stackalloc 32
.seh_endprologue
movq %rcx, 16(%rbp)
movq %rdx, 24(%rbp)
cmpq $0, 16(%rbp)
jle .L2
movq 24(%rbp), %rax
leaq 1(%rax), %rdx
movq 16(%rbp), %rax
subq $1, %rax
movq %rax, %rcx
call recursive_tco
jmp .L3
(note that it is calling itself, instead of a conditional jmp, which is what I would expect)
What options am I missing to get gcc to perform tail call optimisation without eliminating the example code?
Obviously on real code, I would just enable -O2
, and I prefer writing loops instead of recursion in C.
Compilation on gcc version 4.9.3 (GCC)
Edit:
changing the function to:
__attribute__ ((noinline)) long recursive_tco(long loop, long current) {
return (loop>0) ? recursive_tco(loop-1, current+1): current;
}
results in the following at -O2:
recursive_tco:
.seh_endprologue
movq %rdx, %rax
leaq (%rcx,%rdx), %rdx
testq %rcx, %rcx
cmovg %rdx, %rax
ret
If I am reading the assembler right, it seems to do the job for the tail-recursive function, when compiling with gcc -O1 -foptimize-sibling-calls -ftree-tail-merge
:
recursive_tco:
.seh_endprologue
movq %rdx, %rax
testq %rcx, %rcx
jle .L2
movq %rcx, %r8
.L3:
subq $1, %r8
jne .L3
leaq (%rcx,%rax), %rax
.L2:
rep ret
Unfortunately, it is also managing to optimise the non-tail recursive function in a similar way, which is not what I was expecting:
recursive:
.seh_endprologue
testq %rcx, %rcx
jle .L4
movq %rcx, %rdx
.L3:
subq $1, %rdx
jne .L3
movq %rcx, %rax
ret
.L4:
movl $0, %eax
ret
I expect there are some further optimisations that are being done, and I need to provide some extra flags to disable them. Any suggestions?