Apart from the already said point that an inline
function need not actually be inlined (and many functions without inline
are inlined by modern compilers), it's also entirely conceivable to inline a call through a function pointer. Example:
#include <iostream>
int foo(int (*fun)(int), int x) {
return fun(x);
}
int succ(int n) {
return n+1;
}
int main() {
int c=0;
for (int i=0; i<10000; ++i) {
c += foo(succ, i);
}
std::cout << c << std::endl;
}
Here, foo(succ, i)
could as a whole be inlined to just i+1
. And indeed that seems to happen†: g++ -O3 -S
produces code for the foo
and succ
functions
_Z3fooPFiiEi:
.LFB998:
.cfi_startproc
movq %rdi, %rax
movl %esi, %edi
jmp *%rax
.cfi_endproc
.LFE998:
.size _Z3fooPFiiEi, .-_Z3fooPFiiEi
.p2align 4,,15
.globl _Z4succi
.type _Z4succi, @function
_Z4succi:
.LFB999:
.cfi_startproc
leal 1(%rdi), %eax
ret
.cfi_endproc
But then it generates code for main
which never refers to either of these, instead just includes a new specialised _GLOBAL__sub_I__Z3fooPFiiEi
:
.LFE999:
.size _Z4succi, .-_Z4succi
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB1000:
.cfi_startproc
movdqa .LC1(%rip), %xmm4
xorl %eax, %eax
pxor %xmm1, %xmm1
movdqa .LC0(%rip), %xmm0
movdqa .LC2(%rip), %xmm3
jmp .L5
.p2align 4,,10
.p2align 3
.L8:
movdqa %xmm2, %xmm0
.L5:
movdqa %xmm0, %xmm2
addl $1, %eax
paddd %xmm3, %xmm0
cmpl $2500, %eax
paddd %xmm0, %xmm1
paddd %xmm4, %xmm2
jne .L8
movdqa %xmm1, %xmm5
subq $24, %rsp
.cfi_def_cfa_offset 32
movl $_ZSt4cout, %edi
psrldq $8, %xmm5
paddd %xmm5, %xmm1
movdqa %xmm1, %xmm6
psrldq $4, %xmm6
paddd %xmm6, %xmm1
movdqa %xmm1, %xmm7
movd %xmm7, 12(%rsp)
movl 12(%rsp), %esi
call _ZNSolsEi
movq %rax, %rdi
call _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
xorl %eax, %eax
addq $24, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE1000:
.size main, .-main
.p2align 4,,15
.type _GLOBAL__sub_I__Z3fooPFiiEi, @function
_GLOBAL__sub_I__Z3fooPFiiEi:
.LFB1007:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $_ZStL8__ioinit, %edi
call _ZNSt8ios_base4InitC1Ev
movl $__dso_handle, %edx
movl $_ZStL8__ioinit, %esi
movl $_ZNSt8ios_base4InitD1Ev, %edi
addq $8, %rsp
.cfi_def_cfa_offset 8
jmp __cxa_atexit
.cfi_endproc
.LFE1007:
.size _GLOBAL__sub_I__Z3fooPFiiEi, .-_GLOBAL__sub_I__Z3fooPFiiEi
.section .init_array,"aw"
.align 8
.quad _GLOBAL__sub_I__Z3fooPFiiEi
.local _ZStL8__ioinit
.comm _ZStL8__ioinit,1,1
So in this case the actual program does not even contain a function pointer pointing to succ
– the compiler has found out that this pointer would always refer to the same function anyway, and was therefore able to eliminate the entire thing without changing the behaviour. This can improve performance a lot, when you often call small functions through function pointers. Which is quite a widespread technique in functional languages; compilers for languages like O'Caml and Haskell make great use of this kind of optimisation.
†Disclaimer: my assembly skills are close to nonexistent. I might well be talking rubbish here.