How deep do compilers inline functions?

Question

Say I have some functions, each of about two simple lines of code, and they call each other like this: A calls B calls C calls D ... calls K. (So basically it's a long series of short function calls.) How deep will compilers usually go in the call tree to inline these functions?

You could simply test and look at the assembly! Your compiler docs should tell you how to specify the inlining depth; I think it's something like 50 for GCC by default. — Kerrek SB, Sep 18 '11 at 17:12
I believe this should be compiler specific and you post no information of your compiler. — Alok Save, Sep 18 '11 at 17:13
Under MSVC you have partial control over this using `#pragma inline_depth` (http://msdn.microsoft.com/en-us/library/cx053bca.aspx), though I have had problems with it in certain situations (such as recursive inlining, which is meant to be possible but never worked; I ended up doing it manually) — Necrolis, Sep 18 '11 at 20:23

score 17 · Accepted Answer · edited Mar 25 '14 at 15:27

The question is not meaningful.

If you think about inlining and its consequences, you'll realise that it:

Avoids a function call (with all the register saving/frame adjustment)
Exposes more context to the optimizer (dead stores, dead code, common sub-expression elimination...)
Duplicates code (bloating the instruction cache and the executable size, among other things)

When deciding whether to inline or not, the compiler thus performs a balancing act between the potential bloat created and the speed gain expected. This balancing act is affected by options: for gcc, -O3 means optimize for speed while -Os means optimize for size; when it comes to inlining, they have nearly opposite behaviors!

Therefore, what matters is not the "nesting level" but the number of instructions (possibly weighted, as not all are created equal).

This means that a simple forwarding function:

int foo(int a, int b) { return foo(a, b, 3); }

is essentially "transparent" from the inlining point of view.
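To make the question's scenario concrete, here is a minimal sketch of a chain of tiny functions. The names (a, b, c, d, call_chain) are hypothetical and the chain is shorter than A...K, but the idea is the same: each body is a single expression, so the cost the heuristics see stays small no matter how many levels are stacked.

```cpp
// Hypothetical chain of tiny functions (all names are illustrative).
// Each body is one expression, so an optimizer will typically consider
// the whole chain cheap enough to flatten, whatever its depth.
static int d(int x) { return x + 1; }
static int c(int x) { return d(x) * 2; }
static int b(int x) { return c(x) - 3; }
static int a(int x) { return b(x) + 4; }

// External linkage, so the compiler has to emit code for this one.
int call_chain(int x) { return a(x); }
```

Compiling this with something like g++ -O2 -S and reading the assembly for call_chain is the quickest way to see what your own compiler does; whether any call instructions remain depends on the compiler and the options used.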

On the other hand, a function a hundred lines long is unlikely to get inlined. The exception is a static free function called only once, which is almost systematically inlined, as inlining does not create any duplication in this case.

From these two examples we get a hunch of how the heuristics behave:

the fewer instructions the function has, the better for inlining
the less often it is called, the better for inlining

After that, there are parameters you should be able to set to influence the decision one way or another (MSVC has __forceinline, which hints strongly at inlining; gcc has the -finline-limit flag to "raise" the threshold on the instruction count, etc.)
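As a rough illustration of the per-function knobs, the sketch below wraps MSVC's __forceinline and the GCC/Clang always_inline attribute behind one macro. FORCE_INLINE, clamp_to_byte and brighten are hypothetical names chosen for this example; only the two compiler keywords themselves are real.

```cpp
// FORCE_INLINE is a made-up macro name, not a standard facility.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // fall back to the ordinary hint
#endif

// Small helper we would like to see inlined at every call site.
FORCE_INLINE int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

// Caller: with the attribute applied, the clamp is expected to be
// expanded here rather than called.
int brighten(int pixel, int delta) { return clamp_to_byte(pixel + delta); }
```

These are still requests layered on top of the heuristics rather than guarantees in every situation, so checking the generated assembly remains worthwhile.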

On a tangent: do you know about partial inlining?

It was introduced in gcc 4.6. The idea, as the name suggests, is to partially inline a function. Mostly, the goal is to avoid the overhead of a function call when the function is "guarded" and may (in some cases) return nearly immediately.

For example:

void foo(Bar* x) {
  if (not x) { return; } // null pointer, pfff!

  // ... BIG BLOCK OF STATEMENTS ...
}

void bar(Bar* x) {
  // DO 1
  foo(x);
  // DO 2
}

could get "optimized" as:

void foo@0(Bar* x) {
  // ... BIG BLOCK OF STATEMENTS ...
}

void bar(Bar* x) {
  // DO 1
  if (x) { foo@0(x); }
  // DO 2
}

Of course, once again the heuristics for inlining apply, but they apply more discriminately!
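If your compiler does not perform this transformation, the same split can be written by hand. The following is a sketch under assumptions: process, process_slow and run are hypothetical names, the one-line body stands in for the big block of statements, and the noinline attribute is a GCC/Clang extension (other compilers simply ignore that line here).

```cpp
struct Bar { int value; };

// The expensive part, kept out of line.
#if defined(__GNUC__) || defined(__clang__)
__attribute__((noinline))
#endif
static void process_slow(Bar* x) {
    x->value += 1;  // stands in for the big block of statements
}

// The cheap guard: small enough to be inlined at each call site,
// so a null pointer never pays for a function call.
inline void process(Bar* x) {
    if (!x) { return; }
    process_slow(x);
}

// Example caller; returns -1 when there was nothing to process.
int run(Bar* x) {
    process(x);
    return x ? x->value : -1;
}
```

This mirrors the foo/foo@0 split above: the test lives in the caller, and only the heavy path costs a call.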

And finally, unless you use WPO (Whole Program Optimization) or LTO (Link Time Optimization), functions can only be inlined if their definition is in the same TU (Translation Unit) as the call site.

I don't usually do this, but I think I should change the accepted answer. :) I didn't know about partial inlining and neither about inlining based on number of calls. Thanks for the details. — Paul Manta, Sep 18 '11 at 19:22

score 7 · Answer 2 · answered Sep 18 '11 at 17:13

I've seen compilers inline more than 5 functions deep. But at some point, it basically becomes a space-efficiency trade-off that the compiler makes. Every compiler is different in this aspect. Visual Studio is very conservative with inlining. GCC (under -O3) and the Intel Compiler love to inline...

Mysticial

IIRC in gcc it depends on some approximate "instruction count" for the function being inlined (i.e. how long it actually is); nesting level does not play role from what I read in the docs (`-finline-limit` and friends), in the sense that function of length 10 will be inlined just the same as 5 + nested 5. – eudoxos Sep 18 '11 at 17:16
If the functions are called only once there is no reason to avoid inline. GCC will also inline aggressively if profile feedback says it should. – Zan Lynx Sep 18 '11 at 18:15
@Zan Lynx: That's mostly correct. There are some cases, where it's better to not inline. If the function is in a performance-critical loop and is rarely called (like a trap handler), then it's better to not inline it so to keep the code-size of the loop small. (this will sometimes allow you to use short-jumps instead of long-jumps) – Mysticial Sep 18 '11 at 18:18
