What are the guiding principles for expanding a callee inside its caller (Inlining - Compiler Optimization)?

Question

My understanding is that compilers follow certain semantics that decide whether or not a function should be expanded inline. For example, if the callee unconditionally returns a value (no if/else-if leading to the return), it may be expanded in the caller itself. Similarly, function call overhead can also guide this expansion. (I may be completely wrong.)

Similarly, hardware parameters such as cache usage may also play a role in expansion.

As a programmer, I want to understand these semantics and the algorithms which guide inline expansion. Ultimately, I should be able to write (or recognize) code that will surely be inlined (or not inlined). I don't mean to override the compiler, or to suggest that I could write code better than the compiler itself. The question is rather about understanding the internals of compilers.

EDIT: Since I use gcc/g++ in my work, we can limit the scope to these two alone. That said, I was of the opinion that several things would be common across compilers in this context.

You do understand that different compilers handle it differently, right? Some might skip a function if it's virtual, some might not. Most do recursive inlining, some don't. The question needs to be narrowed down to "how does a specific compiler (for example gcc) inline functions" or something similar, otherwise it's impossible to answer. Take a look at: https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Optimize-Options.html — , Jul 25 '15 at 06:23
@Wimmel: I can disassemble my code and see whether the caller contains the implementation of the callee. I don't know why you thought it was a duplicate. — ultimate cause, Jul 25 '15 at 06:28
You are going down the route of micro-optimisation. Not a good starting point, as one loses sight of the bigger picture — Ed Heal, Jul 25 '15 at 06:29
@CamelToe: I actually mean to understand general principles, which I thought would be common in a language. Let us say C++. Please suggest. — ultimate cause, Jul 25 '15 at 06:30
"Ultimately, I should be able to write(or recognize) a code that surely will be inlined(not-inlined)". Arguably not a very useful skill to have. If you want to know what the compiler does with a particular function then look at the generated assembly. If you want to control the inline then most compilers have ways to force inline or no-inline. So what is the point of being able to eye-ball a function and saying that it will or will not be inlined? Just let the compiler do its job. — kaylum, Jul 25 '15 at 06:35
@CamelToe: GCC 4.1 is obsolete, you'll better cite the latest GCC documentation... — Basile Starynkevitch, Jul 25 '15 at 06:40
@Alan Au: When I asked this I knew that it is NOT straightforward for someone to answer without burning hours explaining the details. However, I hoped there would be a few guiding principles, and I can pick up from there and dig deeper to the level I think is "enough" for me. :) — ultimate cause, Jul 25 '15 at 06:41
@RIPUNJAYTRIPATHI: there are no guiding principles, and you should not spend your time guessing them. Trust the compiler's heuristics on inlining decision. Please **edit your question** to explain **why** you are asking this. — Basile Starynkevitch, Jul 25 '15 at 06:45
"Ultimately, I should be able to write code that surely will be inlined" – you shouldn't. If the compiler doesn't inline a function, it has a very good reason for it (for example to save cache). You don't achieve the ultimate performance improvement by writing code that always inlines; inlining is not a magic universal performance boost technique. You achieve the best results if you are honest to the compiler, don't lie to it and let it do its decisions based on the code you intended to write in the first place. — The Paramagnetic Croissant, Jul 25 '15 at 07:09
@The Paramagnetic Croissant: Exactly, it is not magic. When you said "to save cache", did it occur to you that there are other factors as well that make a function a candidate for inlining? — ultimate cause, Jul 25 '15 at 07:17
@RIPUNJAYTRIPATHI Yeah, but I don't see how that's relevant at all. I didn't assert that the only reason why a function isn't inlined is cache-friendliness. I asserted that should the compiler decide not to inline a function, it has a good reason behind its decision. — The Paramagnetic Croissant, Jul 25 '15 at 07:19
Understanding all of GCC takes a lifetime. Download its source code and dive into it. — Basile Starynkevitch, Jul 25 '15 at 07:47

score 6 · Answer 1 · edited Apr 12 '17 at 07:31

You don't need to understand the inlining criteria (or those of other optimizations), because by definition (assuming that the optimizing compiler is not buggy in that respect), inlined code should behave the same as non-inlined code.
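To make the "same behavior" point concrete, here is a minimal sketch, assuming GCC or Clang (the noinline attribute is a GNU extension). The function names are hypothetical: the two functions have identical bodies, one is forbidden from being inlined and the other is left to the compiler's judgment, and calls to both must return the same values whatever the compiler decides.

```c
/* Identical bodies; only the inlining policy differs. */

/* Left to the compiler: at -O2, GCC will likely inline this. */
static int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; ++i)
        total += i;
    return total;
}

/* GNU extension: never inline calls to this one. */
__attribute__((noinline)) static int sum_to_noinline(int n)
{
    int total = 0;
    for (int i = 1; i <= n; ++i)
        total += i;
    return total;
}

/* Observable behavior must match regardless of inlining decisions. */
int same_result(int n)
{
    return sum_to(n) == sum_to_noinline(n);
}
```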

Your first example (a callee unconditionally returning a value) is in practice certainly wrong, in the sense that several compilers are able to inline functions with conditional returns.

For example, consider this f.c file:

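```c
/* Recursive, with a conditional return. */
static int fact (int n) {
  if (n <= 0) return 1;
  else
    return n * fact (n - 1);
}

/* The only externally visible function in this file. */
int foo () {
  return fact (10);
}
```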

Compile it with gcc -O3 -fverbose-asm -S f.c; the resulting f.s assembly file contains only one function (foo): the fact function has completely disappeared, and the call fact(10) has been inlined (recursively) and replaced (by constant folding) with 3628800.
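Recursion is not needed to see that conditional returns are no obstacle. Here is a minimal sketch with hypothetical names: a small helper with several conditional returns and one caller. Whether GCC actually inlines clamp_byte into to_byte at a given optimization level is its own decision; compiling with gcc -O2 -S and checking whether a call to clamp_byte remains in the assembly is the reliable way to find out.

```c
/* Several conditional returns: not an obstacle to inlining. */
static int clamp_byte(int x)
{
    if (x < 0)
        return 0;
    else if (x > 255)
        return 255;
    return x;
}

/* A typical caller: at -O2, GCC will usually inline clamp_byte here,
   and may turn the branches into compare/select instructions. */
unsigned char to_byte(int x)
{
    return (unsigned char)clamp_byte(x);
}
```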

With GCC (the current version is GCC 5.2 as of July 2015), assuming you ask it to optimize (e.g. compile with gcc -O2, g++ -O2, or -O3), the inlining decision is not easy to understand. The compiler will very probably make better inlining decisions than you can. There are many internal heuristics guiding it: no few simple guiding principles, but some heuristics that favor inlining, others that avoid it, and probably some meta-heuristics to choose between them. Read about the optimization options (such as -finline-limit=...) and function attributes.

You might use the always_inline and gnu_inline and noinline (and also noclone) function attributes, but I don't recommend doing that in general.
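For reference, here is a minimal sketch of how these GNU C attributes are spelled (function names are hypothetical; gnu_inline is omitted because its semantics concern how `inline` definitions are emitted, which is a separate topic). Per the GCC manual, failing to inline an always_inline function is diagnosed as an error, and the attribute is normally combined with `inline`.

```c
/* GNU C function attributes that override GCC's inlining heuristics.
   As the answer says, using them is usually not recommended. */

/* always_inline: inline even without optimization; GCC reports an
   error if it cannot inline a call. Combined with `inline`. */
static inline __attribute__((always_inline)) int add_forced(int a, int b)
{
    return a + b;
}

/* noinline: never inline calls to this function. */
__attribute__((noinline)) int add_never(int a, int b)
{
    return a + b;
}

/* noclone (not shown) additionally stops GCC from making specialized
   copies (clones) of a function, e.g. for constant arguments. */

int add_both(int a, int b)
{
    return add_forced(a, b) + add_never(a, b);
}
```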

(You could disable inlining with noinline, but very often the resulting code would be slower. So don't do that.)

The key point is that the compiler is better at optimizing and inlining than you reasonably can be, so trust it to inline and optimize well.

Optimizing compilers (see also this) can (and do) inline functions without you knowing it; e.g. they sometimes inline functions not marked inline, and sometimes do not inline functions marked inline.

So no, you don't want to "understand these semantics and the algorithms which guide inline expansion"; they are too difficult, and they vary from one compiler to another (even from one version to another). If you really want to understand why GCC is inlining something (this means spending months of work, and I believe you should not waste your time on that), use -fdump-tree-all and the other dump flags, instrument the compiler using MELT (which I am developing), and dive into the source code (GCC is free software).
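A lighter-weight starting point than the full tree dumps is to ask GCC to report its inlining decisions. The flag spellings below follow the GCC manual (the -fopt-info form mirrors the manual's own example), but check the documentation for your GCC version; the file and function names are hypothetical.

```c
/* inl.c: hypothetical file used to ask GCC about its inlining decisions.
 *
 *   gcc -O2 -c inl.c -fopt-info-inline-optimized-missed=inline.txt
 *       writes inlined and not-inlined call sites to inline.txt
 *   gcc -O2 -c inl.c -fdump-ipa-inline
 *       writes a detailed dump of the IPA inliner's decisions
 */

static int helper(int x)
{
    return 3 * x + 1;
}

/* Two call sites of helper for the report to mention. */
int use_helper(int x)
{
    return helper(x) + helper(x + 1);
}
```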

(You'll need more than your lifetime, or at least several dozen years, to understand all of GCC, which has more than ten million lines of source code, and how it optimizes. By the time you understood something, the GCC community would have worked on new optimizations, etc.)

BTW, if you compile and link an entire application or library with gcc -flto -O3 (e.g. with make CC='gcc -flto -O3'), GCC will do link-time optimization and inline some calls across translation units (e.g. if f1.c calls foo defined in f2.c, some of the calls to foo in f1.c could get inlined).
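The f1.c/f2.c scenario can be sketched as follows (function bodies and the bar function are hypothetical). The two halves are shown as one listing so the sketch compiles on its own; in a real build they would be separate files, and only -flto gives GCC a chance to inline foo into bar, since without it f1.c sees nothing but the declaration.

```c
/* Shown as one listing for brevity; in a real build the two halves
 * live in separate files, compiled and linked with
 *
 *   gcc -flto -O3 f1.c f2.c -o app
 *
 * Without -flto, f1.c only sees the declaration of foo, so the calls
 * cannot be inlined; with -flto, GCC may inline them at link time. */

/* ---- f2.c ---- */
int foo(int x)
{
    return x * x + 1;
}

/* ---- f1.c ---- */
int foo(int x);            /* declaration only, as from a header */

int bar(int y)
{
    return foo(y) + foo(y + 1);
}
```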

The compiler's optimizations do take cache sizes into account (when deciding about inlining, unrolling, register allocation & spilling, and other optimizations), in particular when compiling with gcc -mtune=native -O3.

Unless you force the compiler (e.g. by using the noinline or always_inline function attributes in GCC, which is often wrong and would produce worse code), you'll never be able in practice to predict that a given code chunk will certainly be inlined. Even people working on GCC middle-end optimizations cannot predict that reliably! So you cannot reliably understand (and predict) the compiler's behavior in practice; hence, don't even waste your time trying.

Also look into MILEPOST GCC: by using machine learning techniques to tune some GCC parameters, they have sometimes been able to get astonishing performance improvements, but they certainly cannot explain or understand them.

If you need to understand your particular compiler while coding some C or C++, your code is probably wrong (e.g. it probably has some undefined behavior). You should code against a language specification (either the C11 or C++14 standard, or the particular GCC dialect, e.g. -std=gnu11, documented and implemented by your GCC compiler) and trust your compiler to be faithful w.r.t. that specification.

While I never meant that conditional returns would outright prevent inlining, the example you presented was most likely inlined because the recursion was in a very small function. — ultimate cause, Jul 25 '15 at 07:03

score 3 · Answer 2 · answered Jul 25 '15 at 06:38

Inlining is like copy-paste. There aren't so many gotchas that will prevent it from working, but it should be used judiciously. If it gets out of control, the program will become bloated.

Most compilers use a heuristic based on the "size" of the function. Since this decision is usually made before any code generation pass, the number of AST nodes may be used as a proxy for size. A function that includes inlined calls needs to include them in its own size, or inlining can go totally out of control. However, AST nodes that will not generate instructions should not prevent inlining. It can be difficult to tell what will generate a "move" instruction and what will generate nothing.
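The size-accounting rule above can be illustrated with a toy model. This is a hypothetical sketch, not GCC's actual algorithm: the size units, INLINE_LIMIT and CALL_COST are invented, functions are processed callees-first, and a caller's size includes whatever was inlined into it, so a function that grew by inlining may itself become too big to inline further.

```c
#include <stddef.h>

/* Toy size-based inliner (illustrative only; not GCC's algorithm). */
#define MAX_CALLS    4
#define INLINE_LIMIT 10  /* hypothetical budget, in "size units" */
#define CALL_COST    2   /* hypothetical cost of an out-of-line call */

struct func {
    int own_size;             /* size of the body, excluding calls */
    int ncalls;               /* number of call sites */
    int callees[MAX_CALLS];   /* indices of the called functions */
    int size;                 /* computed: size after inlining */
};

/* Functions must be ordered so every callee has a lower index than
 * its callers (no recursion in this toy). Returns the number of
 * call sites that were inlined. */
int toy_inline(struct func *f, size_t n)
{
    int inlined = 0;
    for (size_t i = 0; i < n; ++i) {
        f[i].size = f[i].own_size;
        for (int c = 0; c < f[i].ncalls; ++c) {
            const struct func *callee = &f[f[i].callees[c]];
            if (callee->size <= INLINE_LIMIT) {
                f[i].size += callee->size;  /* copy-paste the body */
                ++inlined;
            } else {
                f[i].size += CALL_COST;     /* keep the call */
            }
        }
    }
    return inlined;
}
```

For example, a 3-unit leaf inlined twice into an 8-unit function makes that function 14 units, over the limit, so a third function calling both would inline only the leaf and keep a call to the 14-unit function.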

Since modern C++ tends to involve lots of functions that perform conceptual rearrangement with no underlying instructions, the difficulty is telling the difference between no instructions, "just a few" moves, and enough move instructions to cause a problem. The only way to tell for a particular instance is to run the program in a debugger and/or read the disassembly.
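A C analogue of this "conceptual rearrangement" is a set of thin wrapper types and constructors (all names here are hypothetical). At -O2, GCC will typically reduce speed_from_raw to a single division once the wrappers are inlined, but as the answer says, the only way to be sure for a particular instance is to read the disassembly, e.g. from gcc -O2 -S.

```c
/* Wrappers that, once inlined, usually generate no instructions. */
struct meters  { double value; };
struct seconds { double value; };

static inline struct meters meters_of(double v)
{
    struct meters m = { v };
    return m;
}

static inline struct seconds seconds_of(double v)
{
    struct seconds s = { v };
    return s;
}

double speed(struct meters d, struct seconds t)
{
    return d.value / t.value;
}

/* After inlining, typically only the division remains. */
double speed_from_raw(double d, double t)
{
    return speed(meters_of(d), seconds_of(t));
}
```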

Mostly in typical C++ code, we just assume that the inliner is working hard enough. For performance-critical situations, you can't just eyeball it or assume that anything is working optimally. Detailed performance analysis at the disassembly level is essential.

… I don't mean to imply that the size heuristic would be the only one. See Basile's answer for more. — Potatoswatter, Jul 25 '15 at 07:33
