17

Will the C++ linker automatically inline "pass-through" functions, which are NOT defined in the header, and NOT explicitly requested to be "inlined" through the inline keyword?

For example, the following happens so often, and should always benefit from "inlining", that it seems every compiler vendor should have handled it "automatically" by having the linker inline it (in those cases where it is possible):

//FILE: MyA.hpp
class MyA
{
  public:
    int foo(void) const;
};

//FILE: MyB.hpp
#include "MyA.hpp"

class MyB
{
  private:
    MyA my_a_;
  public:
    int foo(void) const;
};

//FILE: MyB.cpp
#include "MyB.hpp"

// PLEASE SAY THIS FUNCTION IS "INLINED" BY THE LINKER, EVEN THOUGH
// IT WAS NOT IMPLICITLY/EXPLICITLY REQUESTED TO BE "INLINED"?
int MyB::foo(void) const
{
  return my_a_.foo();
}

I'm aware the MSVS linker will perform some "inlining" through its Link Time Code Generation (LTCG), and that the GCC toolchain also supports Link Time Optimization (LTO) (see: Can the linker inline functions?).

Further, I'm aware that there are cases where this cannot be "inlined", such as when the implementation is not "available" to the linker (e.g., across shared library boundaries, where separate linking occurs).

However, if this code is linked into a single executable that does not cross DLL/shared-lib boundaries, I'd expect the compiler/linker vendor to automatically inline the function, as a simple-and-obvious optimization (benefiting both performance and size)?

Are my hopes too naive?

charley
  • I don't know whether inlining takes place here or not (I am positive it can), but it's not an ***obvious*** optimization. Too much inlining can actually ***slow*** down your program – Armen Tsirunyan Aug 28 '11 at 21:18
  • In this case, though, isn't the result of `inline` *always* smaller-and-faster? Is this not a simple "function-call-unwind" that does not force paging in the text segment, and never adds overhead from the stack? (It's just an "unwind" of a nested function call that *must* still be called from the parent context.) – charley Aug 28 '11 at 21:21
  • @charley if the instructions it takes to call a function are larger than the function, then yes, it will always be smaller and faster. That is why the compiler will almost certainly inline this if you have optimisations on. – Seth Carnegie Aug 28 '11 at 21:23
  • Are you asking about a particular compiler and linker? Do you mean optimizations without any command-line arguments, such as -O? By default, you don't even get tail recursion optimization or intelligent handling of register spill if you don't pass that in. – Conspicuous Compiler Aug 28 '11 at 21:24
  • If you can, take a look at the assembly generated for a calling function. – Charles Beattie Aug 28 '11 at 21:28
  • @Conspicuous Compiler, no, I'm not asking about a particular compiler, and appreciate most of today's C++ compilers are pretty smart, and employ their own approaches to optimization. And, I accept that if you explicitly turn optimization "off", then this shouldn't/wouldn't be "inlined". Specifically, if a given library has *massive* amounts of these "pass-through" functions (e.g., heavy math, or hardware-device interface), is it necessary to make explicit effort to re-factor the code to support `inline` (as opposed to just trusting the linker optimization)? – charley Aug 28 '11 at 21:32
  • Why don't you just define those kinds of functions in the header, though? It's hard to see what you gain by putting them in a different TU. – jalf Aug 28 '11 at 21:33
  • @jalf, I tend to separate declaration and definition, even in simple cases, because Intellisense will update for about a minute after altering a header. This prevents the use of the status bar for better things. – John Aug 28 '11 at 22:02
  • @John: even so, it is essentially duplicate code (making it harder to maintain and more error-prone), and it makes the linker work harder and makes it harder for the compiler to optimize your code. Is it worth it? Apart from everything else, what kind of changes can you make to such a simple pass-through function without altering the declaration? – jalf Aug 29 '11 at 08:59
  • @jalf, yes, implementation as "`inline`" in the header might be more obvious. But, that would "clutter" the header (there are a lot of these pass-throughs), and it means we must "touch" lots of already-existing-libraries (we were hoping to avoid that). – charley Aug 29 '11 at 13:16

8 Answers

18

Here's a quick test of your example (with a MyA::foo() implementation that simply returns 42). All these tests were with 32-bit targets - it's possible that different results might be seen with 64-bit targets. It's also worth noting that using the -flto option (GCC) or the /GL option (MSVC) results in full optimization - wherever MyB::foo() is called, it's simply replaced with 42.
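The answer doesn't show mya.cpp or test.cpp. A sketch of what the test setup might look like follows; only the return value 42 comes from the answer, and the caller `call_through_b()` is a hypothetical name. The files are collapsed into one listing (with the `#include` lines noted in comments), so compiled as-is a compiler could inline everything; to reproduce the experiment, split each `//FILE:` section into its own file.

```cpp
// Hypothetical reconstruction of the test. In the real experiment each
// //FILE: section is a separate file, and mya.cpp, myb.cpp and test.cpp are
// separate translation units.
#include <cassert>

//FILE: MyA.hpp
class MyA
{
  public:
    int foo() const;
};

//FILE: MyB.hpp  (would #include "MyA.hpp")
class MyB
{
  private:
    MyA my_a_;
  public:
    int foo() const;
};

//FILE: mya.cpp  (would #include "MyA.hpp")
int MyA::foo() const
{
  return 42;  // the value used in the answer's test
}

//FILE: myb.cpp  (would #include "MyB.hpp"); the pass-through under test
int MyB::foo() const
{
  return my_a_.foo();
}

//FILE: test.cpp  (would #include "MyB.hpp"); hypothetical caller
int call_through_b()
{
  MyB b;
  // Split into separate TUs, the answer found this call replaced by 42
  // only when building with -flto (GCC) or /GL (MSVC).
  return b.foo();
}
```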

With GCC (MinGW 4.5.1):

gcc -g -O3 -o test.exe myb.cpp mya.cpp test.cpp

the call to MyB::foo() was not optimized away. MyB::foo() itself was slightly optimized to:

Dump of assembler code for function MyB::foo() const:
   0x00401350 <+0>:     push   %ebp
   0x00401351 <+1>:     mov    %esp,%ebp
   0x00401353 <+3>:     sub    $0x8,%esp
=> 0x00401356 <+6>:     leave
   0x00401357 <+7>:     jmp    0x401360 <MyA::foo() const>

That is, the entry prologue is left in place but immediately undone (the leave instruction), and the code then jumps to MyA::foo() to do the real work. However, this optimization is done by the compiler (not the linker), since it realizes that MyB::foo() simply returns whatever MyA::foo() returns. I'm not sure why the prologue is left in.

MSVC 16 (from VS 2010) handled things a little differently:

MyB::foo() ended up as two jumps - one to a 'thunk' of some sort:

0:000> u myb!MyB::foo
myb!MyB::foo:
001a1030 e9d0ffffff      jmp     myb!ILT+0(?fooMyAQBEHXZ) (001a1005)

And the thunk simply jumped to MyA::foo():

myb!ILT+0(?fooMyAQBEHXZ):
001a1005 e936000000      jmp     myb!MyA::foo (001a1040)

Again - this was largely (entirely?) performed by the compiler, since if you look at the object code produced before linking, MyB::foo() is compiled to a plain jump to MyA::foo().

So to boil all this down - it looks like without explicitly invoking LTO/LTCG, linkers today are unwilling/unable to perform the optimization of removing the call to MyB::foo() altogether, even if MyB::foo() is a simple jump to MyA::foo().

So I guess if you want link time optimization, use the -flto (for GCC) or /GL (for the MSVC compiler) and /LTCG (for the MSVC linker) options.

Michael Burr
  • +1 for taking the effort to show assembly and clearly explain what happens. This is pretty useful. I wanted to check what happens to my code by checking the assembly too. How was the above dump generated? – cppcoder Aug 28 '11 at 22:11
  • +1 also -- thanks *VERY MUCH* for this. Interesting that the compiler can reduce this (makes sense), especially in the case where the linker can't/won't reduce it... Would be very nice to see these optimizations be added to the vendor compiler/linker chain in the future... – charley Aug 28 '11 at 22:16
  • @charley: I think the compiler vendors consider that the optimizations have been added, but that you have to use the link-time optimization options to get them. I think that's a reasonable point of view. – Michael Burr Aug 28 '11 at 22:27
  • @srikrish: for these dumps I used the `disassem` command in GDB (for the GCC build) and the `u` command in cdb (for the MSVC build). Also useful is the `-S` option for GCC, which generates the assembly output instead of creating an object file when compiling, and the similar `/FAsc` option for MSVC. – Michael Burr Aug 28 '11 at 22:30
13

Is it common? Yes, for mainstream compilers.

Is it automatic? Generally not. MSVC requires the /GL switch, gcc and clang the -flto flag.

How does it work? (gcc only)

The traditional linker used in the gcc toolchain is ld, and it's kind of dumb. Therefore, perhaps surprisingly, link-time optimization is not performed by the linker in the gcc toolchain.

GCC performs its optimizations on a specific, language-agnostic intermediate representation: GIMPLE. When compiling a source file with -flto (which activates LTO), it saves this intermediate representation in a dedicated section of the object file.

When invoking the linker driver (note: NOT the linker directly) with -flto, the driver will read those specific sections, bundle them together into a big chunk, and feed this bundle to the compiler. The compiler reapplies the optimizations as it usually does for a regular compilation (constant propagation, inlining, and this may expose new opportunities for dead code elimination, loop transformations, etc...) and produces a single big object file.

This big object file is finally fed to the regular linker of the toolchain (probably ld, unless you're experimenting with gold), which performs its linker magic.
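Why bundling the saved IR matters can be modeled in a few lines. This is a toy sketch, not how GCC actually represents GIMPLE: the `Body`, `TranslationUnit`, `fold_call` and `merge` names are all hypothetical, and "folding a call to a constant" stands in for inlining plus constant propagation.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy IR: a function body either returns a constant or
// forwards to another function ("return callee();").
struct Body {
  bool forwards;       // true for a pass-through such as MyB::foo()
  int constant;        // result when forwards == false
  std::string callee;  // target when forwards == true
};

// A "translation unit" here is just the set of definitions it contains.
using TranslationUnit = std::map<std::string, Body>;

// Try to reduce a call to `name` to a constant using only the definitions
// in `visible`. std::nullopt means some definition was not visible, which is
// the position a compiler is in when it sees one TU in isolation: it has to
// emit a real call.
std::optional<int> fold_call(const TranslationUnit& visible,
                             const std::string& name, int depth = 0) {
  if (depth > 16) return std::nullopt;  // give up on long or cyclic chains
  auto it = visible.find(name);
  if (it == visible.end()) return std::nullopt;  // defined in another TU
  const Body& body = it->second;
  if (!body.forwards) return body.constant;
  return fold_call(visible, body.callee, depth + 1);
}

// Toy "link-time" step: pool the IR of every TU into one unit, loosely
// mirroring the LTO driver bundling all saved IR sections together.
TranslationUnit merge(const std::vector<TranslationUnit>& tus) {
  TranslationUnit all;
  for (const TranslationUnit& tu : tus) all.insert(tu.begin(), tu.end());
  return all;
}
```

With `mya` holding an `MyA::foo` that returns 42 and `myb` holding the forwarding `MyB::foo`, `fold_call(myb, "MyB::foo")` yields no value, while `fold_call(merge({mya, myb}), "MyB::foo")` yields 42.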

Clang works similarly, and I surmise that MSVC uses a similar trick.

Matthieu M.
8

It depends. Most compilers (linkers, really) support this kind of optimization. But for it to be done, the entire code-generation phase pretty much has to be deferred to link time. MSVC calls the option link-time code generation (LTCG), and it is enabled by default in release builds, IIRC.

GCC has a similar option, under a different name, but I can't remember which -O levels, if any, enable it, or whether it has to be enabled explicitly.

However, "traditionally", C++ compilers have compiled a single translation unit in isolation, after which the linker has merely tied up the loose ends, ensuring that when translation unit A calls a function defined in translation unit B, the correct function address is looked up and inserted into the calling code.

If you follow this model, then it is impossible to inline functions defined in another translation unit.

It is not just some "simple" optimization that can be done "on the fly", like, say, loop unrolling. It requires the linker and compiler to cooperate, because the linker will have to take over some of the work normally done by the compiler.

Note that the compiler will gladly inline functions that are not marked with the inline keyword. But only if it is aware of how the function is defined at the site where it is called. If it can't see the definition, then it can't inline the call. That is why you normally define such small trivial "intended-to-be-inlined" functions in headers, making their definitions visible to all callers.
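As a sketch of that header approach applied to the question's classes, only MyB::foo() moves into the header, while MyA::foo() stays out of line. The body of MyA::foo() (returning 42) is a hypothetical stand-in, and the files are collapsed into one listing.

```cpp
#include <cassert>

//FILE: MyA.hpp
class MyA
{
  public:
    int foo() const;  // still defined out of line, in mya.cpp
};

//FILE: MyB.hpp  (would #include "MyA.hpp")
class MyB
{
  private:
    MyA my_a_;
  public:
    // Defined in the class body, so implicitly inline: any TU that includes
    // MyB.hpp can see this body and inline the call to MyB::foo().
    int foo() const { return my_a_.foo(); }
};

//FILE: mya.cpp  (would #include "MyA.hpp"); hypothetical body
int MyA::foo() const
{
  return 42;
}
```

This makes MyB::foo() inlinable at every call site, but the call it forwards to, MyA::foo(), remains an ordinary out-of-line call unless its definition is visible too (or LTO/LTCG is used).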

jalf
  • GCC has the `-fwhole-program` switch, which I don't think is enabled by any -O level. – GManNickG Aug 28 '11 at 21:51
  • The `-fwhole-program` option only works in limited situations, and will not work with the example set of modules (unless perhaps the functions were marked with the right attributes, which would defeat the purpose of using `-fwhole-program` in this case). – Michael Burr Aug 28 '11 at 22:08
  • In modern GCC, you have to compile every stage with `-flto` (link-time optimization). The linking phase slows down dramatically, but I often get 20% smaller executables like that. – Kerrek SB Aug 28 '11 at 22:53
  • @Kerrek: my understanding is that GCC's `-flto` is implemented by having the object files carry around a partially processed version of the source; then, at link time, the linker concatenates all that stuff and recompiles it with `-fwhole-program`. At least that's the general idea, if not quite precise. – Michael Burr Aug 28 '11 at 22:59
  • @Michael: Yes, that's right - the object files all get amended with GIMPLE, a GCC type of bytecode, which then gets compiled at the "link" stage. – Kerrek SB Aug 28 '11 at 23:01
5

Inlining is not a linker function.

The toolchains that support whole program optimization (cross-TU inlining) do so by not actually compiling anything, just parsing and storing an intermediate representation of the code, at compile time. And then the linker invokes the compiler, which does the actual inlining.

This is not done by default, you have to request it explicitly with appropriate command-line options to the compiler and linker.

One reason it is not and should not be default, is that it increases dependency-based rebuild times dramatically (sometimes by several orders of magnitude, depending on code organization).

Ben Voigt
0

Yes, any decent compiler is fully capable of inlining that function if you have the proper optimisation flags set and the compiler deems it a performance bonus.

If you really want to know, add a breakpoint before your function is called, compile your program, and look at the assembly. It will be very clear if you do that.

Seth Carnegie
  • Capable -- yes. But, this is commonly "done", right? This is the default behavior in `gcc` and `MSVS`? (Please say "Yes"?) – charley Aug 28 '11 at 21:24
  • @charley I am no compiler writer, so all I can say is "it would be stupid not to" unless the compiler has a good reason. Compilers like MSVC++ and gcc are _extremely_ smart. So I would say yes, but I can't prove it. Just compile your program and look at the assembly. – Seth Carnegie Aug 28 '11 at 21:26
  • It doesn't matter how smart the compiler is; if it compiles each translation unit in isolation, it won't be able to inline across them. – jalf Aug 28 '11 at 21:29
  • @jalf yes, but there was no reason to assume he's making the compiler do that, and they don't do that automatically by themselves I assume. – Seth Carnegie Aug 28 '11 at 21:30
  • If you read the question, then there is **every** reason to assume it *because that is exactly what he is asking*. – jalf Aug 28 '11 at 21:36
  • @jalf he never said he was making the compiler skip link-time code generation; he only said his function was defined in a .cpp file. I may have misread, though, as I normally do. – Seth Carnegie Aug 28 '11 at 21:38
  • @charley: The default for both the MSVC and gcc compilers is no optimization whatsoever. Optimizations need to be enabled on the command line passed to the compiler. When you add an IDE such as MSVS, you start getting project-level defaults different from the compiler defaults, and "Release Mode" in these IDEs will generally enable such things as Whole Program Optimization (LTCG). – Ben Voigt Aug 28 '11 at 22:08
0

Compiled code must be able to see the body of the function for there to be any chance of inlining. The chances of this happening can be increased through the use of unity files and LTCG.

Charles Beattie
0

The inline keyword only acts as guidance for the compiler to inline functions when optimizing. In g++, the optimization levels -O2 and -O3 enable different levels of inlining. The g++ documentation specifies the following: (i) if -O2 is specified, -finline-small-functions is turned on; (ii) if -O3 is specified, -finline-functions is turned on, along with all the -O2 options; (iii) there is one more relevant option, -fno-default-inline, which makes member functions inline only if the inline keyword is added.

Typically, the size of the function (number of instructions in the assembly) and whether it makes recursive calls determine whether inlining happens. There are plenty more options for g++, described at the link below:

http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Please take a look and see which ones you are using, because ultimately the options you use determine whether your function is inlined.

cppcoder
-1

Here is my understanding of what the compiler will do with functions:

If the function definition is inside the class definition, and assuming no scenario that prevents inlining (such as recursion) exists, the function will be inlined.

If the function definition is outside the class definition, the function will not be inlined unless the function definition explicitly includes the inline keyword.

Here is an excerpt from Ivor Horton's Beginning Visual C++ 2010:

Inline Functions

With an inline function, the compiler tries to expand the code in the body of the function in place of a call to the function. This avoids much of the overhead of calling the function and, therefore, speeds up your code.

The compiler may not always be able to insert the code for a function inline (such as with recursive functions or functions for which you have obtained an address), but generally, it will work. It's best used for very short, simple functions, such as our Volume() in the CBox class, because such functions execute faster and inserting the body code does not significantly increase the size of the executable module.

With function definitions outside of the class definition, the compiler treats the functions as a normal function, and a call of the function will work in the usual way; however, it's also possible to tell the compiler that, if possible, you would like the function to be considered as inline. This is done by simply placing the keyword inline at the beginning of the function header. So, for this function, the definition would be as follows:

inline double CBox::Volume()
{
    return l * w * h;
}
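The excerpt's Volume() uses members l, w and h of a CBox class it doesn't show. A minimal completion, assuming those are double data members and adding a hypothetical constructor, compiles with the excerpt's definition unchanged:

```cpp
#include <cassert>

// Minimal CBox so the excerpt compiles; l, w and h come from the excerpt,
// the constructor is a hypothetical addition.
class CBox
{
  public:
    CBox(double lv, double wv, double hv) : l{lv}, w{wv}, h{hv} {}
    double Volume();  // declared only; defined outside the class below

  private:
    double l;
    double w;
    double h;
};

// Out-of-class definition marked inline. For optimization the keyword is
// only a hint; its guaranteed effect is that this definition may appear in
// several TUs (e.g. when it lives in a header) without violating the ODR.
inline double CBox::Volume()
{
  return l * w * h;
}
```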
  • What makes you think it won't be inlined if it's not in the class definition? That would mean it can't inline non-member functions. – Seth Carnegie Aug 28 '11 at 21:28
  • Maybe I wasn't clear, and I could possibly be wrong; I'm only going by what I read in one book. But if you declare a non-member function and use the word inline in the function header, it will be inline. I do not believe the compiler will automatically inline a function unless it is either declared and defined inside the class or the keyword inline is used. Maybe compiler optimizations might do this automatically, but if he wants to be certain the function is inline, it would be best to declare it that way, because there is no guarantee the compiler will do it. –  Aug 28 '11 at 21:41
  • Actually, there is no guarantee the compiler will do it even if you _do_ have it declared as `inline`. So basically the compiler almost completely ignores that keyword. It does the equivalent of making it think twice about inlining it, but not much more. – Seth Carnegie Aug 28 '11 at 23:12
  • The `inline` keyword does not mean what you think it does. It tells the linker to accept multiple definitions of a symbol, as long as they're all identical. That's it. The compiler is not required by the Standard to care about this keyword. It may inline without it, and _not_ inline _with_ it. That is, compilers may completely ignore the `inline` keyword for purposes of optimisation - and, indeed, most do. – underscore_d Oct 04 '16 at 19:58