I am wondering: is there any difference between inlining functions at the linker level and at the compiler level, in terms of execution speed?

E.g. if I have all my functions in .cpp files and rely on the linker to do the inlining, will that inlining potentially be less efficient than, say, defining selected functions in the headers so the compiler can inline them, or doing a unity build with no real linking, where all inlining is done by the compiler?
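
To make that concrete, here is a minimal sketch of the two setups I mean (file and function names are just placeholders):

```cpp
// Setup 1: definition in the header -- every .cpp file that includes
// square.h sees the body, so the compiler itself can inline the call.

// square.h
inline int square(int x) { return x * x; }

// Setup 2: declaration in the header, definition in a .cpp file --
// without link-time help, calls from other .cpp files stay out-of-line.

// square.h
int square(int x);

// square.cpp
int square(int x) { return x * x; }
```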

If the linker is just as efficient, why would one still bother inlining functions explicitly at the compiler level? Is that just for convenience – say, there is just a one-line constructor, so one can't be bothered with a .cpp file?

I suppose this might depend on the compiler, in which case I would be most interested in Visual C++ (Windows) and gcc (Linux).

Thanks

Cookie
  • Which linker do you use that does inlining? – unapersson May 10 '11 at 10:38
  • @unapersson: Visual C++ has so-called link-time code generation that seems to be able to do just about anything while emitting code. – sharptooth May 10 '11 at 10:40
  • The linker is collecting all the modules and then calls up the compiler again to finish the code generation. That allows inlining between .cpp files, among other things. – Bo Persson May 10 '11 at 11:08
  • Linkers don't inline code, they don't know how to *delete* machine code produced by the compiler. /LTCG serves a very different purpose, it *adds* code to provide instrumentation data to optimize the executable image layout. That code is temporary. – Hans Passant May 10 '11 at 12:57
  • @Hans Passant: http://msdn.microsoft.com/en-us/library/xbf3tbeh(v=vs.80).aspx says that, among other things, it does *cross-module inlining*. At any rate, there are other linkers that do it, so stating that *linkers don't inline code* is an overstatement (even if in the previous case the linker does not really *optimize* but rather calls the compiler to optimize). – David Rodríguez - dribeas May 10 '11 at 13:25

4 Answers


The general rule is that, all else being equal, the closer to execution (compiling->linking->(maybe JIT)->execution) an optimization is performed, the more data the optimizer has and the better optimization it can do. So unless the optimizer is dumb you should expect better results when inlining is done by the linker – the linker will know more about the invocation context and can do better optimization.

sharptooth
  • Good answer … but as I understood it, the question is precisely whether all other things are equal. – Konrad Rudolph May 10 '11 at 10:36
  • @Konrad Rudolph: Well, true. And the toolchain can also be buggy, so one has to test on one's own toolchain to be sure. – sharptooth May 10 '11 at 10:38
  • I definitely don't agree with this. Static compilers and linkers have vastly more time in which to do their work than JIT or dynamic optimization, which makes the scope of their efforts much larger. – Puppy May 10 '11 at 10:43
  • @DeadMG: That would cancel "all else being equal", wouldn't it? – sharptooth May 10 '11 at 10:54
  • I am not sure that this hierarchy actually holds... Consider a variable that is only used as an argument to a function, and that the actual implementation of the function completely ignores: `int x = complex_calculation(); foo( x );`, with `void foo(int x) { std::cout << "Hi!"; }` (spelled out in the sketch after these comments). The compiler cannot possibly know whether `x` is used inside `foo` (unless it can inline it itself), and that means it will *create* the variable and *call* `complex_calculation`. Even if the linker inlines `foo`, I don't think the linker can actually remove those costs after the fact. – David Rodríguez - dribeas May 10 '11 at 11:12
  • @David Rodríguez - dribeas: Technically it could, if the intermediate data it uses to generate code stores enough to detect that. – sharptooth May 10 '11 at 11:27
  • @sharptooth: The point I was trying to make is that there are different types of optimizations that make sense at different points in time. To be able to optimize that away, the linker would have to contain a full-blown code optimizer (not just an inliner, i.e. not just the ability to move code around, but to actually replace existing code). On the opposite end, a JIT has information on actual usage patterns that can allow it to optimize in ways that would be impossible at compile time (or would require the output of a profiler to be precise). – David Rodríguez - dribeas May 10 '11 at 13:20
  • @David Rodríguez - dribeas: In fact link-time code generation in Visual C++ has a lot of data to work with - it's not a set of plain old .obj files, it's some inflated data from which VC++ generates machine code right before linkage - it's a phase separate from compilation - http://msdn.microsoft.com/en-us/magazine/cc301698.aspx – sharptooth May 10 '11 at 13:23
  • @sharptooth: Then all else never is equal, making your answer completely irrelevant. You have to factor in the real circumstances around such things. – Puppy May 10 '11 at 18:06
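
A minimal two-file sketch of the scenario David describes in the comments above (file names are hypothetical):

```cpp
// a.cpp -- the caller's translation unit
int complex_calculation();   // expensive, defined in some other file
void foo(int x);             // defined in b.cpp

int main() {
    int x = complex_calculation();  // the compiler must emit this call:
    foo(x);                         // it cannot see that foo ignores x
}

// b.cpp -- the callee's translation unit
#include <iostream>
void foo(int) { std::cout << "Hi!\n"; }  // the parameter is never used

// Inlining foo() at link time removes the call overhead, but removing
// the complex_calculation() call as dead code additionally requires the
// link-time optimizer to run dead-code elimination on the merged code,
// not merely paste function bodies into their call sites.
```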

Generally, by the time the linker is run, your source has already been compiled into machine code. The linker's job is to take all the code fragments and link them together (possibly fixing up addresses along the way). In such a case, there is no room for performing inlining.

But all is not lost. GCC does provide a mechanism for link-time optimization (using the -flto option when compiling and linking). This causes GCC to produce a bytecode representation that the linker can then compile and link into a single executable. Since the bytecode contains more information than optimized machine code, the linker can perform radical optimization across the whole codebase – something the compiler alone cannot do.
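
As an illustration (file and target names are placeholders), a GCC LTO build passes -flto both when compiling and when linking:

```cpp
// util.cpp
int add(int a, int b) { return a + b; }

// main.cpp
int add(int a, int b);            // declaration only; the body lives in util.cpp
int main() { return add(1, 2); }

// Because -flto is given at both the compile and the link step, the
// link-time optimizer can inline add() into main() across the two
// translation units:
//
//   g++ -O2 -flto -c util.cpp
//   g++ -O2 -flto -c main.cpp
//   g++ -O2 -flto util.o main.o -o app
```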

See here for more details on GCC. Not too sure about VC++ though.

doron
  • The relevant links for VC++ are [/LTCG](http://msdn.microsoft.com/en-us/library/xbf3tbeh.aspx) and [/GL](http://msdn.microsoft.com/en-us/library/0zza0de8.aspx). – ildjarn May 10 '11 at 11:15

Inlining is normally performed within a single translation unit (.cpp file). When you call functions in another file, they’re never inlined.

Link Time Optimization (LTO) changes this, allowing inlining to work across translation units. In terms of how efficient the generated code is, it should always be equal to or better than regular linking (sometimes very significantly so).

The reason both options are still available is that LTO can take a large amount of RAM and CPU – I've had VC++ take several minutes to link a large C++ project. Sometimes it's not worth enabling until you ship. You could also run out of address space with a large enough project, as the linker has to load all that bytecode into RAM.

For writing efficient code, nothing changes – all the same rules apply with LTO. It is potentially more efficient to define an inline function explicitly in a header file than to depend on LTO to inline it. The inline keyword only provides a hint, so there's no guarantee, but it might nudge the compiler into inlining where normally (with or without LTO) it wouldn't.
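
For example, a tiny function can be defined directly in a header so every compiler that sees a call also sees the body (a minimal sketch; the names are illustrative):

```cpp
// vec2.h
struct Vec2 { float x, y; };

// The definition is visible to every .cpp that includes this header,
// so the compiler can inline dot() with or without LTO. The inline
// keyword also keeps the multiple definitions that result from
// multiple inclusion from violating the one-definition rule.
inline float dot(const Vec2& a, const Vec2& b) {
    return a.x * b.x + a.y * b.y;
}
```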

Cory Nelson
  • (review) Well-phrased answer for a first answer. Introduce LTO with its full name the first time you use it, especially since the question didn't define LTO; if your answer is accepted, people will probably read it right after the question. – Hassan Syed May 10 '11 at 12:23
  • Thanks for the tip! I'll remember that next time I answer. LTO is of course Link Time Optimization. – Cory Nelson May 10 '11 at 12:40
  • Sorry, I am a bit confused. On the one hand you say LTO is at least as efficient as explicit inlining; on the other hand you say explicit inlining might nudge it into being inlined (and hence, I assume, be potentially more efficient) when LTO wouldn't inline it? – Cookie May 10 '11 at 12:49
  • LTO is at least as efficient as "regular linking" (non-LTO). LTO or not, explicit inlining is potentially more efficient than letting the compiler choose what to inline. – Cory Nelson May 10 '11 at 13:04
  • So does that mean that in terms of optimization and final execution speed one should try to inline rather more than less? E.g. for small classes (e.g. below 50 lines of code), write everything in the class definition of the header file? If one has a lot of those headers included in multiple cpp files, does this additional size of each compilation unit get reduced again by the linker? – Cookie May 10 '11 at 13:59
  • I usually inline functions that have <=5 lines of code, but there’s not really any fool-proof rule – it requires profiling to be sure. Generally the compiler will make good decisions for you, but if it supports Profile Guided Optimization, that can help it better decide which ones need to be inlined. VC++/GCC both support PGO. I’m not sure how well compilers will merge identical functions from multiple TUs. – Cory Nelson May 10 '11 at 14:46

If the function actually gets inlined, there would be no difference – it doesn't matter whether the compiler or the linker did it.

I believe the main reason for having inline functions defined in the headers is history. Another is portability. Until recently most compilers did not do link-time code generation, so having the functions in the headers was a necessity. That of course affects code bases started more than a couple of years ago.

Also, if you still target some compilers that don't support link-time code generation, you don't have a choice.

As an aside, in one case I have been forced to add a pragma to ask one specific compiler not to inline an init() function that was defined in one .cpp file but potentially called from many places.
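
The exact mechanism is compiler-specific (Bo's case used a pragma); as a sketch, GCC and Visual C++ also offer per-function noinline annotations that achieve the same effect (the function name is illustrative):

```cpp
// init.cpp -- keep a single out-of-line copy of init() even under
// aggressive (link-time) inlining.
#if defined(_MSC_VER)
__declspec(noinline)
#elif defined(__GNUC__)
__attribute__((noinline))
#endif
void init() {
    // ... one-time setup ...
}
```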

Bo Persson