Link-time optimization and inline

Question

In my experience, there's a lot of code that explicitly uses inline functions, which comes at a tradeoff:

The code becomes less succinct and somewhat less maintainable.
Sometimes, inlining can greatly increase run-time performance.
Inlining is decided at a fixed point in time, maybe without a terribly good foreknowledge of its uses, or without considering all (future) surrounding circumstances.

The question is: does link-time optimization (e.g., in GCC) render manual inlining (e.g., declaring a function "inline" in C99 and providing its implementation) obsolete? Is it true that we don't need to consider inlining for most functions ourselves? What about functions that always benefit from inlining, e.g., deg_to_rad(x)?

Clarification: I am not thinking about functions that are in the same translation unit anyway, but about functions that should logically reside in different translation units.

Update: I have often seen opposition to "inline", with suggestions that it is obsolete. Personally, however, I do often see explicitly inlined functions, namely functions defined in a class body.
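For readers less familiar with the C99 rules, the manual approach the question describes usually looks like the following. It is a minimal sketch shown as one file for brevity: the `inline` definition would normally sit in a header (the name `angles.h` is hypothetical) included by every translation unit, and exactly one `.c` file would repeat the declaration with `extern` so that an out-of-line definition exists for calls the compiler decides not to inline. The `PI` macro is an assumption, since `M_PI` is not part of standard C.

```c
/* PI is defined here because M_PI is not part of standard C. */
#define PI 3.14159265358979323846

/* --- Would normally live in a header (hypothetical "angles.h") --- */

/* C99 inline definition: every translation unit that includes the header
   sees the body, so each call site can be inlined. On its own it does not
   emit an externally visible symbol. */
inline double deg_to_rad(double deg)
{
    return deg * (PI / 180.0);
}

/* --- Would normally live in exactly ONE .c file (e.g. "angles.c") --- */

/* Redeclaring with extern makes this translation unit provide the external
   definition that any non-inlined call (e.g. at -O0) links against. */
extern inline double deg_to_rad(double deg);
```

With `-flto`, the question is essentially whether the header could shrink to a plain prototype, leaving the body in one `.c` file and the inlining decision to link time.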

Note that there's no such thing as "functions that always benefit from inlining". If your `deg_to_rad` is called many times in many different places in the code, it will greatly inflate the code size which can lead to caching/paging issues. — Oliver Charlesworth, Aug 12 '11 at 21:35
Declaring a function `inline` is pretty much a no-op. A good compiler will ignore the keyword for inlining decisions and make its own choice about whether to inline. — R.. GitHub STOP HELPING ICE, Aug 12 '11 at 21:40
@Oli, actually I chose the example b/c calling deg_to_rad would usually take more instructions (store, call, load) than just multiplying a float. — ccom, Aug 12 '11 at 22:01
@ccom: Huh? It's always up to the discretion of the compiler whether it performs some optimization or not. And programmer-annotated `inline`s are pretty much ignored by any modern compiler, and rightly so. — GManNickG, Aug 12 '11 at 22:06
@GMan Have you tested that? From the assembly output I've looked at, it seems MSVC takes `inline` pretty seriously. — Crashworks, Aug 12 '11 at 22:14
`inline` has semantic as well as optimization effects. I don't remember exactly what they are, but it is definitely possible to construct a strictly conforming program that would trigger constraint violations if you removed all instances of the keyword. — zwol, Aug 12 '11 at 22:16
@Crash: [This says](http://msdn.microsoft.com/en-us/library/z8y1yy88.aspx): "The insertion (called inline expansion or inlining) occurs only if the compiler's cost/benefit analysis show it to be profitable." What code are you testing? — GManNickG, Aug 12 '11 at 22:21
@Zack: It effectively says "ignore duplicates of this function", so you can have a definition of a function included in multiple translation units without error. (Note the context of this question is with optimization, though.) — GManNickG, Aug 12 '11 at 22:22
@Oli Charlesworth: You are assuming that the instruction count of the call is smaller than the instruction count of the function. Inlining `int add(int x,int y) {return x+y;}` will always be beneficial, because both the cost of the call and the number of instructions needed to make it exceed the cost of the function body. I think deg_to_rad() also falls into this category, as it is very simple. — Martin York, Aug 12 '11 at 22:23
@GMan, I was writing nonsense; of course the compiler might just issue a call to the external definition, or just not take the hint. — ccom, Aug 12 '11 at 22:25
@Crashworks: It may look that way, but unless you force it to, the compiler will ignore you. Humans are **very** bad at this kind of optimization; compilers are **very** good at it. Thus they normally ignore you and do what is best for the application. — Martin York, Aug 12 '11 at 22:26
@Martin: People often tell me this, but I often find that the C++ compiler generates less than ideal code. In fact, most of my job consists of improving performance in realtime code by finding places where the compiler did the wrong thing, and fixing them. — Crashworks, Aug 12 '11 at 23:28
@GMan A math library for SIMD operations on vectors, quaternions, and matrices, to start with. — Crashworks, Aug 12 '11 at 23:30
I have often seen what Crashworks has seen. Lacking both runtime knowledge and perfect knowledge of the CPU architecture, the compiler cannot make consistently optimal inlining decisions. It's difficult for humans too, but, strictly speaking, not impossible, especially when focusing on a small piece of critical code. — Crowley9, Aug 13 '11 at 04:02
@Crashworks: Speaking as a compiler writer: then you must be using some pretty ancient compilers. The ones I have worked on find this a relatively simple task that is impossible to get wrong. — Martin York, Aug 13 '11 at 15:56
Since you disagree on the most fundamental issue (which is what my question was about), could you give (more detailed) examples/proof? — ccom, Aug 13 '11 at 17:15
@Martin: Are you saying that compilers can always inline, when they want to, or that they always make the right decision? LTO certainly allows the former these days, but the latter is not something any compiler technology can claim. — Crowley9, Aug 13 '11 at 18:37
@OliverCharlesworth inlining's benefit is not limited to the elimination of the subroutine call; inlining opens up a lot of other optimizations (code transformations) not possible without inlining. Often those other optimizations dominate the benefit. — rwong, Apr 04 '16 at 22:52
@LokiAstari We trust the compiler's inlining heuristics to be correct 99% of the time, but Crashworks would be talking about proverbial "the 1% of code that accounts for 99% of the execution time" example. In other words, there is a 1% of code where a programmer finds necessary to override the compiler's heuristics (after extensive benchmarking and examination of the disassembly), but the same programmer will be satisfied with entrusting the remaining 99% of the code to the compiler's inlining heuristics. — rwong, Apr 04 '16 at 22:55

Crowley9 · Answer 1 · 2011-08-13T03:35:41.410

Even with LTO, a compiler still has to use heuristics to determine whether or not to inline a function at every call site (note that it makes the decision not per function, but per call). The heuristics take into account factors such as whether the call is in a loop, whether that loop is unrolled, how big the function is, and how frequently it is called globally. At compile time, the compiler will never be able to accurately determine how frequently code is called, or whether the code expansion is likely to blow out the instruction/trace/loop/microcode caches of a particular CPU.

Profile-guided optimization is supposed to be a step towards addressing this, but if you've ever tried it, you have likely noticed a swing in performance on the order of 0-2%, and it can go in either direction! :-) It's still a work in progress.

If performance is your ultimate goal, you really know what you are doing, and you do a thorough analysis of your code, what you really need is a way to tell the compiler to inline or not inline on a per-call basis, not a per-function hint. In practice I have managed this by using compiler-specific "force_no_inline" type hints for cases where I don't want inlining, and a separate "force_inline" copy of the function (or a macro, in the rare case this fails) for when I want it inlined. If anyone knows a cleaner way to do this with compiler-specific hints (for any C/C++ compiler), please let me know.
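The "separate copy" technique described above might be sketched as follows for GCC/Clang and MSVC. The macro names `FORCE_INLINE`/`NO_INLINE` and the functions `deg_to_rad_inline`, `deg_to_rad_call` and `convert_all` are hypothetical; the underlying hints (`__attribute__((always_inline))`, `__attribute__((noinline))`, `__forceinline`, `__declspec(noinline)`) are the compilers' own. The out-of-line copy simply calls the forced-inline one, so the body is written once, and each call site chooses its behavior by the name it calls.

```c
/* Hypothetical portability macros around real compiler-specific hints. */
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#  define NO_INLINE    __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#  define NO_INLINE    __attribute__((noinline))
#else
#  define FORCE_INLINE inline
#  define NO_INLINE
#endif

#define PI 3.14159265358979323846

/* Copy for call sites that should always be inlined (e.g. hot loops). */
static FORCE_INLINE double deg_to_rad_inline(double deg)
{
    return deg * (PI / 180.0);
}

/* Copy for call sites that should stay a real call (e.g. cold paths, or
   to limit code growth). Its body is the inlined copy, so the logic lives
   in one place. */
static NO_INLINE double deg_to_rad_call(double deg)
{
    return deg_to_rad_inline(deg);
}

/* Example hot path: picks the forced-inline copy per call site. */
static void convert_all(double *angles, int n)
{
    for (int i = 0; i < n; ++i)
        angles[i] = deg_to_rad_inline(angles[i]);
}
```

The design choice is that the decision moves from the function definition to the call site, which is what the per-call control above asks for, at the cost of two names for one function.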

To specifically address your points:

1. The code becomes less succinct and somewhat less maintainable.

Generally, no: it's just a keyword hint about whether the function should be inlined. However, if you jump through hoops like I described in the last paragraph, then yes.

2. Sometimes, inlining can greatly increase run-time performance.

When the compiler is left to its own devices: yes, inlining certainly can greatly increase performance, but mostly it doesn't. The compiler has good heuristics that make good, although not always optimal, inlining decisions. Specifically for the keyword, compilers may ignore it entirely or use it only as a weak hint; in general they do seem averse to inlining code that red-flags their heuristics (like inlining a 16k function into a loop unrolled 16x).

3. Inlining is decided at a fixed point in time, maybe without a terribly good foreknowledge of its uses, or without considering all (future) surrounding circumstances.

Yes, it uses static analysis. Dynamic analysis can come from your own insight, with you manually controlling inlining on a per-call basis, or theoretically from PGO (which still sucks).

Since you have an elaborate inline-system in place, how do you check if the compiler did inline a particular function call? — ccom, Aug 13 '11 at 10:02
I would also be very interested in how to give the compiler the hint to inline a single function call. — ccom, Aug 13 '11 at 10:03
Really old-school: I look at the generated code, using objdump for gcc or the .asm output for msvc, to see what was actually produced. I also have some scripts that pipe the objdump -d output through grep "call" and wc in order to get a total call count. As long as the calls to the function you are trying to inline (or not inline) show up in that count, you get quick feedback on whether or not your code change made a difference. — Crowley9, Aug 13 '11 at 18:43
Cool. I wonder if this can be done with gdb, or another debugger/profiler. — ccom, Aug 14 '11 at 15:07
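Crowley9's objdump/grep/wc check could also be done with a small C helper that counts only the calls to one function. This is a sketch under assumptions: it expects the usual `objdump -d` text in which a direct call looks like `callq  1140 <notmain>`, and the function name `count_calls_to` is hypothetical. Tail calls emitted as `jmp`/`jmpq`, like the one in the GCC experiment below, are deliberately not counted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts direct calls to function `fn` in objdump -d
   style text, i.e. lines with a call/callq mnemonic whose target is printed
   as "<fn>". Tail calls emitted as jmp/jmpq are NOT counted. */
static size_t count_calls_to(const char *disasm, const char *fn)
{
    char target[256];
    char buf[512];
    size_t count = 0;
    const char *line = disasm;

    /* objdump prints a direct call target as e.g. "callq  1140 <notmain>". */
    snprintf(target, sizeof target, "<%s>", fn);
    size_t tlen = strlen(target);

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Copy one line (truncated if very long) so searches stay inside it. */
        if (len >= sizeof buf)
            len = sizeof buf - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        const char *hit = strstr(buf, target);
        const char *mnem = strstr(buf, "call");
        /* Require the mnemonic before the target, and skip symbol header
           lines such as "0000000000001140 <notmain>:". */
        if (hit && mnem && mnem < hit && hit[tlen] != ':')
            ++count;

        if (end == NULL)
            break;
        line = end + 1;
    }
    return count;
}
```

For example, the text of `objdump -d main.out` could be read into a string and `count_calls_to(text, "notmain")` compared before and after a code change.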

Ciro Santilli OurBigBook.com · Answer 2 · 2021-02-10T19:51:33.250

GCC 9, Binutils 2.33: experiment showing that LTO can inline

For those who are curious whether inlining can happen across object files at link time, here is a quick experiment confirming that it can:

main.c

int notmain(void);

int main(void) {
    return notmain();
}

notmain.c

int notmain(void) {
    return 42;
}

Compile with LTO and disassemble:

gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -c -o main.o main.c
gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -c -o notmain.o notmain.c
gcc -O3 -flto -ggdb3 -std=c99 -Wall -Wextra -pedantic -o main.out notmain.o main.o
gdb -batch -ex "disassemble/rs main" main.out

Disassembly output:

   0x0000000000001040 <+0>:     b8 2a 00 00 00  mov    $0x2a,%eax
   0x0000000000001045 <+5>:     c3      retq

so we see that there is no callq or other jumps, which means that the call was inlined across the two object files.

Without -flto however we see:

   0x0000000000001040 <+0>:     f3 0f 1e fa     endbr64 
   0x0000000000001044 <+4>:     e9 f7 00 00 00  jmpq   0x1140 <notmain>

so now there is a JMPQ, which means that the call was not inlined.

Note that, as an optimization, the compiler chose JMPQ, which makes no stack changes, rather than a more naive CALLQ, which would push a return address. I think this is a trivial minimal case of tail call optimization.

So yes, if you are using -flto, you don't need to worry about putting definitions in headers so they can be inlined.

The main downside of having definitions in headers is that they may slow down compilation. For C++ templates, you may also be interested in explicit template instantiation: Explicit template instantiation - when is it used?

Tested in Ubuntu 19.10 amd64.

score 4 · Answer 3 · answered Aug 13 '11 at 00:23

The question is: does link-time optimization (e.g., in GCC) render manual inlining, e.g., declaring in C99 a function "inline" and providing an implementation, obsolete?

This article would seem to answer "Yes:"

Think for a minute: what turns a function into a good candidate for inlining? Apart from the size factor, the optimizer needs to know how often this function is called, where it is called from, how many other functions in the program are viable candidates for inlining and -- believe it or not -- whether the function is ever called. Optimizing (i.e. inlining) a function that isn't called even once is a waste of time and resources. But how can an optimizer know that a function is never called? Well, it cannot. Unless it has scanned the entire program. This is where [link-time optimization] becomes crucial.

Gnawme

Of course, LTO is necessary for making "inline" obsolete; the question is whether it is obsolete in real life. – ccom Aug 13 '11 at 09:50
Unfortunately, the article is mostly speculative, and was written when LTO wasn't as common-place as it is today. – ccom Aug 13 '11 at 09:57
"The question is: does link-time optimization render manual inlining obsolete." There is no "real life" in the question. Manual inlining is effectively a compiler hint, anyway. LTO allows compiler and linker to make a much more informed choice about what to inline. – Gnawme Aug 13 '11 at 19:50
Also, the article explains how LTO (aka WPO) operates in "Visual C++ 7.0 and later versions, including the most recent Visual C++ 2005 beta 2." How is that speculative? If it's in a beta, it has been implemented. – Gnawme Aug 13 '11 at 19:52
I guess I should put "real life" in the question. However, since a perfect compiler and optimizer would render the whole discussion and question void, it's already implicit. – ccom Aug 14 '11 at 15:15
The article is speculative insofar as it mentions benefits without providing real data or comparisons. It's actually confusing that he focuses on Visual C++ when most of what he writes applies to LTO/PGO in general. It looks like he didn't study the compilation techniques he mentions and did not run quantitative tests. He fails to mention that PGO does not always improve speed, because many aspects of it are still open research questions. Statements like "[...] my impression that there's still room for improvement" are just biased opinions, and some parts read like an ad. – ccom Aug 14 '11 at 15:27

MSN · Answer 4 · score 1 · 2011-08-13T20:37:29.337

If link-time optimization were as fast as compile-time optimization, it would obviate the need for compiler hints. Unfortunately, it is generally slower than compile-time optimization, so it's a tradeoff between overall build speed and the overall quality of the build's optimizations.

Also, you still need to use inline when defining functions in headers. Otherwise, you will get linker errors for multiple definitions of those functions if they are used in multiple translation units.

edited Aug 13 '11 at 20:37

answered Aug 13 '11 at 04:21

That's a valid comment. But is "inline" really obsolete when LTO is assumed to be enabled? – ccom Aug 13 '11 at 09:51
"Otherwise, you will get linker errors for multiple definitions of those functions if they are used in multiple translation units" -- the `static` keyword is what is needed here. – Jason S Sep 07 '16 at 16:15

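In C, the `static` route suggested in the last comment might look like this as a header. A minimal sketch; the header name `angles.h` and its include guard are hypothetical. Each translation unit that includes it gets its own private copy with internal linkage, so the linker never sees duplicate definitions.

```c
/* angles.h (hypothetical header name) */
#ifndef ANGLES_H
#define ANGLES_H

/* static: internal linkage, so every .c file that includes this header gets
   its own copy and no duplicate external symbols reach the linker.
   inline: an (often ignored) inlining hint; GCC also does not emit
   -Wunused-function for unused static inline functions, so includers that
   never call it stay warning-free. */
static inline double deg_to_rad(double deg)
{
    return deg * (3.14159265358979323846 / 180.0);
}

#endif /* ANGLES_H */
```

The tradeoff is that any copy the compiler declines to inline is duplicated in each object file that uses it.
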
score 0 · Answer 5 · answered Aug 12 '11 at 21:56

I don't think the inline keyword affects maintainability, and only barely the succinctness. (opinion)
Sometimes inline can decrease run-time performance: http://www.parashift.com/c++-faq-lite/inline-functions.html#faq-9.3
Compilers are quite smart about inlining; I've heard that Visual Studio ignores the keyword almost completely and makes its own inlining decisions.

Does link-time optimization render manual inlining obsolete? Not at all: the optimizer that makes the inline keyword nigh-obsolete kicks in well before link time.

Mooing Duck

The optimizer that could make inline obsolete cannot kick in before link time, b/c the definition may not be (and in most interesting cases would not be) in the same translation unit. – ccom Aug 12 '11 at 22:05
Regarding 2: That's why I put a "sometimes" in it and have "3". – ccom Aug 12 '11 at 22:08
`inline` doesn't necessarily tell the compiler that the function must be inlined; most modern compilers ignore that and use it only to specify that the function has internal linkage. – Billy ONeal Aug 12 '11 at 22:15
@Billy: `inline` doesn't give a function internal linkage, `static` does. That said, the *effect* is often the same, but implementation-wise it's not. – GManNickG Aug 12 '11 at 22:24
@Billy inline also means the compiler shouldn't throw errors if the function is defined multiple times, is that what you were thinking of? – Mooing Duck Aug 12 '11 at 22:41
@Mooing: That's what internal linkage is, yes. – Billy ONeal Aug 12 '11 at 22:42
@Billy: http://stackoverflow.com/questions/4957582/how-can-i-prove-that-inline-functions-default-to-internal-linkage `7.1.2/3 footnote says The inline keyword has no effect on the linkage of a function.` – Mooing Duck Aug 12 '11 at 22:45
@Billy: No, that's not what internal linkage is. – GManNickG Aug 12 '11 at 22:51

score -2 · Answer 6 · answered Aug 12 '11 at 21:49

Item 33 of Scott Meyers' Effective C++ (2nd ed.) springs to mind.

You must bear in mind the keyword static with respect to inline! Now there is a hornet's nest!

Ed Heal

Would you care to share what that Item says, for those of us without that book? – Oliver Charlesworth Aug 12 '11 at 21:58
@Oli: Particularly when that item # is from an obsolete edition... :) – Billy ONeal Aug 12 '11 at 22:21
