Are the "inline" keyword and "inlining" optimization separate concepts?

Question

I am asking this basic question to set the record straight. I have read this question and its currently accepted answer, which is not convincing. The second-most-voted answer gives better insight, but it is not perfect either.

While reading below, try to distinguish between the inline keyword and the “inlining” concept.

Here is my take:

The "inlining" concept

Inlining is done to save the call overhead of a function. It is similar to macro-style code replacement. Nothing to dispute here.
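As a minimal illustration (all names here are hypothetical), inlining a call means the optimizer substitutes the callee's body at the call site. The macro analogy holds only loosely: an inlined function keeps normal call semantics, so each argument is still evaluated exactly once, while a macro pastes its argument textually.

```cpp
#include <cassert>

// Hypothetical small function: a typical inlining candidate.
int square(int x) { return x * x; }

// What the programmer writes: two ordinary calls.
int sum_of_squares(int a, int b) { return square(a) + square(b); }

// Roughly what an optimizer produces after inlining both calls:
// no argument passing and no call/return, just the substituted body.
int sum_of_squares_inlined(int a, int b) { return a * a + b * b; }

// Where the macro analogy breaks down: SQUARE_MACRO(f()) expands to
// ((f()) * (f())) and calls f() twice, whereas square(f()) calls it once.
#define SQUARE_MACRO(x) ((x) * (x))

int calls = 0;
int next_value() { return ++calls; }
```

Calling `square(next_value())` invokes `next_value()` once whether or not the call is inlined; `SQUARE_MACRO(next_value())` invokes it twice.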

The `inline` keyword

Perception A

The inline keyword is a request to the compiler, usually used for smaller functions, so that the compiler can optimize them and make calls faster. The compiler is free to ignore it.

I partially dispute this for the following reasons:

- Larger and/or recursive functions are not inlined anyway, and the compiler ignores the inline keyword completely.
- Smaller functions are automatically inlined by the optimizer whether or not the inline keyword is present.

It's quite clear that the user doesn't have any control over function inlining through the inline keyword.
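A sketch of the two cases the bullets describe, using hypothetical functions. Whether either call is actually inlined depends on the compiler and the optimization level, so the comments describe typical behavior rather than a guarantee; inspecting the generated assembly is the only reliable way to find out.

```cpp
#include <cassert>

// Small function without `inline`: optimizers commonly inline calls like
// this anyway when the definition is visible in the same translation unit.
int add(int a, int b) { return a + b; }

// Recursive function marked `inline`: the body cannot simply be pasted in
// for arbitrary n, since that would never terminate. A compiler may inline a
// few levels or rewrite the recursion as a loop, whatever the keyword says.
inline unsigned long long factorial(unsigned n) {
    return n <= 1 ? 1ULL : n * factorial(n - 1);
}
```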

Perception B

inline has nothing to do with the concept of inlining. Putting inline in front of big or recursive functions won't help, while smaller functions don't need it to be inlined.

The only deterministic use of inline is to maintain the One Definition Rule.

That is, if a function is declared inline, only the following things are mandated:

- Even if its body appears in multiple translation units (e.g. the header is included in multiple .cpp files), only one definition ends up in the program, which avoids a multiple-definition linker error. (Note: if the bodies of the function differ, the behavior is undefined.)
- The body of the inline function has to be visible in every translation unit that uses it. In other words, declaring an inline function in a .h file and defining it in only one .cpp file will result in an “undefined symbol” linker error for the other .cpp files.
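A minimal sketch of the pattern Perception B describes, written as a single file for brevity. Assume the first part lives in a hypothetical header `math_util.h` that several .cpp files include; the names are illustrative only.

```cpp
#include <cassert>

// ---- would live in a hypothetical header, math_util.h ----
// Without `inline`, including this header in two .cpp files would produce
// two definitions of cube() and a multiple-definition link error. With
// `inline`, the definitions must be identical and the program behaves as
// if there were only one.
inline int cube(int x) { return x * x * x; }

// Declaration-only alternative: if the header contained just
//   inline int cube(int x);
// and the body lived in one .cpp file, the other .cpp files calling cube()
// would typically fail to link, because an inline function must be defined
// in every translation unit that uses it.

// ---- would live in some .cpp file that includes math_util.h ----
int volume_of_cube(int side) { return cube(side); }
```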

Verdict

IMO, the perception “A” is entirely wrong and the perception “B” is entirely right.

There are some quotes in the standard on this; however, I am expecting an answer that logically explains whether this verdict is correct.

Email reply from Bjarne Stroustrup:

"For decades, people have promised that the compiler/optimizer is or will soon be better than humans for inlining. This may be true in theory, but it still isn't in practice for good programmers, especially in an environment where whole-program optimization is not feasible. There are major gains to be had from judicious use of explicit inlining."

You can use `static inline` and put it in a header. This tricks the compiler into using a separate copy of the function in each translation unit. — Luka Rahne, Nov 20 '14 at 15:18
@LukaRahne: Or you can just use `inline` and put it in the header, without the weird situation of having multiple versions of the function (with different addresses and separate copies of static variables). — Mike Seymour, Nov 20 '14 at 15:20
@LukaRahne The `static` there is important to make those functions defined in different TUs actually completely different functions. Otherwise, they would still have to be identical. — Deduplicator, Nov 20 '14 at 15:22
_"Wrong"_ is a bit harsh, don't you think? The "A" perspective is the standard explanation given to C programmers learning C++. It's not wrong as such, it's just incomplete. It fails to mention how the rules change concerning multiple declarations and definitions, for example. Combine both perspectives, and you have a more complete picture — Elias Van Ootegem, Nov 20 '14 at 16:17
With some compilers, the `inline` keyword reminds the compiler to inline the function in *debug* mode or when optimizations are turned off. For higher levels of optimization, the compiler may inline small functions regardless of whether the `inline` keyword is used or not. Remember that the `inline` keyword can be used with *freestanding* functions also. I have this concept working on the IAR EWARM compiler. — Thomas Matthews, Nov 20 '14 at 18:39
There's a problem with the basic premise of this question, which is that the language definition/standard - and therefore any answer based on the language-lawyering game - has absolutely no direct connection to *what the compiler actually does*. e.g. "Smaller functions are automatically "inlined" by optimizer" -> what optimizer? What if I'm using TCC, or -O0? What if that feature isn't finished yet on my platform? What if I'm using a pessimizing compiler (think this was discussed before here)? The language doesn't define this stuff; perception A is entirely implementation-dependent. — Alex Celeste, Nov 20 '14 at 19:45
I think this is pretty good reading on inlining: http://www.drdobbs.com/inline-redux/184403879 — Fred Larson, Nov 20 '14 at 19:49
Perception A is only wrong in that `inline` is not guaranteed to have any effect *by the standard*, but compilers can guarantee any effect that they want. So it's not correct to say that `inline` has no effect on inlining, only that it's *not guaranteed* to have any effect -- but it still might! — cdhowie, Nov 21 '14 at 05:15
@MooingDuck: there is a question; it's just not obviously marked with a question mark. The very last sentence says he wants to know "if this verdict is true or false." It's a roundabout way of asking "I've drawn this conclusion; is it correct?" — Cornstalks, Nov 21 '14 at 07:28
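To make the `inline` versus `static inline` distinction from the comments above concrete, here is a sketch with hypothetical names. In a single file both functions behave identically; the difference only shows up when the header is included in several translation units.

```cpp
#include <cassert>

// Imagine both functions defined in a header included by a.cpp and b.cpp.

// External linkage: a.cpp and b.cpp refer to the same function, so there is
// one `count` object for the whole program, and &next_shared_id is the
// same address everywhere.
inline int next_shared_id() {
    static int count = 0;
    return ++count;
}

// Internal linkage: each translation unit gets its own private copy of the
// function, with its own `count` and its own address.
static inline int next_local_id() {
    static int count = 0;
    return ++count;
}
```

If a.cpp and b.cpp each call `next_shared_id()` once, they get 1 and 2; each calling `next_local_id()` once would get 1 in both files.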

japreiss · Accepted Answer · 2014-11-24T15:24:57.323

I wasn't sure about your claim:

Smaller functions are automatically "inlined" by optimizer irrespective of inline is mentioned or not... It's quite clear that the user doesn't have any control over function "inlining" with the use of keyword inline.

I've heard that compilers are free to ignore your inline request, but I didn't think they disregarded it completely.

I looked through the GitHub repository for Clang and LLVM to find out. (Thanks, open source software!) I found that the inline keyword does make Clang/LLVM more likely to inline a function.

The Search

Searching for the word inline in the Clang repository leads to the token specifier kw_inline. It looks like Clang uses a clever macro-based system to build the lexer and other keyword-related functions, so there's nothing as direct as if (tokenString == "inline") return kw_inline to be found. But here in ParseDecl.cpp, we see that kw_inline results in a call to DeclSpec::setFunctionSpecInline().

case tok::kw_inline:
  isInvalid = DS.setFunctionSpecInline(Loc, PrevSpec, DiagID);
  break;

Inside that function, we set a bit and emit a warning if it's a duplicate inline:

if (FS_inline_specified) {
  DiagID = diag::warn_duplicate_declspec;
  PrevSpec = "inline";
  return true;
}
FS_inline_specified = true;
FS_inlineLoc = Loc;
return false;

Searching for FS_inline_specified elsewhere, we see it's a single bit in a bitfield, and it's used in a getter function, isInlineSpecified():

bool isInlineSpecified() const {
  return FS_inline_specified | FS_forceinline_specified;
}
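To summarize this parsing step, here is a simplified model of the two snippets above. `ToyDeclSpec` is hypothetical and is not Clang's `DeclSpec`; it only mirrors the quoted logic: the first `inline` sets a bit, a repeated one is reported as a duplicate (Clang then warns), and the getter treats a forceinline specifier as inline too.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the relevant part of clang::DeclSpec.
struct ToyDeclSpec {
    bool inline_specified = false;       // mirrors FS_inline_specified
    bool forceinline_specified = false;  // mirrors FS_forceinline_specified

    // Like setFunctionSpecInline(): returns true for a duplicate specifier
    // (and names it in prev_spec), false after recording the first one.
    bool set_inline(std::string& prev_spec) {
        if (inline_specified) {
            prev_spec = "inline";
            return true;
        }
        inline_specified = true;
        return false;
    }

    // Like isInlineSpecified(): either specifier counts as inline.
    bool is_inline_specified() const {
        return inline_specified || forceinline_specified;
    }
};
```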

Searching for call sites of isInlineSpecified(), we find the codegen, where we convert the C++ parse tree into LLVM intermediate representation:

if (!CGM.getCodeGenOpts().NoInline) {
  for (auto RI : FD->redecls())
    if (RI->isInlineSpecified()) {
      Fn->addFnAttr(llvm::Attribute::InlineHint);
      break;
    }
} else if (!FD->hasAttr<AlwaysInlineAttr>())
  Fn->addFnAttr(llvm::Attribute::NoInline);
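The codegen rule above boils down to a small decision, sketched here with hypothetical types (`ToyAttr`, `ToyRedecl`, `inline_attr_for`) rather than Clang's own: when inlining is enabled and any redeclaration was written with `inline`, the LLVM function gets the inline-hint attribute; when the NoInline code-generation option is set, every function not marked always_inline gets the no-inline attribute instead.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the attribute choice in the quoted codegen snippet.
enum class ToyAttr { None, InlineHint, NoInline };

struct ToyRedecl {
    bool inline_specified;  // was this declaration written with `inline`?
};

ToyAttr inline_attr_for(const std::vector<ToyRedecl>& redecls,
                        bool no_inline_option, bool has_always_inline) {
    if (!no_inline_option) {
        // A single `inline` on any redeclaration is enough for the hint.
        for (const ToyRedecl& rd : redecls)
            if (rd.inline_specified) return ToyAttr::InlineHint;
        return ToyAttr::None;
    }
    // Inlining disabled: mark the function noinline unless it is always_inline.
    return has_always_inline ? ToyAttr::None : ToyAttr::NoInline;
}
```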

Clang to LLVM

We are done with the C++ parsing stage. Now our inline specifier is converted to an attribute of the language-neutral LLVM Function object. We switch from Clang to the LLVM repository.

Searching for llvm::Attribute::InlineHint yields the method Inliner::getInlineThreshold(CallSite CS) (with a scary-looking braceless if block):

// Listen to the inlinehint attribute when it would increase the threshold
// and the caller does not need to minimize its size.
Function *Callee = CS.getCalledFunction();
bool InlineHint = Callee && !Callee->isDeclaration() &&
  Callee->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
                                       Attribute::InlineHint);
if (InlineHint && HintThreshold > thres
    && !Caller->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
                                             Attribute::MinSize))
  thres = HintThreshold;

So we already have a baseline inlining threshold from the optimization level and other factors, but if it's lower than the global HintThreshold, we bump it up. (HintThreshold is settable from the command line.)
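A heavily simplified, hypothetical model of this step together with the eventual cost-versus-threshold comparison. The function names and the numbers are arbitrary illustrations, not LLVM's API or defaults: the hint can only raise the threshold, and only when the caller is not being optimized for minimum size.

```cpp
#include <cassert>

// Hypothetical model of the quoted getInlineThreshold() logic.
int effective_threshold(int base_threshold, int hint_threshold,
                        bool callee_has_inline_hint, bool caller_is_minsize) {
    int thres = base_threshold;
    // The hint is honored only if it raises the threshold and the caller
    // is not marked minsize.
    if (callee_has_inline_hint && hint_threshold > thres && !caller_is_minsize)
        thres = hint_threshold;
    return thres;
}

// Simplified final test: inline when the estimated cost is below the threshold.
bool would_inline(int cost, int threshold) { return cost < threshold; }
```

With a base of 100 and a hint threshold of 150, a call with estimated cost 120 is rejected without the hint but accepted with it. That is the accepted answer's point in miniature: the keyword shifts the decision without guaranteeing it.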

getInlineThreshold() appears to have only one call site, a member of SimpleInliner:

InlineCost getInlineCost(CallSite CS) override {
  return ICA->getInlineCost(CS, getInlineThreshold(CS));
}

It calls a virtual method, also named getInlineCost, on its member pointer to an instance of InlineCostAnalysis.

Searching for ::getInlineCost() to find the versions that are class members, we find one that's a member of AlwaysInline - which is a non-standard but widely supported compiler feature - and another that's a member of InlineCostAnalysis. It uses its Threshold parameter here:

CallAnalyzer CA(Callee->getDataLayout(), *TTI, AT, *Callee, Threshold);
bool ShouldInline = CA.analyzeCall(CS);

CallAnalyzer::analyzeCall() is over 200 lines and does the real nitty-gritty work of deciding whether the function is inlineable. It weighs many factors, but as we read through the method we see that all its computations either manipulate the Threshold or the Cost. And at the end:

return Cost < Threshold;

But the return value named ShouldInline is really a misnomer. In fact the main purpose of analyzeCall() is to set the Cost and Threshold member variables on the CallAnalyzer object. The return value only indicates the case when some other factor has overridden the cost-vs-threshold analysis, as we see here:

// Check if there was a reason to force inlining or no inlining.
if (!ShouldInline && CA.getCost() < CA.getThreshold())
  return InlineCost::getNever();
if (ShouldInline && CA.getCost() >= CA.getThreshold())
  return InlineCost::getAlways();

Otherwise, we return an object that stores the Cost and Threshold.

return llvm::InlineCost::get(CA.getCost(), CA.getThreshold());
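The three possible results can be modelled with a small hypothetical type; `ToyInlineCost` and `classify` are illustrative names, not LLVM's API. The `should_inline` flag stands for what `analyzeCall()` returned: when it disagrees with the raw cost-versus-threshold comparison, some other factor has forced the answer.

```cpp
#include <cassert>

// Hypothetical model of the values getInlineCost() can return.
struct ToyInlineCost {
    enum class Kind { Never, Always, Variable } kind;
    int cost;       // meaningful only for Variable
    int threshold;  // meaningful only for Variable
};

ToyInlineCost classify(bool should_inline, int cost, int threshold) {
    // analyzeCall() said no although the numbers alone would allow it: forced "never".
    if (!should_inline && cost < threshold)
        return {ToyInlineCost::Kind::Never, 0, 0};
    // analyzeCall() said yes although the numbers alone would reject it: forced "always".
    if (should_inline && cost >= threshold)
        return {ToyInlineCost::Kind::Always, 0, 0};
    // Otherwise keep both numbers so the caller can weigh them later.
    return {ToyInlineCost::Kind::Variable, cost, threshold};
}
```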

So we're not returning a yes-or-no decision in most cases. The search continues! Where is this return value of getInlineCost() used?

The Real Decision

It's found in bool Inliner::shouldInline(CallSite CS). Another big function. It calls getInlineCost() right at the beginning.

It turns out that getInlineCost analyzes the intrinsic cost of inlining the function - its argument signature, code length, recursion, branching, linkage, etc. - and some aggregate information about every place the function is used. On the other hand, shouldInline() combines this information with more data about a specific place where the function is used.

Throughout the method there are calls to InlineCost::costDelta(), which uses the InlineCost's Threshold value as computed by analyzeCall(). Finally, we return a bool. The decision is made. In Inliner::runOnSCC():

if (!shouldInline(CS)) {
  emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,
                               Twine(Callee->getName() +
                                     " will not be inlined into " +
                                     Caller->getName()));
  continue;
}

// Attempt to inline the function.
if (!InlineCallIfPossible(CS, InlineInfo, InlinedArrayAllocas,
                          InlineHistoryID, InsertLifetime, DL)) {
  emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,
                               Twine(Callee->getName() +
                                     " will not be inlined into " +
                                     Caller->getName()));
  continue;
}
++NumInlined;

InlineCallIfPossible() does the inlining based on shouldInline()'s decision.

So the Threshold was affected by the inline keyword, and is used in the end to decide whether to inline.

Therefore, your Perception B is partly wrong because at least one major compiler changes its optimization behavior based on the inline keyword.

However, we can also see that inline is only a hint, and other factors may outweigh it.

Comment from Bjarne Stroustrup: *"For decades, people have promised that the compiler/optimizer is or will soon be better than humans for inlining. This may be true in theory, but it still isn't in practice for good programmers, especially in an environment where whole-program optimization is not feasible. There are major gains to be had from judicious use of explicit inlining."* — iammilind, Nov 24 '14 at 09:07
Yes. Always check the assembler output for performance-critical code. The compiler usually does the right thing, but not always. GCC and Clang have `__attribute__((always_inline))` and MSVC has `__forceinline`, but even those can fail because [some functions are not inlineable](http://stackoverflow.com/a/5224270/805659). — japreiss, Nov 24 '14 at 15:41
Thanks for the bounty! It was fun to learn more about LLVM internals. — japreiss, Nov 30 '14 at 00:23
@iammilind: I'm not sure how a "static" compiler could ever expect to know better than humans how to optimize things unless code is annotated to indicate how often various things are going to happen. If a change would make a program 10% faster when processing some files and 50% slower with others, a compiler can't possibly be expected to know whether that change would be good or bad without knowing which kind of files the program will spend more time crunching. A dynamic compiler (JIT) might be able to use execution patterns to make such determinations on the fly, but... — supercat, May 28 '15 at 15:34
...a static compiler would have no such ability. Static compilers can often outperform dynamic ones in cases where they can be steered into optimizing for the proper cases, but that process generally requires knowledge of a future compilers can't possibly predict. — supercat, May 28 '15 at 15:36
This one is worth reading: a study of "whether or not compilers might change the decision to inline a function or not based on whether you write inline in the declaration," https://blog.tartanllama.xyz/inline-hints. GCC and Clang are examined. — MaxPlankton, Nov 12 '19 at 09:05
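The compiler-specific attributes mentioned in the comments above are often wrapped in a macro so that the same code builds with several compilers. A minimal sketch of that common pattern; `FORCE_INLINE` and `clamp_to_byte` are hypothetical names, and, as noted above, even the forcing attributes cannot make an uninlineable function inlineable.

```cpp
#include <cassert>

// Portable "force inline" macro: a common pattern, sketched here.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
// GCC and Clang spell the attribute with double parentheses.
#define FORCE_INLINE inline __attribute__((always_inline))
#else
// Fallback: only the ordinary hint is available.
#define FORCE_INLINE inline
#endif

FORCE_INLINE int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}
```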

Mike Seymour · Answer 2 · 2014-11-20T17:13:02.637


Both are correct.

The use of inline might, or might not, influence the compiler's decision to inline any particular call to the function. So A is correct - it acts as a non-binding request that calls to the function be inlined, which the compiler is free to ignore.

The semantic effect of inline is to relax the restrictions of the One Definition Rule to allow identical definitions in multiple translation units, as described in B. For many compilers, this is necessary to allow the inlining of function calls - the definition must be available at that point, and compilers are only required to process one translation unit at a time.

edited Nov 20 '14 at 17:13

answered Nov 20 '14 at 15:18


Of course, there is LTO / whole-program optimization, which does not rely on having the definition in the same TU as the use. – Deduplicator Nov 20 '14 at 15:20

@Deduplicator: Indeed, that's why I qualified it with "for many compilers". I thought about adding a brief description of other schemes, but that's rather beyond the scope of a simple question. – Mike Seymour Nov 20 '14 at 15:21
