6

If a function is only used in one place and some profiling shows that it's not being inlined, will there always be a performance advantage in forcing the compiler to inline it?

Obviously "profile and see" (and in the case of the function in question, it did prove to be a small perf boost). I'm mostly asking out of curiosity -- are there any performance disadvantages to this with a reasonably smart compiler?

Robert Fraser
  • It depends on how often it will be called. If it's always called by the caller, then inlining is more likely to help. If the function is massive and is a trap function that is rarely ever called, then inlining might not be such a good idea. – Mysticial Mar 20 '15 at 20:08

7 Answers

12

No, there are notable exceptions. Take this code for example:

void do_something_often(void) {
    x++;
    if (x == 100000000) {
        do_a_lot_of_work();
    }
}

Let's say do_something_often() is called very often and from many places. do_a_lot_of_work() is called very rarely (one out of every one hundred million calls). Inlining do_a_lot_of_work() into do_something_often() doesn't gain you anything. Since do_something_often() does almost nothing, it would be much better if it got inlined into the functions that call it, and in the rare case that they need to call do_a_lot_of_work(), they call it out of line. In that way, they are saving a function call almost every time, and saving code bloat at every call site.
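As a rough sketch of that layout, assuming GCC or Clang (the `noinline` and `cold` attributes are compiler-specific, and `slow_path_runs` and `run_hot_loop` are hypothetical names added only for illustration):

```c
#include <stdint.h>

uint64_t x;               /* call counter, as in the snippet above */
uint64_t slow_path_runs;  /* hypothetical: counts how often the slow path ran */

/* Rare slow path: kept out of line and marked cold so the compiler
   can place it away from the hot code. */
static __attribute__((noinline, cold)) void do_a_lot_of_work(void) {
    slow_path_runs++;
}

/* Tiny hot path: cheap enough to inline into every caller. */
static inline void do_something_often(void) {
    x++;
    if (x == 100000000) {
        do_a_lot_of_work();
    }
}

/* Hypothetical caller: after inlining, each iteration is an increment,
   a compare, and an almost-never-taken branch to the cold function. */
void run_hot_loop(unsigned n) {
    for (unsigned i = 0; i < n; i++) {
        do_something_often();
    }
}
```

Whether the compiler actually lays the code out this way still depends on the optimization level, so check the generated assembly or profile.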

Variable Length Coder
3

One legitimate case where it makes sense not to inline a function, even if it's only called from a single location, is if the call to the function is rare and almost always skipped. Keeping the instructions before the function call and the instructions after the function call close together in memory may allow those instructions to stay in the processor's instruction cache, which might not be possible if the inlined body separated those blocks in memory.

It would still be possible for the compiler to compile the function call as if using goto, avoiding having to keep track of a return address, but if the compiler has already determined that the function call is rare, then it makes sense not to spend as much time optimising that call.

1

You can't "force" the compiler to inline it, unless you are considering some implementation-specific tools that you have not mentioned, so the question is entirely moot.

If your compiler is already not doing so then it has a reason.

Lightness Races in Orbit
0

If the function is called only once, there should be no performance disadvantages in inlining it. However, that does not mean you should blindly inline all functions. For example, if the code in question is Linux kernel code and you're using the BUG_ON or WARN_ON macro to print a stack trace, the trace will not show the inlined function. Instead, it contains only the name of the calling function.

And, as the other answer explained, the "inline" keyword doesn't actually force the compiler to inline the function; it is just a hint to the compiler. However, GCC does have an attribute, __attribute__((always_inline)), which should force the compiler to inline the function.
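For illustration, a minimal sketch of how that attribute is typically applied. It assumes GCC or Clang (other compilers spell it differently, e.g. MSVC's `__forceinline`), and `clamp_to_byte` and `brighten` are hypothetical names:

```c
/* GCC/Clang-specific: ask for inlining regardless of the usual heuristics. */
static inline __attribute__((always_inline)) int clamp_to_byte(int v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}

/* The single call site; the body of clamp_to_byte is expected to be
   expanded here rather than called. */
int brighten(int pixel, int delta) {
    return clamp_to_byte(pixel + delta);
}
```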

juhist
  • [_"Generally, functions are not inlined unless optimization is specified. For functions declared inline, this attribute inlines the function even if no optimization level was specified."_](https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Function-Attributes.html#Function-Attributes) That's not quite what you're claiming. – Lightness Races in Orbit Mar 20 '15 at 20:21
  • `If the function is called only once, there should be no performance disadvantages in inlining it` That's not entirely true, you have to consider that it might potentially be wasting cache memory. – sbabbi Mar 20 '15 at 20:23
  • I don't understand where the difference is between what I'm claiming and what the manual is claiming. The manual is claiming that "inline" functions are not inlined unless -O is used, and I'm claiming that "inline" is just a hint to the compiler. Furthermore, the manual is claiming that the attribute inlines the function even if -O is not used, and I'm claiming that it forces inlining. They seem to be compatible statements to me. – juhist Mar 20 '15 at 20:27
  • @sbabbi And how exactly would it waste cache memory? If the function is called only once, there is only one location in the code where the code of the function is inserted, so there is no duplicate code in the cache. Of course, if it's a "slowpath" function that is rarely called then the cache alignment might not be optimal in all cases, but you can fix that with __builtin_expect or by using an unlikely() macro. – juhist Mar 20 '15 at 20:29
  • @juhist see the other answers. – sbabbi Mar 20 '15 at 20:41
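The `unlikely()` macro mentioned in the comments above is usually a thin wrapper around GCC/Clang's `__builtin_expect`. A small sketch, where `error_count`, `handle_error` and `sum_non_negative` are hypothetical names used only for this example:

```c
#include <stddef.h>

/* Common definition (the Linux kernel uses the same pattern). */
#define unlikely(x) __builtin_expect(!!(x), 0)

long error_count;  /* hypothetical: counts slow-path invocations */

/* Hypothetical slow path that is expected to run rarely. */
static void handle_error(int value) {
    (void)value;
    error_count++;
}

/* Sums the non-negative entries; negative entries take the slow path.
   The hint tells the compiler to lay out the fall-through path for the
   common case, even if handle_error ends up inlined. */
long sum_non_negative(const int *values, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(values[i] < 0)) {
            handle_error(values[i]);
            continue;
        }
        total += values[i];
    }
    return total;
}
```

The hint only influences code layout and branch ordering; it does not change what the function computes.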
0

Make sure that the function definition is not exported. If it is, it obviously needs to be compiled as a standalone function, and that means that if your function is big, the call will probably not be inlined. (Remember, it's the call that gets inlined, not the function. A function might get inlined in one place and called in another, etc.)

So even if you know that the function is called only from one place, the compiler might not. Make sure to hide the definition of your function from the other object files, for example by defining it in the anonymous namespace.
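In plain C, the counterpart of an anonymous namespace is `static`, which gives the function internal linkage. A minimal sketch with hypothetical function names:

```c
/* Internal linkage: no other object file can reference this function,
   so the compiler can see that the call below is its only use. */
static int scale_and_offset(int v) {
    return v * 3 + 1;
}

/* Exported entry point and the only caller of scale_and_offset. */
int process(int v) {
    return scale_and_offset(v) - 1;
}
```

With the helper hidden like this, an optimizing compiler is free to inline the call and emit no standalone copy of `scale_and_offset` at all, though it is not obliged to.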

That being said, even if it is called from only one place, it does not mean that it is always a good idea to inline it. If your function is called rarely, it might waste a lot of memory in the CPU cache.

sbabbi
0

It depends on how you wrote your function.

In some cases, yes!

void doSomething(int *src, int *dst,
                 const int loopCounterInner, const int loopCounterOuter)
{
     int i, j;
     for(i=0; i<loopCounterOuter; i++){
         for(j=0; j<loopCounterInner; j++){
             *dst = someCalculations(*src);
             src++;
             dst++;
         }
     }
}

In this example, if this function is compiled as non-inlined, then the compiler basically has no knowledge of the trip counts of the two loops. This is a big deal for implementations that rely strongly on compile-time optimizations.

I came across an even worse case: the compiler assumed loopCounterInner to be a large value and optimized for that case, but loopCounterInner was actually 3 or 5, so the best choice would have been to fully unroll the inner loop!

For C++ probably the best way to do it is to make them template parameters, but for C, the only way to generate differently optimized code for different use cases is to inline the function.
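To make the point concrete, here is a sketch of a call site that passes literal trip counts. `someCalculations` is given a placeholder body and `process3x2` is a hypothetical caller; neither comes from the original code:

```c
/* Placeholder body, only so the example compiles. */
static int someCalculations(int v) {
    return v * 2;
}

static inline void doSomething(const int *src, int *dst,
                               const int loopCounterInner,
                               const int loopCounterOuter)
{
    for (int i = 0; i < loopCounterOuter; i++) {
        for (int j = 0; j < loopCounterInner; j++) {
            *dst++ = someCalculations(*src++);
        }
    }
}

/* Once doSomething is inlined here, both trip counts are compile-time
   constants (3 and 2), so the compiler is free to fully unroll. */
void process3x2(const int *src, int *dst) {
    doSomething(src, dst, 3, 2);
}
```

Whether the loops are actually unrolled depends on the compiler and optimization level, so inspect the generated assembly to confirm.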

user3528438
0

No, if the code is a rarely used function then keeping it off the 'hot path' will be beneficial. An inlined function will use up cache space (instruction cache) whether or not the code is actually used. Tools like LTCG combined with Profile Guided Optimisation (in the MSFT world, not sure about Linux) go to great lengths to keep rarely used code off the hot path, and this can make a significant difference.

Mike Vine