I have a hot, critical-path function (about 45% of cycles:ppp as per perf record) in my C++17 application that is not being inlined as I would expect. It's a tiny function -- it simply returns the value of an atomic pointer member. The disassembly confirms that the function is just four assembly instructions, including the retq. Furthermore, there is only a single caller of this function in the entire build. I've even declared this function as __attribute__((always_inline)). Yet a call to and return from this function are still being generated.
The caller is in file A and the callee is in file B.
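For concreteness, the callee and its caller have roughly the following shape. This is only a minimal sketch, not my actual code: Node, Holder, get and read_value are made-up names, and everything is collapsed into one translation unit so that it compiles on its own, whereas in the real build the two functions live in different files.

```cpp
#include <atomic>

// Hypothetical payload type; the real pointee is irrelevant to the question.
struct Node {
    int value;
};

// Stand-in for the class that lives in file B (name is an assumption).
class Holder {
public:
    explicit Holder(Node* n) : ptr_(n) {}

    // The callee: a tiny const member function that only loads an
    // atomic pointer member. The attribute is the one from the question.
    __attribute__((always_inline)) inline Node* get() const {
        return ptr_.load();  // default (sequentially consistent) load
    }

private:
    std::atomic<Node*> ptr_;
};

// The single caller, which lives in file A in the real build
// (name is an assumption).
inline int read_value(const Holder& h) {
    return h.get()->value;
}
```

With both definitions visible in one translation unit like this, I would expect the getter to be inlined; my problem only shows up when the definition is in another file.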
Some additional notes:
- I'm compiling with -O3 and -march=native
- The callee is declared const and doesn't access any static members
- I'm doing link-time optimization via -flto when linking
- I'm using icc (ICC) 19.0.3.199 to compile
- The functions are simple member functions that are const and not templates, all with a dozen or fewer x86 assembly instructions
Actually, I've simplified a bit -- there are two places where this lack of inlining happens in my application. File B has a function F1, which calls File A's F2, which in turn calls File B's F3 (F2 and F3 are the caller and callee described above).
File A:
F2() {
F3();
}
File B:
F1() {
F2();
}
F3() {}
How can I get all of these to inline into one function? Another more fundamental question: can a function defined in a different file be inlined (perhaps using LTO)?