C++ Templates: Are template instantiations inlined? Are there drawbacks in performance?

Question

When some C++ entity, such as a structure, class, or function, is declared as a template, the definitions provided for such entities are merely blueprints that must be instantiated.

Because a template entity must be defined where it is declared (which is commonly in header files), I have the conception, which I am trying to convince myself is wrong, that after a template has been instantiated it will be inlined by the compiler. I would like to ask whether this is so.

My suspicion about the answer to this question arose when I read the following paragraph:

"Templates can lead to slower compile-times and possibly larger executable, especially with older compilers."

Slower compile-times are clear, as templates must be instantiated, but why "possibly larger executables"? How should this be interpreted? Should I interpret it as 'many functions are inlined', or as 'the executable's size increases if there are many template instantiations, that is, the same template is instantiated with many different types, which causes several copies of the same entity to be present'?

In the latter case, does a larger executable cause the software to run more slowly, since more code must be loaded into memory, which in turn causes expensive paging?

Also, as these questions are somewhat compiler-dependent, I am particularly interested in the Visual C++ compiler. Generalised answers about what most compilers do would also give good insight.

Thank you in advance.

One reason you can have more code is that each template type is its own type. So all the code you have for one `Foo` instantiation has to be repeated for every other `Foo` instantiation — NathanOliver, Nov 03 '15 at 20:56
@NathanOliver Well, as I've learned today, the latest linker implementations (g++ 5.x) are capable of merging identical generated instantiation code. — πάντα ῥεῖ, Nov 03 '15 at 20:59
@πάνταῥεῖ can you provide a link? This sounds very interesting. — bolov, Nov 03 '15 at 21:01
Templates make it hard to take the decision whether to inline away from the compiler. Sometimes inlining is a bad idea and **sometimes** a good programmer is a better judge of that condition than a good compiler. So with templates, a programmer often lets the compiler inline something that the programmer would not have decided to inline. So larger (and because of cache issues, maybe slower) executable. Don't assume **you** are better at making inline choices than a compiler without good reason. But some of us are. — JSF, Nov 03 '15 at 21:02
@bolov Not yet. That was just a talk with an informed colleague a few hours ago. He's responsible for the toolchain management of our development environments. — πάντα ῥεῖ, Nov 03 '15 at 21:03
Often type choices could be represented as callback addresses or other run-time choices, allowing code sharing and a smaller executable and maybe (due to cache issues) faster execution. But when you choose to use templating to generalize across such choices, you get a bigger executable in which fewer total instructions need to be executed, so it is probably faster. — JSF, Nov 03 '15 at 21:06
Most "big" tasks on computers have a data size far in excess of their code size. Most computers have generous RAM compared to realistic requirements. So paging code is almost totally ignorable as a reason that bigger might directly cause slower. Cache is not usually a dramatic reason either, but it is more likely to be significant than paging. — JSF, Nov 03 '15 at 21:10
@πάνταῥεῖ Interesting. If you do wind up getting a link to some information on that, I would love to read it. — NathanOliver, Nov 03 '15 at 21:10
@NathanOliver Couldn't find one myself so far. I'm going to ask him tomorrow, and come back here probably. — πάντα ῥεῖ, Nov 03 '15 at 21:12
Note that the "especially with older compilers" quote is from 2008. That would be "ancient compilers" by now - perhaps gcc 2.95 or Turbo-C++? There are other examples, [like this one](http://stackoverflow.com/questions/11638271/examples-of-when-a-bitwise-swap-is-a-bad-idea/11639305#11639305), where heavily inlined template code results in an extremely *small* code size. — Bo Persson, Nov 03 '15 at 21:59
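JSF's point about trading templates for run-time callbacks can be sketched roughly as follows. The function names and the summing task are hypothetical, chosen only for illustration: the template version is instantiated once per distinct callable type (every lambda has its own type), which lets the compiler inline the call inside each copy, while the function-pointer version is a single shared body that pays for an indirect call.

```cpp
#include <cassert>
#include <cstddef>

// Template version: one instantiation per distinct callable type F.
// Each lambda has its own type, so each lambda passed in gets its own copy
// of this loop, in which the call to f can usually be inlined.
template <typename F>
long long sum_mapped_t(const int* data, std::size_t n, F f) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += f(data[i]);
    return total;
}

// Callback version: a single function shared by all callers. The choice of
// f is made at run time through a pointer, so the call is typically indirect.
long long sum_mapped_fp(const int* data, std::size_t n, int (*f)(int)) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += f(data[i]);
    return total;
}

int square(int x) { return x * x; }
int negate(int x) { return -x; }
```

Which one ends up faster depends on how many distinct callables are used and on whether the extra copies push hot code out of the instruction cache; only measurement can really tell.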

score 3 · Accepted Answer · answered Nov 03 '15 at 21:16

> Because a template entity must be defined where it is declared (which is commonly in header files)

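Not true. You can declare and define template classes, methods and functions separately, just like you can other classes, methods and functions. The definition must, however, be visible in every translation unit that implicitly instantiates the template, which is why it usually ends up in a header; with explicit instantiation it can instead live in a single .cpp file.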
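A minimal sketch of a class template whose member function is declared inside the class and defined outside it. `Box` and `doubled` are hypothetical names; the out-of-class definition is still kept in the same file, because any translation unit that implicitly instantiates `Box<T>::doubled` needs to see it.

```cpp
// Hypothetical class template: the member function is only declared here...
template <typename T>
struct Box {
    T value;
    constexpr T doubled() const;
};

// ...and defined separately, outside the class body. Any translation unit
// that implicitly instantiates Box<T>::doubled must still see this definition.
template <typename T>
constexpr T Box<T>::doubled() const {
    return value + value;
}
```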

> I have the conception, which I am trying to convince myself is wrong, that after a template has been instantiated it will be inlined by the compiler. I would like to ask whether this is so.

Some of it may be, or all of it, or none of it. The compiler will do what it deems is best.

> Slower compile-times are clear, as templates must be instantiated, but why "possibly larger executables"? How should this be interpreted?

It could be interpreted in a number of ways, in the same way that a bottle of aspirin carries the warning "may induce <insert side-effects here>".

> Should I interpret it as 'many functions are inlined', or as 'the executable's size increases if there are many template instantiations, that is, the same template is instantiated with many different types, which causes several copies of the same entity to be present'?

You won't have several copies of the same entity - the compiler suite is obliged to see to that. Even if methods are inline, the address of the method will:

- always exist, and
- be the same address when referenced by multiple compilation units.

What you may find is that you start creating more types than you intended. For example, `std::vector<int>` is a completely different type to `std::vector<double>`, and `foo<X>()` is a different function to `foo<Y>()`. The number of types and function definitions in your program can grow quickly.
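To make the last point concrete, here is a small sketch; the body of `foo` and its call counter are hypothetical additions. Each instantiation is a separate function, so each gets its own function-local `static`, and `std::is_same` confirms that the two vector types are unrelated.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Different template arguments produce unrelated types.
static_assert(!std::is_same<std::vector<int>, std::vector<double>>::value,
              "vector<int> and vector<double> are distinct types");

// Hypothetical function template: each instantiation (foo<int>, foo<double>,
// ...) is a separate function with its own function-local static counter.
template <typename T>
int foo() {
    static int calls = 0;
    return ++calls;
}
```

Calling `foo<int>()` twice and then `foo<double>()` once yields 1, 2 and then 1 again, because `foo<double>` never touches `foo<int>`'s counter.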

> In the latter case, does a larger executable cause the software to run more slowly, since more code must be loaded into memory, which in turn causes expensive paging?

Excessive paging, probably not. Excessive cache misses, quite possibly. Often, optimising for smaller code is a good strategy for achieving good performance (under certain circumstances such as when there is little data being accessed and it's all in-cache).

It may be worth mentioning that _inlining_ is merely a decision the compiler would take at the linking stage. — πάντα ῥεῖ, Nov 03 '15 at 21:24
I think you're confusing inlining (a compiler decision) and coalescing COMDATs (a linker responsibility) — Richard Hodges, Nov 03 '15 at 21:25

qeadz · Answer 2 · 2015-11-03T21:14:43.013

Whether they are inlined is always up to the compiler. Non-inlined template instantiations are shared by all identical instantiations.

So a lot of translation units all wanting to make a Foo<int> will share the Foo<int> instantiation. Obviously, if Foo<int> is a function and the compiler decides in each case to inline it, then the code is repeated. However, the choice to inline is made because the optimization seems highly likely to be superior to a function call.
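If you want to control where that shared, non-inlined copy is emitted, C++11 offers an explicit instantiation declaration (`extern template`) paired with an explicit instantiation definition. The sketch below uses a hypothetical function template `cube`; in a real project the `extern template` line would usually sit in the header and the definition in exactly one `.cpp` file, but both are kept in one file here so it compiles on its own (the standard allows this as long as the definition follows the declaration).

```cpp
#include <cassert>

// Hypothetical function template.
template <typename T>
T cube(T x) { return x * x * x; }

// Explicit instantiation declaration: in any translation unit that sees it,
// cube<int> is not implicitly instantiated; a definition elsewhere is used.
// (Normally this line goes in the header.)
extern template int cube<int>(int);

// Explicit instantiation definition: emits the single out-of-line cube<int>.
// (Normally this line goes in exactly one .cpp file.)
template int cube<int>(int);
```

Other specializations such as `cube<double>` are unaffected and are still instantiated implicitly wherever they are used.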

Technically there could be a corner case where a template causes slower execution. I work on software which has super tight inner loops for which we do a lot of performance measurements. We use a number of template functions and classes, and it has yet to show a degradation compared to writing the code by hand.

I can't be certain, but I think you'd have to have a situation where the template:

- generated non-inlined code,
- did so for multiple instantiations,
- could have been one hand-written function, and
- the hand-written function incurs no run-time penalty for being one function (i.e. no implicit conversions involving run-time checks).

Then it's possible that you have a case where the single hand-written function fits within the CPU's instruction cache but the multiple template instantiations do not.
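The scenario above can be sketched like this, with hypothetical names `mix_t` and `mix_one`. The functions are far too small to matter for the instruction cache in practice; they only show the shape of the trade-off: the template yields one function per argument type, while the hand-written version is a single body whose narrower integer arguments are widened on entry.

```cpp
#include <cassert>
#include <cstdint>

// Template version: calling it with short, int and long arguments creates
// three separate functions (three copies of the code if they stay out of line).
template <typename T>
std::int64_t mix_t(T a, T b) {
    std::int64_t x = a, y = b;
    return x * 31 + y * 17 - (x ^ y);
}

// Hand-written version: one function body for all callers. Narrower integer
// arguments are widened to std::int64_t at the call, which is cheap and
// involves no run-time checks.
std::int64_t mix_one(std::int64_t x, std::int64_t y) {
    return x * 31 + y * 17 - (x ^ y);
}
```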

When I tested ICC, GCC and MSVC several years ago because of severe performance problems caused by bad inlining choices, all of them had those problems, and all of those problems were directly due to a top-down design of inlining. They varied in the specific situations that caused performance disasters due to inlining, but all had them: start by inlining a big template function that should not have been inlined, inline something into that, then fail (due to a top-down limit) to inline what really needed to be inlined below that. — JSF, Nov 03 '15 at 21:18
@JSF fair enough. I can't speak to your experiences. We work on a CPU-based renderer, so any cache miss drops performance substantially. However, we don't tend to make large functions templates, nor do we nest too many levels deep. Our issues have (strangely) been not that it inlined when it shouldn't but rather that it _failed_ to inline. In this respect MSVC seems to give up earlier than the other compilers. — qeadz, Nov 03 '15 at 21:25
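For the inlining problems described in these comments, compilers accept non-standard hints that override their heuristics: `__declspec(noinline)` and `__forceinline` in Visual C++, and `__attribute__((noinline))` and `__attribute__((always_inline))` in GCC and Clang. The sketch below wraps them in macros whose names (`APP_NOINLINE`, `APP_FORCEINLINE`) are hypothetical, and uses hypothetical templates `slow_path` and `add`. Even these hints are not absolute (Visual C++ documents cases where `__forceinline` is still refused), so it is worth measuring before and after.

```cpp
#include <cassert>

// Hypothetical macro names mapping to real compiler-specific attributes.
#if defined(_MSC_VER)
#define APP_NOINLINE __declspec(noinline)
#define APP_FORCEINLINE __forceinline
#elif defined(__GNUC__)  // also defined by Clang on non-MSVC targets
#define APP_NOINLINE __attribute__((noinline))
#define APP_FORCEINLINE inline __attribute__((always_inline))
#else
#define APP_NOINLINE
#define APP_FORCEINLINE inline
#endif

// Large, rarely-called template: keep each instantiation out of line so all
// callers share one copy instead of duplicating the body at every call site.
template <typename T>
APP_NOINLINE T slow_path(T x) {
    T acc = T();
    for (int i = 0; i < 100; ++i) acc += x;
    return acc;
}

// Tiny hot template: request inlining even where the compiler's heuristics
// might give up early.
template <typename T>
APP_FORCEINLINE T add(T a, T b) {
    return a + b;
}
```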
