Explicit template function instantiation with inlining

Question

So a colleague and I have been debating the benefits of explicit template instantiation when it comes to reducing compile time, separating declaration from definition, and not affecting performance of a C++ math library I have written that is used for other projects.

Essentially I have a library of useful math functions designed to work with primitives like Vector3, Vector4, Quaternion, etc., all of which are meant to be used with the template argument being float or double (and in some instances int).

So that I do not have to write these functions twice, once for float and once for double, the function implementations are templated, like so:

template<typename T>
Vector3<T> foo(const Vector4<T>& a, 
               const Quaternion<T>& b) 
{ /* do something... */ }

They are all defined in .h files (so they are implicitly marked for inlining). Most of these functions are short, and I hope they will be inlined when the code that uses them is compiled.

Headers are getting pretty big though, compile times are going up, and it's getting hard to find out which functions exist by just glancing at the headers (that's one of the many reasons I like separating declarations from implementations).

So I can use explicit template instantiation in an accompanying .cpp file, like so:

  //in .h
  template<typename T>
  Vector3<T> foo(const Vector4<T>& a, 
                 const Quaternion<T>& b) 
  { /* do something... */ }

  //in .cpp
  template Vector3<float> foo<float>(const Vector4<float>& a, 
                                     const Quaternion<float>& b);
  template Vector3<double> foo<double>(const Vector4<double>& a, 
                                       const Quaternion<double>& b);

Should this aid with compile times? Would this affect the possibility of the functions being inlined? Are the answers to either of those questions generally compiler specific?

An added benefit is that it does verify that the function compiles, even if I haven't used it yet.
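
To illustrate that last point, here is a small sketch. The `halve` function is a hypothetical example, not part of the library described here. A template body is only fully checked once it is instantiated, so without the two explicit instantiation lines an error that depends on `T` could go unnoticed until the first call.

```cpp
// Hypothetical example function, used only to illustrate the point.
template<typename T>
T halve(T value)
{
    // Operations that depend on T are only checked on instantiation.
    return value / T(2);
}

// Force instantiation now: a mistake in the body for float or double
// becomes a compile error here, even if nothing calls halve() yet.
template float  halve<float>(float);
template double halve<double>(double);
```

Explicitly instantiating the template for a type that has no suitable `operator/` would be rejected at the instantiation line rather than at some later call site.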

Also I could do this:

  //in .h
  template<typename T>
  Vector3<T> foo(const Vector4<T>& a, 
                 const Quaternion<T>& b);

  //in .cpp
  template<typename T>
  Vector3<T> foo(const Vector4<T>& a, 
                 const Quaternion<T>& b) 
  { /* do something... */ }

  template Vector3<float> foo<float>(const Vector4<float>& a, 
                                     const Quaternion<float>& b);
  template Vector3<double> foo<double>(const Vector4<double>& a, 
                                       const Quaternion<double>& b);

Same questions for that method:

Should this aid with compile times? Would this affect the possibility of the functions being inlined? Are the answers to either of those questions generally compiler specific?

I expect that the possibility of inlining would definitely be affected, considering the definition is not in the header.

It is nice that it manages to separate the declaration and definition for templated functions (for specific template arguments), without resorting to doing something like using a .inl included at the bottom of the .h file. This also hides the implementation from the user of the library which is beneficial (but not strictly necessary yet), while still being able to use templates so I don't have to implement a function N times.

Is there any way of allowing inlining by adjusting the method?

I have found it difficult just googling for an answer to these questions, and the standards specification is hard to comprehend on these subjects (for me at least).

BTW, this is expected to compile with VS2010, VS2012, and GCC 4.7.

Any assistance would be appreciated.

Thanks

I don't think there are many compilers that allow templates outside of header-files... — Mats Petersson, Oct 15 '14 at 22:32
@MatsPetersson: That's the most puzzling statement I've heard all day. — Kerrek SB, Oct 15 '14 at 22:54
@MatsPetersson That's not true, common compilers have no notion of a "header file". Once you `#include` the header file in the cpp file, it's as if the header file was part of the cpp file. What you probably mean is that the definition of the template method/class must be available in the current translation unit, if it is instantiated for a type that is not instantiated in any of the other translation units (the object files that this one will get linked to). But if you explicitly instantiate them in one for all the types you need (as the OP), you only need the declarations in header files. — Oguk, Oct 15 '14 at 22:56
Ok, I should perhaps have said, "I don't think there are many compilers that allow link-time resolution of templates". In other words, you need to have the source of the template in the same translation-unit that it is being used. Of course you can have templates in something that isn't a header-file, but only if you also USE it there. Or have I missed something? — Mats Petersson, Oct 15 '14 at 22:56
Actually, I think, most if not all of them do what you call "link-time resolution". That's why you get linker errors, if the code for the template instantiation is not in any of the object files you link, not compiler errors. In fact, I think, you are thinking about this one backwards: Usually, we have the one-definition rule. For templates, it has been relaxed (as, e.g., for extern inline functions) because you never know which instantiations are needed in other translation units and that would defeat the purpose of a template. But if you choose to, you can still obey the rule for templates. — Oguk, Oct 15 '14 at 23:03
@Ryan I cannot answer all of your questions, so I am not posting an answer. But for one I am sure: When you put the definition and explicit instantiation into the cpp file, it will reduce compile time if you have many different translation units that all use the template functions/classes, for this way they only need to be compiled in *one* translation unit (the one with the definition and explicit instantiations). — Oguk, Oct 15 '14 at 23:09

score 5 · Accepted Answer · edited Mar 02 '21 at 20:43

I'm assuming your technique is intended to do the same as the answer to this question: Template instantiation effect on compile duration.

To achieve the desired result, you would also need to prevent automatic instantiation by declaring the explicit instantiations in the header using extern. See Explicit instantiation declaration with extern.

//in .h
template<typename T>
Vector3<T> foo(const Vector4<T>& a, 
               const Quaternion<T>& b);

extern template Vector3<float> foo<float>(const Vector4<float>& a, 
                                          const Quaternion<float>& b);

extern template Vector3<double> foo<double>(const Vector4<double>& a, 
                                            const Quaternion<double>& b);

//in .cpp
template<typename T>
Vector3<T> foo(const Vector4<T>& a, 
               const Quaternion<T>& b) 
{ /* do something...*/ }

template Vector3<float> foo<float>(const Vector4<float>& a, 
                                   const Quaternion<float>& b);
template Vector3<double> foo<double>(const Vector4<double>& a, 
                                     const Quaternion<double>& b);
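
For anyone who wants to try this layout, here is a minimal single-file sketch. The `Vector3`, `Vector4` and `Quaternion` structs and the body of `foo` are hypothetical stand-ins, since the real types are not shown in the question. The header and .cpp parts are merged into one file, separated by comments, only so that the sketch compiles on its own.

```cpp
// ---- would live in the .h ----
// Hypothetical stand-ins for the library's types.
template<typename T> struct Vector3    { T x, y, z; };
template<typename T> struct Vector4    { T x, y, z, w; };
template<typename T> struct Quaternion { T x, y, z, w; };

// Declaration only: users of the header never see the body.
template<typename T>
Vector3<T> foo(const Vector4<T>& a, const Quaternion<T>& b);

// Explicit instantiation declarations: these specialisations are
// provided elsewhere, so do not instantiate them implicitly.
extern template Vector3<float>  foo<float>(const Vector4<float>&, const Quaternion<float>&);
extern template Vector3<double> foo<double>(const Vector4<double>&, const Quaternion<double>&);

// ---- would live in the .cpp ----
// Placeholder body (an assumption): scales a.xyz by b.w.
template<typename T>
Vector3<T> foo(const Vector4<T>& a, const Quaternion<T>& b)
{
    Vector3<T> r = { a.x * b.w, a.y * b.w, a.z * b.w };
    return r;
}

// Explicit instantiation definitions: the only place code is generated.
template Vector3<float>  foo<float>(const Vector4<float>&, const Quaternion<float>&);
template Vector3<double> foo<double>(const Vector4<double>&, const Quaternion<double>&);
```

With the two parts split into real files, a call to `foo<int>` from another translation unit would compile against the declaration but fail at link time, because no `int` instantiation is ever emitted.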

"Should this aid with compile times? Would this affect the possibility of the functions being inlined? Are the answers to either of those questions generally compiler specific?"

The answer is highly dependent on the compiler, and is best determined empirically, but we can generalise about it.

We can assume that an increase in compile time comes not from the cost of parsing the additional template angle-bracket syntax, but from the cost of the (complex) process of template instantiation. If this is the case, the cost of using a given template specialization in multiple translation units should significantly increase compile times only if the instantiation is expensive and the compiler performs the instantiation more than once.

The C++ standard implicitly allows the compiler to perform the instantiation of each unique template specialisation only once across all translation units. That is, instantiation of template functions can be deferred and performed after the initial compilation, as described in the Comeau documentation. Whether this optimisation is implemented depends on the compiler, but it is certainly not implemented in any version of MSVC prior to 2015.

If your compiler performs the instantiation at link time, this technique would prevent inlining if the compiler does not support cross-module inlining. Newer versions of MSVC, GCC and Clang all support cross-module inlining at link time with an additional option (LTCG for MSVC, LTO for GCC and Clang). See Can the linker inline functions?
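
If link-time optimisation is not an option, one possible adjustment is to leave short definitions in the header, mark them `inline`, and still add the `extern template` declarations. This is a sketch under assumptions: `lengthSquared` and the `Vector3` stand-in are hypothetical, and whether a call is actually inlined remains the compiler's decision. As I read the standard, an explicit instantiation declaration does not suppress implicit instantiation of inline functions, so the body stays visible to the optimiser, while the explicit instantiation definition in one .cpp guarantees that an out-of-line copy exists.

```cpp
// ---- would live in the .h ----
template<typename T> struct Vector3 { T x, y, z; };  // hypothetical stand-in

// Short function: the definition stays in the header and is marked inline,
// so every translation unit that includes it can consider inlining it.
template<typename T>
inline T lengthSquared(const Vector3<T>& v)
{
    return v.x * v.x + v.y * v.y + v.z * v.z;
}

// Ask other translation units not to emit their own out-of-line copies.
extern template float  lengthSquared<float>(const Vector3<float>&);
extern template double lengthSquared<double>(const Vector3<double>&);

// ---- would live in exactly one .cpp ----
template float  lengthSquared<float>(const Vector3<float>&);
template double lengthSquared<double>(const Vector3<double>&);
```

The trade-off is that the body is still parsed in every translation unit, so this is likely to save less compile time than hiding the definition in the .cpp. It may suit the short functions, with the larger ones moved out of the header.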
