0

I have been trying to understand the practical usages of TMP. I see a lot of code along the following lines:

#ifndef LOOP2_HPP
#define LOOP2_HPP

// primary template
template <int DIM, typename T>
class DotProduct {
  public:
    static T result (T* a, T* b) {
        return *a * *b  +  DotProduct<DIM-1,T>::result(a+1,b+1);
    }
};

// partial specialization as end criteria
template <typename T>
class DotProduct<1,T> {
  public:
    static T result (T* a, T* b) {
        return *a * *b;
    }
};

// convenience function
template <int DIM, typename T>
inline T dot_product (T* a, T* b)
{
    return DotProduct<DIM,T>::result(a,b);
}

Is it a good practice to always explicitly inline such heavily recursive functions?

EDIT:

For a more concrete example take the following code:

template <int N>
inline void f() {
    f<N-1>();
    std::cout << N << "\n";
}

template <>
void f<0>() {
    std::cout << 0 << "\n";
};

int main() {
      f<1>();
    return 0;
}

I just want to use the function f as a way to unroll a bunch of cout statements which I don't want to write at compile time. Following is the assembly generated by gcc-8.3, all optimizations enabled:

        void f<0>():
            push    rbp
            mov     rbp, rsp
            mov     esi, 0
            mov     edi, OFFSET FLAT:_ZSt4cout
            call    std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
            mov     esi, OFFSET FLAT:.LC0
            mov     rdi, rax
            call    std::basic_ostream<char, std::char_traits<char> >& s

td::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
        nop
        pop     rbp
        ret
main:
        push    rbp
        mov     rbp, rsp
        call    void f<1>()
        mov     eax, 0
        pop     rbp
        ret
void f<1>():
        push    rbp
        mov     rbp, rsp
        call    void f<0>()
        mov     esi, 1
        mov     edi, OFFSET FLAT:_ZSt4cout
        call    std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
        mov     esi, OFFSET FLAT:.LC0
        mov     rdi, rax
        call    std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
        nop
        pop     rbp
        ret

It seems that each of the unrolling leads to a runtime call instruction. It is this cost I want to avoid. I just want the final generated code to be a concatenation of multiple couts.

Abhishek Kumar
  • 729
  • 6
  • 20
  • 1
    1) How template recursive functions, should be different from non-template recursive functions, in runtime cost? There are no templates in the compiled code. Templates are only compile-time concept. 2) Did you look at/investigate generated assembly? How was it different from your expectations? – Algirdas Preidžius Jul 02 '19 at 14:18
  • The question in the title is slightly different from the one in the question. – 463035818_is_not_an_ai Jul 02 '19 at 14:18
  • @AlgirdasPreidžius The thing is that it's not truely recursion. A different function is called in each "recursion" level. The depth is known at compile time. – Guillaume Racicot Jul 02 '19 at 14:19
  • @GuillaumeRacicot maybe it is more correct to say that the recursion happens at compile time when instantiating the template, while at runtime it is just a bunch of functions calling each other. To OPs defense, the `recursion` tag was added later, and imho it is a defect that the tag description concentrates on recursion of function calls and makes little mention of other types of recursion – 463035818_is_not_an_ai Jul 02 '19 at 14:28
  • @formerlyknownas_463035818 There are cases where you need to mark template function as `inline`, such as out of line member function template or explicit specialization. Sure this isn't the case right now, but it's not true that you never need `inline` on template functions. – Guillaume Racicot Jul 02 '19 at 14:33
  • @GuillaumeRacicot hm thanks for correcting me. Cannot edit the comment anymore so I will delete it. Have to do some research, cant remember right now that I had to put an `inline` in front of a template – 463035818_is_not_an_ai Jul 02 '19 at 14:35
  • maybe this is a duplicate, at least it is closely related: https://stackoverflow.com/questions/10535667/does-it-make-any-sense-to-use-inline-keyword-with-templates – 463035818_is_not_an_ai Jul 02 '19 at 14:36
  • @formerlyknownas_463035818 searching I cannot found the case I had, but I know it was using msvc. That would make a great question. – Guillaume Racicot Jul 02 '19 at 14:36
  • @GuillaumeRacicot the top answer to the question I linked has an example that causes error without `inline`. Something about explicit specialization.. – 463035818_is_not_an_ai Jul 02 '19 at 14:38
  • Thanks, everybody for your comments. I have tried to give a more concrete example with the generated assembly. – Abhishek Kumar Jul 02 '19 at 14:41
  • @AbhishekKumar most of the noise is due to calls to iostream. Compiling without printing should be much more inlinable. – Guillaume Racicot Jul 02 '19 at 15:01

1 Answers1

1

Is it a good practice to always explicitly inline such heavily recursive functions?

Is there still a need for templating such functions with MTP anymore as we have constexpr for many years now? And even if constexpris not enough, we will have consteval in c++20 ( hopefully ).

Inlining only gives the compiler the chance to optimize the code away, but it is not a guarantee. Make it a recursive template function gives the compiler the chance to waste your memory with non inlined recursive template instances which is the opposite of what you want to achieve! If you compile with -O0 you see a lot of code generated from your example.

You can force the compiler to generate the result in compile time, as long you can use the resulting value as template parameter for example.

But as always regarding optimizations: 1) Try to get the best algorithm 2) Try to implement to make the code maintainable 3) Measure 4) Measure 5) Measure

And only if your code did not fulfill your speed requirements, start optimizing by hand.

In fact, your code has the chance to waste a lot of memory and also has the chance to optimize well. But you should move to constexpr functions instead of using more or less unreadable MTP code. So "inline" is only a very small part of the problem.

Your compiler is better as you believe! Normally! If you don't trust: Measure! And only if you see a real problem: handcraft optimization.

If using constexpr, especially with recursive functions, most compilers provide command line flags to give a level for deeper compile time evaluation if you do NOT force the compiler to get the result in compile time with using the result as template parameter or any other "must be a compile time constant" like size for an array. It depends on the compiler you use, so please read the manual!

If you use std::cout inside a recursion/loop, you will never see a "single output optimization". But at all: if you have enough time for using std::cout you don't have to think of the few lines of assembly around it. std::cout is typically slow related to the code generating the data which should be written to console at all!

Don't optimize the wrong things!

Add on: If you really want to generate a compile time string from a list of integers, you can take that as a base for your example: C++ convert integer to string at compile time

Klaus
  • 24,205
  • 7
  • 58
  • 113
  • I am trying to understand what is a good way to unroll things at compile time, kind of like C macros. `cout` was just an example of a non-pure function which I want to call multiple times. Instead of writing ugly repeated couts, maybe I could make compile do it for me. – Abhishek Kumar Jul 02 '19 at 15:11
  • @AbhishekKumar: As said: MTP is one thing, (recursive) constexpr functions is the other way to come to the same result. But important is to really see what the compiler will generate and learn how to force the compiler to inline and calc the results at compile time. – Klaus Jul 02 '19 at 15:17
  • Thank you for your answer. Following article seems to suggest unrolling loops using MTP - http://www.informit.com/articles/article.aspx?p=30667&seqNum=7 . I tried the exact code shown there and it seems to still generate multiple `call` instructions. Doesn't it defeat the entire point of loop unrolling if you are incurring function calls? – Abhishek Kumar Jul 02 '19 at 15:36
  • @AbhishekKumar: It is the expected result: If you have a loop, you see only one call and the code for the loop around. If the compiler unrolls, you get a sequence of calls instead a loop. So what is unexpected here? You will be able to calculate the complete string of your output from your example in compile time, based on the link I present. But hey, why? You have learned how MTP loops work and thats it here. ;) BTW: If you want to "thank" me, give votes and if the answer seems to be correct, accept it ;) – Klaus Jul 02 '19 at 15:42
  • The call I am trying to avoid is `call void f<0>()` (done from f<1>). Both f<0> and f<1> again call `std::basic_stream` (This would have been the only function call in loop unrolling). AFAIU the MTP version is different from the compiler loop unrolling in the way that, it does one loop's work, then calls a function, which again does one loop's work and calls another function instead of just concatenating various loop's bodies together. – Abhishek Kumar Jul 02 '19 at 15:48
  • @AbhishekKumar: you are right! What you see is the code which can't be optimized away as the cout call can't be "removed". As this, you get something like a unrolled recursive function call stack. But that is exactly what I said in my answer: "In fact, your code has the chance to waste a lot of memory". Because this will happen always if the compiler can't remove the recursive call. You only should use MTP loop unrolling if the recursive function can be calculated at compile time, only the result is used and all is optimized to a single value. As this, constexpr is your friend. – Klaus Jul 02 '19 at 16:18