21

Consider this code:

#include <iostream>
typedef long xint;
template<int N>
struct foz {
    template<int i=0>
    static void foo(xint t) {
        for (int j=0; j<10; ++j) {
            foo<i+1> (t+j);
        }
    }
    template<>
    static void foo<N>(xint t) {
        std::cout << t;
    }

};

int main() {
    foz<8>::foo<0>(0);
}

With clang++ -O0, it compiles in seconds and then runs for about 4 seconds.

However, with clang++ -O2, compilation takes a long time and a lot of memory. On Compiler Explorer, with 8 changed to a smaller value, you can see that the compiler fully expands the loops.

I don't want to turn optimization off entirely. I just want the generated code to be non-recursive and to behave like ordinary nested loops. Is there anything I can do?

Peter Mortensen
l4m2
    Might be worth submitting a bug report, clearly some inlining heuristics should be tweaked. – Quimby Jan 02 '23 at 07:47

3 Answers

13

Loop unrolling can be disabled with a pragma (see it on Compiler Explorer). The generated code is then non-recursive and expressed as nested loops.

#pragma nounroll
for (int j=0; j<10; ++j) {
    foo<i+1> (t+j);
}

You can also tune the unroll factor manually instead of disabling unrolling. Unrolling by 8 generates code similar to the version that loops 8 times. (Compiler Explorer)

#pragma unroll 8
for (int j=0; j<10; ++j) {
    foo<i+1> (t+j);
}
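
For context, here is a minimal self-contained sketch of where the pragma sits inside the recursive template. It makes several assumptions that are not in the question: it uses `if constexpr` (C++17) in place of the in-class specialization, writes to a `std::ostream` instead of `std::cout` so the result can be inspected, and adds a hypothetical helper `run_nounroll`. The pragma is guarded because `#pragma nounroll` is Clang-specific.

```cpp
#include <sstream>
#include <string>

typedef long xint;

template<int N>
struct foz {
    // Level i of an N-deep loop nest; the innermost level (i == N) emits t.
    template<int i = 0>
    static void foo(xint t, std::ostream& out) {
        if constexpr (i == N) {
            out << t << ' ';
        } else {
            // Ask Clang to keep this loop rolled at every nesting level.
#if defined(__clang__)
#pragma nounroll
#endif
            for (int j = 0; j < 10; ++j) {
                foo<i + 1>(t + j, out);
            }
        }
    }
};

// Hypothetical helper (not from the original post): collects the output
// of an N-deep nest into a string.
template<int N>
std::string run_nounroll() {
    std::ostringstream out;
    foz<N>::template foo<0>(0, out);
    return out.str();
}
```

Calling `run_nounroll<1>()` yields the ten values of a single loop; larger `N` values nest further. Whether the optimizer emits plain nested loops still depends on the compiler version, so checking the assembly on Compiler Explorer remains the reliable test.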
Peter Mortensen
Iwa
1

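To make it non-recursive, you might use an array of indexes:

// Members of foz<N>; needs <array>, <iterator>, <numeric> and <iostream>.

// Advance the indexes like a base-10 odometer, rightmost digit first.
// Returns false once every digit has wrapped back to 0.
static bool increase(std::array<int, N>& a)
{
    for (auto rit = std::rbegin(a); rit != std::rend(a); ++rit) {
        if (++*rit == 10) {
            *rit = 0;
        } else {
            return true;
        }
    }
    return false;
}

static void foo(xint t) {
    std::array<int, N> indexes{};

    do {
        // Same value the innermost recursive call would print: t plus all j's.
        std::cout << std::accumulate(std::begin(indexes), std::end(indexes), t);
    } while (increase(indexes));
}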

Demo
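
As a rough, self-contained sketch of the same idea, the version below wraps the two functions in a struct and writes to a `std::ostream` so the output can be compared with the recursive version. The names `foz_iter` and `run_iter`, the stream parameter, and the space separator are assumptions added for illustration. The starting value `t` is used as the initial value of the sum.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <sstream>
#include <string>

typedef long xint;

template<std::size_t N>
struct foz_iter {
    // Advance the indexes like a base-10 odometer, rightmost digit first.
    // Returns false once every digit has wrapped back to 0.
    static bool increase(std::array<int, N>& a) {
        for (auto rit = a.rbegin(); rit != a.rend(); ++rit) {
            if (++*rit == 10) {
                *rit = 0;
            } else {
                return true;
            }
        }
        return false;
    }

    // One flat loop visits all 10^N index combinations, with no recursion.
    static void foo(xint t, std::ostream& out) {
        std::array<int, N> indexes{};
        do {
            out << std::accumulate(indexes.begin(), indexes.end(), t) << ' ';
        } while (increase(indexes));
    }
};

// Hypothetical helper (not from the original answer) that captures the output.
template<std::size_t N>
std::string run_iter() {
    std::ostringstream out;
    foz_iter<N>::foo(0, out);
    return out.str();
}
```

Because the rightmost index changes fastest, the values appear in the same order as they would from the nested loops, with the last index playing the role of the innermost loop.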

Jarod42
0

The simplest solution is to mark the problematic function using the noinline function attribute, which is also supported by several other C++ compilers (e.g. GNU g++):

    template<int i=0>
    __attribute__((__noinline__)) static void foo(xint t) {

This instructs the compiler's optimizer to never inline calls to that function.
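
A possible way to package this is sketched below. The macro name `FOZ_NOINLINE`, the struct name `foz_noinline`, the helper `run_noinline`, the `std::ostream` parameter, and the use of `if constexpr` (C++17) in place of the in-class specialization are all assumptions made for the example. The attribute is placed before the declarator, since GCC does not accept attributes after the declarator of a function definition.

```cpp
#include <sstream>
#include <string>

// Hypothetical convenience macro: picks the noinline spelling per compiler.
#if defined(__GNUC__) || defined(__clang__)
#  define FOZ_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#  define FOZ_NOINLINE __declspec(noinline)
#else
#  define FOZ_NOINLINE
#endif

typedef long xint;

template<int N>
struct foz_noinline {
    // Each level stays a separate function, so the optimizer cannot
    // flatten the whole 10^N call tree into one body.
    template<int i = 0>
    FOZ_NOINLINE static void foo(xint t, std::ostream& out) {
        if constexpr (i == N) {
            out << t << ' ';
        } else {
            for (int j = 0; j < 10; ++j) {
                foo<i + 1>(t + j, out);
            }
        }
    }
};

// Hypothetical helper (not from the original answer) that captures the output.
template<int N>
std::string run_noinline() {
    std::ostringstream out;
    foz_noinline<N>::template foo<0>(0, out);
    return out.str();
}
```

Note that this keeps one real call per loop iteration at every level, so it trades some run-time speed for bounded compile time and code size.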

Dan Bonachea