10

Is there a way to tell clang to unroll a specific loop?


Googling for an answer gives me command-line options which affect the whole compilation unit and not a single loop.


There is a similar question for GCC, "Tell gcc to specifically unroll a loop", but the answers provided there do not work with clang.

Option 1 suggested there:

#pragma GCC optimize ("unroll-loops")

seems to be silently ignored. In fact

#pragma GCC akjhdfkjahsdkjfhskdfhd

is also silently ignored.

Option 2:

__attribute__((optimize("unroll-loops")))

results in a warning:

warning: unknown attribute 'optimize' ignored [-Wattributes]

Update

joshuanapoli provides a nice solution that iterates via template metaprogramming and C++11 without writing a loop. The construct is resolved at compile time, resulting in a repeatedly inlined body. While it is not exactly an answer to the question, it essentially achieves the same thing.

That is why I am accepting the answer. However, if you happen to know how to take a standard C loop (for, while) and force it to unroll, please share the knowledge with us!

CygnusX1
  • Typically, the compiler has a very good idea of when it's suitable to unroll a loop and when it's not a good idea. What is the special case you are trying to solve where this doesn't apply? – Mats Petersson Mar 07 '13 at 15:53
  • It may not *force* unrolling, but `__attribute__ ((hot))` might be worth trying. – Brett Hale Mar 07 '13 at 18:33
  • @MatsPetersson I want to explicitly measure the benefit of loop unrolling. Hand-written unroll actually speeds up the code 3 times, but the compiler does not figure it out. – CygnusX1 Mar 07 '13 at 21:39

4 Answers

9

For a C++ program, you can unroll loops within the language. You won't need to figure out compiler-specific options. For example,

#include <cstddef>
#include <iostream>

template<std::size_t N, typename FunctionType, std::size_t I>
class repeat_t
{
public:
  repeat_t(FunctionType function) : function_(function) {}
  FunctionType operator()()
  {
    function_(I);
    return repeat_t<N,FunctionType,I+1>(function_)();
  }
private:
  FunctionType function_;
};

template<std::size_t N, typename FunctionType>
class repeat_t<N,FunctionType,N>
{
public:
  repeat_t(FunctionType function) : function_(function) {}
  FunctionType operator()() { return function_; }
private:
  FunctionType function_;
};

template<std::size_t N, typename FunctionType>
repeat_t<N,FunctionType,0> repeat(FunctionType function)
{
  return repeat_t<N,FunctionType,0>(function);
}

void loop_function(std::size_t index)
{
  std::cout << index << std::endl;
}

int main(int argc, char** argv)
{
  repeat<10>(loop_function)();
  return 0;
}

Example with complicated loop function

template<typename T, T V1>
struct sum_t
{
  sum_t(T v2) : v2_(v2) {}
  void operator()(std::size_t) { v2_ += V1; }
  T result() const { return v2_; }
private:
  T v2_;
};

int main(int argc, char* argv[])
{
  typedef sum_t<int,2> add_two;
  std::cout << repeat<4>(add_two(3))().result() << std::endl;
  return 0;
}
// output is 11 (3+2+2+2+2)

Using a closure instead of an explicit function object

int main(int argc, char* argv[])
{
  int accumulator{3};
  repeat<4>( [&](std::size_t)
  {
    accumulator += 2;
  })();
  std::cout << accumulator << std::endl;
}
joshuanapoli
  • Yes, this is my default way of doing it. But since I am already inside a template with parameters which need to get into the `loop_function` it gets really ugly... that is why I am looking for some more "eye-pleasing" solution :) – CygnusX1 Mar 07 '13 at 21:36
  • If you can use C++11, then you can use constexpr functions to cut down on the template syntax noise. – joshuanapoli Mar 08 '13 at 02:24
  • Not if only some parameters are constexpr/template and some are regular dynamic parameters... or? – CygnusX1 Mar 08 '13 at 06:55
  • You should be able to conceptually and syntactically separate the looping "algorithm" from the looped function. I added an example of a complicated loop function object with template and variable parameters. – joshuanapoli Mar 08 '13 at 17:10
  • What guarantees that all compilers will actually inline the function, instead of generating nested function calls? I think you should also use `always_inline` or `force_inline`, depending on which compiler you use. – Fabio Oct 15 '18 at 04:37
  • This is compile-time recursion, so there is no runtime nesting of function calls. If, in addition to unrolling the loop, you would also like to influence the inlining of the loop body, then maybe you'll want to use a compiler-specific feature. – joshuanapoli Oct 15 '18 at 11:57
  • Unfortunately this won't work if you do `return` / `break` / `continue` in the loop body. – Paranoid Jul 22 '20 at 12:20
3

Clang recently gained loop unrolling pragmas (such as #pragma unroll) which can be used to specify full/partial unrolling. See http://clang.llvm.org/docs/AttributeReference.html#pragma-unroll-pragma-nounroll for more details.
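
As a minimal sketch of how these pragmas attach to an ordinary `for` loop (the function names `sum_full` and `scale_by_two` and the array sizes are made up for illustration): the pragma goes immediately before the loop it applies to. It is a request to the optimizer and does not change what the loop computes; compilers that do not recognize it will typically ignore it, possibly with a warning.

```cpp
#include <cstddef>

// Hypothetical helper: the trip count (4) is known at compile time,
// so a bare "#pragma unroll" asks clang to unroll the loop completely.
int sum_full(const int (&values)[4])
{
  int total = 0;
#pragma unroll
  for (std::size_t i = 0; i < 4; ++i)
    total += values[i];
  return total;
}

// Hypothetical helper: the trip count is only known at run time,
// so this asks clang for partial unrolling by a factor of 4.
void scale_by_two(float* data, std::size_t n)
{
#pragma clang loop unroll_count(4)
  for (std::size_t i = 0; i < n; ++i)
    data[i] *= 2.0f;
}
```

Comparing the generated assembly with and without the pragma (for example with `-S`) is probably the most direct way to check whether the request was honored.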

Jingyue Wu
3

In C++17 and later, you can write a more straightforward (to me) version of joshuanapoli's template approach, using if constexpr:

template<std::size_t N, class F, std::size_t START = 0>
inline void repeat(const F &f) {
  if constexpr (N == 0) {
    return;
  } else {
    f(START);
    repeat<N - 1, F, START + 1>(f);
  }
}

This version does not need additional () at invocation time:

  int accumulator = 3;
  repeat<4>([&](std::size_t x) {
    accumulator += x;
  });
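
One limitation of this style is that `break` or `return` inside the body cannot leave the "loop". A possible workaround, sketched here under the assumption that the callable returns a `bool` (`true` to continue, `false` to stop), is to check that result before recursing. The name `repeat_until_false` is hypothetical and not part of the answer above.

```cpp
#include <cstddef>

// Hypothetical variant of repeat: stops as soon as f returns false.
// Returns true if all N iterations ran, false if f asked to stop early.
template<std::size_t N, class F, std::size_t START = 0>
inline bool repeat_until_false(const F &f) {
  if constexpr (N == 0) {
    return true;                       // nothing left to run
  } else {
    if (!f(START)) {
      return false;                    // plays the role of `break`
    }
    return repeat_until_false<N - 1, F, START + 1>(f);
  }
}
```

The unrolled body then contains one conditional branch per iteration, so any speedup over a plain loop should be measured before relying on it.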
Tom 7
2

As gross as it may be, you could isolate said for-loop into its own file and compile it separately (with its own command-line flags).
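
A sketch of what that could look like, assuming a hypothetical file `hot_loop.cpp` that holds only the loop, and a build in which just that file receives `-funroll-loops`. The file names, the function `sum_range`, and the exact commands are illustrative; whether the loop is actually unrolled is still up to the optimizer.

```cpp
// hot_loop.cpp (hypothetical file name): only the loop to be unrolled lives here.
// One possible build, with the unrolling flag applied to this file alone:
//   clang++ -O2 -funroll-loops -c hot_loop.cpp -o hot_loop.o
//   clang++ -O2 -c main.cpp -o main.o
//   clang++ hot_loop.o main.o -o app
#include <cstddef>

// In a real project this would be declared in a shared header; it is defined
// here so that it is compiled with this file's flags.
long sum_range(const int* values, std::size_t count)
{
  long total = 0;
  for (std::size_t i = 0; i < count; ++i)
    total += values[i];
  return total;
}
```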

relevant, but currently unanswered clang-developers question

Pascal Cuoq
EHuhtala