I need to force the Metal compiler to unroll a loop in my compute kernel function. So far I've tried putting #pragma unroll(num_times) before a for loop, but the compiler ignores it.
The compiler doesn't seem to unroll loops automatically: I compared execution times for (1) the code with a for loop and (2) the same code with the loop unrolled by hand. The hand-unrolled version was 3 times faster.
For example, I want to go from this:
for (int i = 0; i < 3; i++) {
    do_stuff();
}
to this:
do_stuff();
do_stuff();
do_stuff();
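To make the attempt concrete, here is a minimal sketch in plain C++ (not Metal) showing where the pragma was placed and the hand-unrolled equivalent. The call counter and the names with_pragma, hand_unrolled, and check_equivalent are hypothetical and exist only so the two versions can be compared; do_stuff is a stand-in for the real work. Clang accepts this pragma spelling in ordinary C++, while other compilers may simply ignore it; whether the Metal compiler honors it is exactly what is in question.

```cpp
#include <cassert>

// Hypothetical stand-in for the real per-iteration work: it only counts calls.
static int calls = 0;
static void do_stuff() { ++calls; }

// What I tried: an unroll hint placed directly before the loop.
// A compiler that does not recognize the pragma ignores it (possibly with a warning).
int with_pragma() {
    calls = 0;
    #pragma unroll(3)
    for (int i = 0; i < 3; i++) {
        do_stuff();
    }
    return calls;
}

// What I would like the compiler to generate instead.
int hand_unrolled() {
    calls = 0;
    do_stuff();
    do_stuff();
    do_stuff();
    return calls;
}

// Both versions must perform the same amount of work.
void check_equivalent() {
    assert(with_pragma() == hand_unrolled());
}
```

The pragma is only a hint, so the two functions are required to behave identically; any difference would show up in the generated code and the timing, not in the result.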
Does the Metal Shading Language (Metal's C++-based language) support loop unrolling at all? If so, how can I tell the compiler that I want a loop unrolled?