I read the documentation on loop unrolling. It explains that if you set the unrolling factor to 1, the program behaves as it would with #pragma nounroll.

However, the documentation does not cover the #pragma unroll(0) case. Since the range of n is 0 to 255, I'm wondering, out of curiosity, whether there is any difference between #pragma unroll(0) and #pragma unroll(1).

I'm using C with the icc compiler.

  • I suspect it would be equivalent, though you're right, they don't mention it. However [they do mention](https://software.intel.com/en-us/node/524556): `If n is omitted or if it is outside the allowed range, the optimizer assigns the number of times to unroll the loop.` I'd imagine it's outside of that range. – Jeff Mercado May 18 '18 at 03:39
  • Thank you very much! To figure out the differences, I've tested for 10^6 times of adding array elements. However, I can not find any performance difference not only between unroll(0) and unroll(1) but also between unroll, unroll(0), unroll(1), ... , unroll(8). Could you suggest more suitable experiments to capture the different features? – rae hyun kim May 18 '18 at 04:11
  • I believe these directives are only active with the additional "O3" optimization in the compile args – static_cast May 18 '18 at 17:26
  • Yes, I've tested with -O3 compiler option.. – rae hyun kim May 19 '18 at 00:58

1 Answer

From the Intel documentation:

The compiler generates correct code by comparing n and the loop count.

Based on that, I would assume there is no difference between #pragma unroll(0) and #pragma unroll(1), as the generated code would be equivalent.
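One way to check this rather than assume it is to compile two otherwise identical loops, one per pragma, and compare the generated assembly. The following is a minimal sketch, not taken from the Intel documentation: the function names `sum_unroll0` and `sum_unroll1` and the `UNROLL` macro are hypothetical, and the macro only emits the pragma when `__INTEL_COMPILER` is defined, so the file still builds with other compilers (where both functions are then plain loops).

```c
#include <assert.h>
#include <stddef.h>

/* Emit Intel's unroll pragma only under icc (assumption: __INTEL_COMPILER
   is defined there). Other compilers see an empty macro. */
#if defined(__INTEL_COMPILER)
#define PRAGMA_STR(x) _Pragma(#x)
#define UNROLL(n) PRAGMA_STR(unroll(n))
#else
#define UNROLL(n)
#endif

/* Hypothetical helper: sums an array with an unroll factor of 0. */
long sum_unroll0(const int *a, size_t len)
{
    assert(a != NULL || len == 0);
    long total = 0;
    UNROLL(0)
    for (size_t i = 0; i < len; ++i)
        total += a[i];
    return total;
}

/* Hypothetical helper: same loop with an unroll factor of 1. */
long sum_unroll1(const int *a, size_t len)
{
    assert(a != NULL || len == 0);
    long total = 0;
    UNROLL(1)
    for (size_t i = 0; i < len; ++i)
        total += a[i];
    return total;
}
```

Compiling this with something like `icc -O3 -S` and comparing the bodies of the two functions should show whether the compiler treats the two factors differently. Timing alone may not reveal it, since for a loop this simple other optimizations can mask any effect of unrolling.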

static_cast