13

I was reading the optimization options for GCC when I found the option -funroll-all-loops.

Its description reads:

Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. '-funroll-all-loops' implies the same options as '-funroll-loops'

How can the compiler unroll a loop if its number of iterations is unknown at compile time? Doesn't the compiler need this information to unroll it? What corresponding C code does it generate, and in what contexts could this be useful if it usually makes the programs run more slowly?

PC Luddite
  • 5,883
  • 6
  • 23
  • 39
  • 5
    @Olaf how is this not a specific question? OP's clearly asking how gcc (a specific C compiler) is able to know how to unroll a loop if its bounds are unknown. – Ricky Mutschlechner Jul 01 '15 at 16:41
  • 4
    @Olaf How is it too broad if I want to know what a specific compiler option does? – PC Luddite Jul 01 '15 at 16:42
  • 1
    @paulsm4 I don't think so. If the compiler knows the bounds already, then the decision comes down to how much to actually unroll it. This question isn't asking that. – Ricky Mutschlechner Jul 01 '15 at 16:43
  • 1
    You may find the discussion about Duff's Device in [my answer](http://stackoverflow.com/a/17192335/1281433) to [Switch case weird scoping](http://stackoverflow.com/q/17192246/1281433) helpful. (Other posters are mentioning Duff's Device, too.) – Joshua Taylor Jul 01 '15 at 16:44

4 Answers4

4

Here's some C code showing how to do it:

int iterations = 100;
int unrollValue = 8;

while (iterations%unrollvalue)
{
   // insert loop code here
   iterations--;
}

while (iterations)
{
   // insert unrollValue copies of loop code here
   iterations-= unrollValue;
}

The compiler will replace the first loop with a relative jump, but that's not easy to represent in C. Note that unrolling by a power of 2 allows the compiler to use a mask instead of the (expensive) divide operation.

Steve Kolokowsky
  • 423
  • 2
  • 12
3

in what contexts could this be useful if it usually makes the programs run more slowly?

Well they are assuming if you choose this option you know what you are doing, if you don't you should not use this option.

what is gcc going to do, well I used this sample program:

#include <stdio.h>

void f(int j )
{
  for( int k = 0; k < j; ++k )
  {
    printf( "%d\n", k ) ;
  }
}

and tested it with godbolt and it generates a jump table based on the number of iterations left (see it live):

cmpl    $1, %ebp
movl    $1, %ebx
je  .L1
testl   %r12d, %r12d
je  .L27
cmpl    $1, %r12d
je  .L28
cmpl    $2, %r12d
je  .L29
cmpl    $3, %r12d
je  .L30
cmpl    $4, %r12d
je  .L31
cmpl    $5, %r12d
je  .L32
cmpl    $6, %r12d
je  .L33
Shafik Yaghmour
  • 154,301
  • 39
  • 440
  • 740
2

It can do something like:

while(n >= 8){
  foo(); foo(); foo(); foo(); foo(); foo(); foo(); foo(); 
  n -= 8;
}
while(n > 0){
  foo();
  n--;
}

Of course, Duff's Device would save having to write the second loop.

Why do it? That's up to the user. If foo() spends more than a few cycles, or if the original loop takes less that 5% of overall wall-clock time, or if n is typically small, it's probably not worth the trouble.

Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
0

You can't assume that there's such a thing as corresponding C code for a compiler's intermediate representation. But in this case, I'd expect the nearest equivalent to be something like Duff's Device, which is a sequence (usually in a loop) that can be entered at a computed location.

Toby Speight
  • 27,591
  • 48
  • 66
  • 103