4

Good day.

I'd like to ask for contribution on tricks to handle unrolled loops leftovers, with a caveat that loops are fairly small, 1-3 times the unrolling factor, eg:

EG. given unrolling factor of B

int i = 0;
for (; i < N-N%B; i += B) {
    ...
}
// remainder
for (; i < N; ++i) {
    ...
}

if B is 2, I can do the following:

// remainder
if (N%2) {
    ....
}

But what is a good way to handle B>2

Anycorn
  • 50,217
  • 42
  • 167
  • 261
  • Hard to see why a trick is needed, you loop N / B times, remainder is N % B. You'd normally leave this up to the code generator, it already knows how to unroll loops. – Hans Passant Mar 15 '12 at 20:11
  • Well there's always *[Duff's device](http://en.wikipedia.org/wiki/Duff's_device)* but there's nothing wrong with your code. – Mike Dunlavey Mar 15 '12 at 20:57
  • In C or any language? Predicated instructions can be nice for this in some assembly languages. – harold Mar 15 '12 at 22:40
  • @Mike I was thinking something along those lines. The loops are short so that remainder has significant ovrhead. – Anycorn Mar 15 '12 at 23:56
  • @harold C is preferred but may be yu can give some examples with assembly? – Anycorn Mar 15 '12 at 23:57
  • @Anycorn: Do you know if a good percent of time is spent in that exact code? If there's a function call inside the loop, or anything that takes more than several cycles, it's probably not worth the bother. You could profile, or as I do, just pause it a few times. If you catch it a few times in the increment/compare/jump instructions of the loop, then it's worth unrolling, otherwise not. – Mike Dunlavey Mar 16 '12 at 01:46
  • @Anycom: Besides, you don't have to do remainder. Like if B is 8, just loop `for(; i < N - 7; i += 8)`. – Mike Dunlavey Mar 16 '12 at 01:51
  • @Mike the loop overhead is about 5% - not a whole lot but I need to get the most performance. No functions calls in the body, purely float ops. Unrolling the entire loop helps but I want to avoid code size bloat. – Anycorn Mar 16 '12 at 05:08

1 Answers1

1

Your if (N%2) can be easily extended to any unroll factor:

for (; i < N-B+1; i += B) {
    x1; x2; ... xB;
}
if (i < N) {
    x1;
if (i < N-1) {
    x2;
    ...
if (i < N-B+2) {
    xB;
}}}

For small unroll factors this may be more efficient than second loop or Duff's device.

This version looks better. gcc 4.6 compiles almost the same code out of it:

if (i++ < N) {
    x1;
if (i++ < N) {
    x2;
    ...
if (i++ < N) {
    xB;
}}}

And this version may be more optimal if B is a power of two. At least gcc compiles better code for it. Also it is definitely the best if N is a constant. But if neither N is a constant, nor B is a power of two, advantage of this method is not so obvious because of less efficient remainder computation (which means usually several instructions, including multiplication):

if (N%B > B-2) {
    x1;
if (N%B > B-3) {
    x2;
    ...
if (N%B > 0) {
    xB;
}}}
Evgeny Kluev
  • 24,287
  • 7
  • 55
  • 98