11
void Send(int * to, const int* from, const int count)
{
    int n = (count+7) / 8;
    switch(count%8)
    {
       case 0: do { *to++ = *from++;
       case 7:      *to++ = *from++;
       case 6:      *to++ = *from++;
       case 5:      *to++ = *from++;
       case 4:      *to++ = *from++;
       case 3:      *to++ = *from++;
       case 2:      *to++ = *from++;
       case 1:      *to++ = *from++;
        } while (--n>0);
    }
}
starblue
  • 55,348
  • 14
  • 97
  • 151
MainID
  • 29,070
  • 19
  • 57
  • 70
  • 4
    As others said, Duff's Device. I wouldn't implement this if I didn't have to, too esoteric/obfuscated. I prefer readable code ;-) Although, if wrapped in a function like this, with good comments/documentation, I'd use it if I didn't have to touch it. – Erik van Brakel Nov 12 '09 at 16:02
  • I wouldn't have much problem with using Duff's Device where appropriate. It is talked about in great depth and the biggest comment needed would be a URL where it is fully talked about. – Jon Nov 12 '09 at 16:10
  • 3
    Wow - someone actually bothered using Duff's Device for optimization yet didn't turn the modulo and division into a shift??? – Aaron Nov 12 '09 at 16:12
  • Is the do {} while() syntactically incorrect? – Martin York Nov 12 '09 at 17:23
  • @Martin, yes, it is. See wikipedia for details. – Nathan Ernst Aug 02 '10 at 08:01
  • If you are going to dwell into low level optimizations, there's probably more to gain by using vectorized instructions in processors that support them than this. – David Rodríguez - dribeas Aug 03 '10 at 09:26
  • 3
    @Aaron: **Any** decent compiler will optimize the modulo and division to bit twiddling if the r.h.s. is a power of two known at compile time. Manual loop unrolling, on the other hand, can still be an occasionally useful optimization. – dsimcha Aug 23 '10 at 02:03

4 Answers4

18

This is Duff's Device. It's a method of unrolling loops that avoids having to add a secondary fix-up loop to deal with times when the number of loop iteration isn't know to be an exact multiple of the unrolling factor.

Since most answers here seem to be generally positive about it I'm going to speak out against it.

With this code a compiler is going to struggle to apply any optimization to the loop body. If you just wrote the code as a simple loop a modern compiler should be able to handle the unrolling for you. This way you maintain readability and performance and have some hope of other optimizations being applied to the loop body.

The Wikipedia article referenced by others even says when this 'pattern' was removed from the Xfree86 source code performance actually improved.

This outcome is typical of blindly hand optimizing any code you happen to think might need it. It prevents the compiler from doing its job properly, makes your code less readable and more prone to bugs and typically slows it down. If you were doing things the right way in the first place, i.e. writing simple code, then profiling for bottlenecks, then optimizing, you'd never even think to use something like this. Not with a modern CPU and compiler anyway.

It's fine to understand it, but I'd be surprised if you ever actually use it.

Pete Fordham
  • 2,278
  • 16
  • 25
7

This mingling of a switch statement and a while loop is called "Duff's Device". It is a way to unroll loops, which was an optimization often used in earlier times.

So this code still copies the memory contents from one place to the other, but it might be more efficient. Beware, on today's architectures you should always measure that, because with cache locality and blindingly fast CPUs loop unrolling is often a bad idea.

Steffen
  • 2,888
  • 19
  • 19
5

This is functionally identical to the code below:

for(int i=0;i<n;i++)
{
  *to++=*from++;
}

The difference is that your code unrolls the loop so that only 1 loop iteration is required for each 8 integers copied. Since there are no breaks for any of the cases, execution falls through from each case label to the next.

When count%8==0, 8 copies are executed inside of the loop for the first iteration

when count%8==7, 7 copies are executed for the first iteration

and so forth. After the first iteration with %8 copies, exactly 8 copies happen per iteration.

By unrolling the loop in this manner, the loop overhead is significantly reduced. It's important to note the order of the case values (0,7,6,5,4,3,2,1) which lend themselves to being translated into a jump table by the compiler.

update

An issue with the example code posted by OP is that a count value of 0 will cause 8 copies to take place, potentially resulting in a buffer overflow.

3Dave
  • 28,657
  • 18
  • 88
  • 151
  • 2
    Your explanation seems somewhat unclear. If I'm not mistaken, the switch is only for determining how many unrolled assignments to perform during the first iteration (to compensate that the buffer length might not be evenly divisible by 8). – UncleBens Nov 12 '09 at 16:32
4

Duff's device

In computer science, Duff's device is an optimized implementation of a serial copy that uses a technique widely applied in assembly language for loop unwinding. Its discovery is credited to Tom Duff in November of 1983, who at the time was working for Lucasfilm. It is perhaps the most dramatic use of case label fall-through in the C programming language to date. Duff does not claim credit for discovering the concept of loop unrolling, just this particular expression of it in C.

ephemient
  • 198,619
  • 38
  • 280
  • 391