1

I've been told that a Duff's device doesn't work in PHP, because the switch and case constructs work differently. I found this Duff's device on php.net; my question is, what is wrong with this device? Or have I misunderstood what a Duff's device is? In my assembler I can unroll a loop with a simple command, and when it compiles I get an unrolled loop.

<?php
$n = $ITERATIONS % 8;
while ($n--) $val++;
$n = (int)($ITERATIONS / 8);
while ($n--) {
   $val++;
   $val++;
   $val++;
   $val++;
   $val++;
   $val++;
   $val++;
   $val++;
}
?>
Micromega
  • How is your question related to your mythical `switch` construct? I don't see any `switch` in your code... – Kerrek SB Nov 12 '11 at 14:16

2 Answers

6

That is not a Duff's Device. It uses a separate pre-loop alignment step (which is precisely what Duff's Device is designed to avoid).

In a true Duff's Device there is a single section of unrolled code which is initially partially skipped over by a switch. This trick reduces the required amount of code (to just the loop) and reduces the number of conditional jumps in the code.

The code that you presented is simply a manually unrolled loop.


Loop unrolling:

Loop unrolling is an optimisation technique in which several iterations of a loop are processed at once. So instead of:

$number_of_iterations = 128;
for ($n = 0; $n !== $number_of_iterations; ++$n) {
    do_something();
}

You use:

$number_of_iterations = 128;
for ($n = 0; $n !== (int)($number_of_iterations / 4); ++$n) {
    //Repeat do_something() four times.
    //Four is the "unrolling factor".
    do_something();
    do_something();
    do_something();
    do_something();
}

The advantage of this is speed. Conditional branching is typically a relatively expensive operation. Compared to the unrolled loop, the first loop will pass over the conditional branch four times more often.

Unfortunately, this approach is somewhat problematic. Suppose $number_of_iterations was not divisible by four - the division of labour into larger chunks would no longer work. The traditional solution to this is to have another loop which performs the work in smaller chunks until the remaining amount of work can be performed by an unrolled loop:

$number_of_iterations = 130;
//Reduce the required number of iterations
//down to a value that is divisible by 4
while ($number_of_iterations % 4 !== 0) {
    do_something();
    --$number_of_iterations;
}
//Now perform the rest of the iterations in an optimised (unrolled) loop.
for ($n = 0; $n !== (int)($number_of_iterations / 4); ++$n) {
    do_something();
    do_something();
    do_something();
    do_something();
}

This is better, but the initial loop is still needlessly inefficient: it again branches on every iteration, which is an expensive proposition. In PHP, this is as good as you can get (??).
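One possible refinement, sketched here in C, replaces the leftover loop with a single `switch` whose cases fall through, followed by a separate unrolled loop. It uses only a fall-through `switch` and an ordinary loop placed after it, both of which PHP also has, so it is not a Duff's Device. The names `calls` and `run_unrolled` are hypothetical and exist only for illustration.

```c
#include <assert.h>

static int calls = 0;   /* hypothetical counter, only there so the effect can be checked */

static void do_something(void) { ++calls; }

/* Runs do_something() exactly n times (n >= 0), unrolled by a factor of 4. */
static void run_unrolled(int n)
{
    int i;
    assert(n >= 0);

    /* Handle the n % 4 leftover iterations with one jump plus fall-through. */
    switch (n % 4) {
    case 3: do_something(); /* fall through */
    case 2: do_something(); /* fall through */
    case 1: do_something(); /* fall through */
    case 0: break;
    }

    /* The remaining count is a multiple of 4, so the unrolled loop fits exactly. */
    for (i = n / 4; i > 0; --i) {
        do_something();
        do_something();
        do_something();
        do_something();
    }
}
```

With `n = 130`, the switch jumps to `case 2` and performs 2 calls, and the loop then runs 32 times for the other 128. Whether this is measurably faster than the leftover loop depends on the language implementation; it only illustrates the structure.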

Now enter Duff's Device.

Duff's Device:

Instead of performing a tight loop before entering the efficient unrolled zone, another alternative is to go straight to the unrolled zone, but to initially jump to part way through the loop. This is called Duff's Device.

I will now switch the language to C, but the structure of the code will remain very similar:

//Note that number_of_iterations
//must be greater than 0 for the following code to work
int number_of_iterations = 130;
//Integer division truncates fractional parts
//counter will have the value which corresponds to the
//number of times that the body of the `do-while`
//will be entered.
int counter = (number_of_iterations + 3) / 4;
switch (number_of_iterations % 4) {
    case 0: do { do_something();
    case 3:      do_something();
    case 2:      do_something();
    case 1:      do_something();
            } while (--counter > 0);
}

All of the conditional branches in the while ($number_of_iterations % 4 !== 0) from earlier have been replaced by a single computed jump (from the switch).
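To check that the computed jump really yields the right number of calls, the fragment above can be wrapped in a function, as in the following minimal sketch. The counter `calls` and the function name `run_duff` are assumptions added for illustration; the structure of the `switch` and `do-while` is the same as above.

```c
#include <assert.h>

static int calls = 0;   /* hypothetical counter, only there so the effect can be checked */

static void do_something(void) { ++calls; }

/* Duff's Device with an unrolling factor of 4; requires n > 0. */
static void run_duff(int n)
{
    /* Number of passes through the do-while body, counting the
       first (possibly partial) pass entered via the switch. */
    int counter = (n + 3) / 4;
    assert(n > 0);

    switch (n % 4) {
    case 0: do { do_something(); /* fall through */
    case 3:      do_something(); /* fall through */
    case 2:      do_something(); /* fall through */
    case 1:      do_something();
            } while (--counter > 0);
    }
}
```

For `n = 130`, `counter` starts at 33 and `130 % 4` is 2, so execution enters at `case 2` and makes 2 calls in the first partial pass. The remaining 32 passes make 4 calls each, giving 130 in total.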


This whole analysis is predicated on the flawed notions that reducing the number of conditional branches in a region of code will always result in significantly better performance and that the compiler will not be able to perform these sorts of micro-optimisations by itself where appropriate. Both manual loop unrolling and Duff's Device should be avoided in modern code.

Mankarse
2

Your code is not actually a Duff's Device. A proper DD would have a while or do/while that is interlaced in a switch statement.

The point of a DD is to remove this bit of your code:

$n = $ITERATIONS % 8;
while ($n--) $val++;

The first step of the Duff Device is handled like a GOTO into the code:

send(to, from, count)
register short *to, *from;
register count;
{
        register n = (count + 7) / 8;
        switch(count % 8) {
        case 0:      do {     *to = *from++;
        case 7:              *to = *from++;
        case 6:              *to = *from++;
        case 5:              *to = *from++;
        case 4:              *to = *from++;
        case 3:              *to = *from++;
        case 2:              *to = *from++;
        case 1:              *to = *from++;
                } while(--n > 0);
        }
}

Say count % 8 turns out to be 5. That means the switch jumps to case 5, and then just falls through to the end of the while, at which point it starts doing the work in increments of 8.
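The walk-through above can be tried out with a small variant, sketched here under one loudly stated assumption: unlike the original `send()`, which writes every value to the same `*to` location, this version also advances the destination pointer so that the result can be inspected in an ordinary array. The name `duff_copy` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Copies `count` shorts from `from` to `to`; requires count > 0.
   Assumption: the destination pointer advances (*to++), which the
   original send() does not do. */
static void duff_copy(short *to, const short *from, size_t count)
{
    size_t n = (count + 7) / 8;   /* passes through the loop body, including the partial one */
    assert(count > 0);

    switch (count % 8) {
    case 0: do { *to++ = *from++; /* fall through */
    case 7:      *to++ = *from++; /* fall through */
    case 6:      *to++ = *from++; /* fall through */
    case 5:      *to++ = *from++; /* fall through */
    case 4:      *to++ = *from++; /* fall through */
    case 3:      *to++ = *from++; /* fall through */
    case 2:      *to++ = *from++; /* fall through */
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

With `count = 13`, `n` is 2 and `13 % 8` is 5: the switch enters at `case 5` and copies 5 elements, `--n` leaves 1, and one full pass copies the remaining 8.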

Gustav Bertram
  • Can you define proper? I've read http://stackoverflow.com/questions/514118/how-does-duffs-device-work. Does a Duff's device use a switch? – Micromega Nov 12 '11 at 14:26
  • Updated answer. It's not a Duff Device without a switch statement, and you can only do that in C because PHP syntax rules do not allow it. – Gustav Bertram Nov 12 '11 at 14:38
  • But the result is the same: my device loops an order of magnitude faster than a simple for-loop? I don't understand this difference in naming. Is this patented? Also, instead of "interlaced" you can say "nested"; it has the same meaning. – Micromega Nov 12 '11 at 14:58
  • It's not patented, it's just what it's called. Without the interlaced switch statement, it's just an **unrolled loop**. (See the other answer.) – Gustav Bertram Nov 12 '11 at 21:44
  • Really old answer; but for anyone reading - @Phpdna Interlaced doesn't mean nested whatsoever, look at the code more closely. – Fergus In London Oct 16 '14 at 13:00
  • @FergusMorrow: Concatenated? But you mean that while-loop in the switch? That wouldn't work in PHP. Maybe I didn't see it before?! You are right!? – Micromega Oct 16 '14 at 13:13
  • @Phpdna - No, it doesn't mean concatenated either; and we know it wouldn't work in PHP - that's the whole reason you asked the question. The loop and the switch *are* interlaced as any point of code in the loop can be executed in the switch; nesting would suggest that the loop was for one specific case of the switch. (*As an aside, I don't really understand your tone at all. I was merely posting something as this question is quite high on Google.*) – Fergus In London Oct 16 '14 at 14:07
  • @FergusMorrow:You mean the chat? Can you elaborate? – Micromega Oct 16 '14 at 14:30