A pragmatic loop unrolling technique

Question

I'm looking for an example of a pragmatic loop unrolling technique.

I think Duff's device is one nice trick.
But in Duff's device the destination is never incremented. That could be useful for an embedded programmer who copies data to a serial device, but not for general programmers.

Could you give me a nice, useful example?
It would be even better if you have used it in your own real code.

The fact that Duff's use did not increment the destination address does not constrain your use of this code in a situation where you need to increment it. On the other hand, your awareness that the next person to maintain your code may be an axe-murdering psychopath who knows your address should. — dmckee --- ex-moderator kitten, Apr 03 '11 at 01:53
@dmckee: If I increment the destination address in that case, it is the same as memcpy. And I think memcpy is more readable and even faster in that case. — Benjamin, Apr 03 '11 at 01:56
It also does not prevent you from decrementing the destination or doing any other silly thing. Because Duff's Device is not about moving data, it is a *general expression of loop unrolling* and you can use it anywhere you want to unroll a loop by hand. See Potatoswatter's answer, but before you use it read John's answer. — dmckee --- ex-moderator kitten, Apr 03 '11 at 02:00

score 4 · Answer 1 · answered Apr 03 '11 at 01:33


The most pragmatic technique would be to learn and love your compiler's optimization options, and occasionally inspect the generated assembly by hand if you encounter hotspots in profiling.

answered Apr 03 '11 at 01:33

John Zwinck


Potatoswatter · Answer 2 · answered Apr 03 '11 at 02:00

I'm not sure what you mean by "destination is never increased."

Manual loop unrolling is rather uncommon. Embedded microprocessors today are fast enough that such optimization is unnecessary (and would waste valuable program memory).

I use a variation of Duff's device in a linear solver kernel. There must be one back_step for each fwd_step, and they are performed in groups of four.

Note that the forward and backward-going loops are implemented by gotos. When the if in fwd_step is skipped, execution jumps into the middle of the backward loop. So it's really a kind of double Duff's device.

This isn't any kind of "pragmatic" technique, it's just the best way I could find to express some very convoluted flow control.

switch ( entry ) {

#define fwd_step( index ) \
\
case (index): \
    if ( -- count ) { \
        ...

startf:
    fwd_step( 0 )
    fwd_step( 1 )
    fwd_step( 2 )
    fwd_step( 3 )

        stream = stream_back;
        goto startf;


#define back_step( index ) \
        .... \
    } \

startb:
    stream -= block_size;

    back_step( 3 )
    if ( ! -- countb ) break;
    back_step( 2 )
    if ( ! -- countb ) break;
    back_step( 1 )
    if ( ! -- countb ) break;
    back_step( 0 )
        if ( -- countb ) goto startb;
} // end switch

The "destination is never increased" means that Duff used his code to write data to a memory-mapped I/O port. He also did it only after measuring that no other code was fast enough on his hardware. — Bo Persson, Apr 03 '11 at 09:33

score 0 · Answer 3 · edited May 23 '17 at 11:55


(For the benefit of others, background on Duff's device can be found here and here)

I've encountered it in image processing optimizations, especially for handling border conditions where fewer pixels than a complete tile or kernel need to be copied (this avoids a test at each coordinate).


answered Apr 03 '11 at 01:42

holtavolt
