I'm not sure what you mean by "destination is never increased."
Manual loop unrolling is rather uncommon. Embedded microprocessors today are fast enough that such optimization is unnecessary (and would waste valuable program memory).
I use a variation of Duff's device in a linear solver kernel. There must be one back_step
for each fwd_step
, and they are performed in groups of four.
Note that the forward and backward-going loops are implemented by goto
s. When the if
in fwd_step
is skipped, execution jumps into the middle of the backward loop. So it's really a kind of double Duff's device.
This isn't any kind of "pragmatic" technique, it's just the best way I could find to express some very convoluted flow control.
switch ( entry ) {
#define fwd_step( index ) \
\
case (index): \
if ( -- count ) { \
...
startf:
fwd_step( 0 )
fwd_step( 1 )
fwd_step( 2 )
fwd_step( 3 )
stream = stream_back;
goto startf;
#define back_step( index ) \
.... \
} \
startb:
stream -= block_size;
back_step( 3 )
if ( ! -- countb ) break;
back_step( 2 )
if ( ! -- countb ) break;
back_step( 1 )
if ( ! -- countb ) break;
back_step( 0 )
if ( -- countb ) goto startb;
} // end switch