How can Duff's device code be compiled?

Question

I understood why Duff's device is faster than normal loop code which can be unrolled but is not optimized. But I can't understand how the code can be compiled yet.
I guess it's a trick about the switch syntax. But not anymore.

How can do while sentence exist in switch sentence? Very weird.
Is there anyone who can explain this?

Edit: Another question. Why did duff use 8? It could be 16, 65536 or whatever. Because of code size? Is there another reason? For example, cache or pipelining benefits.

"I understood why Duff's device is faster than normal code which is not optimized." It isn't. It propably just confuses the compiler as much as humans, potentially leading to *slower* code. And even if there was a difference in favor of Duff's device, you don't need speed so badly you need to use this abomination. Repeat it with me. You do not need to use this abomination. — , Apr 06 '11 at 16:02
Duff's device *used to be* faster than normal code (on some platforms at least) - it is hardly anymore. In one project we did on Windows PCs and MSVC, already about 10 years ago, it turned out to be *slower* than optimized normal code. — Péter Török, Apr 06 '11 at 16:03
The real lesson here is to trust your compiler. Don't chase the possibility of marginal gains in performance (if even that, in this case) at the expense of readability and maintainability. Any time you have to ask questions on SO to understand a piece of code or get it to compile, you probably shouldn't be trying to use it. — Cody Gray - on strike, Apr 06 '11 at 16:06
The Wikipedia article you link to has the answer to your question. Which part of that answer don't you understand? — rlc, Apr 06 '11 at 16:07
Thanks guys. I don't want to use Duff's device to my real code. And I haven't ever use it and I won't. I just wonder how the code can be compiled. When I saw the code first, I thought it never be compiled. — Benjamin, Apr 06 '11 at 16:08
@Hogan - are you claiming there's some difference between C and C++ that affects Duffs device? The only thing I can think of would relate to the scope, construction and destruction of local variables declared within the switch/loop, which is almost certainly a non-issue. — , Apr 06 '11 at 16:10
@Ronald: *Relaxed specification of the switch statement in the language's definition. At the time of the device's invention this was the first edition of The C Programming Language which requires only that the controlled statement of the switch be a syntactically valid (compound) statement within which case labels can appear prefixing any sub-statement.* I think this might be what I can't understand. I'm English newbie. — Benjamin, Apr 06 '11 at 16:13
@delnan, I'd be careful with those blanket statements. Sometimes Duff's Device is quicker, and you actually do need that "abomination" in order to meet the specification of your project. I think it's probably more correct to advise to "avoid it when not necessary". — mrduclaw, Apr 06 '11 at 16:19
@Benjamin it basically means that, at the time Duff came up with this construct, the grammar for `switch` was very **relaxed** (i.e. **not very strict**) so you could actually start a loop in the middle of a switch statement, and close the loop after a few `case` labels. That means you can use the `switch` statement to start looping in the middle of the loop (because the `while` will jump back to the `do`, which is at the start of the `switch` statement). It's really about a way to play with the flexibility of the language's *definition* to be able to write optimized (but confusing) code — rlc, Apr 06 '11 at 16:21
@mrduclaw: Duff's device is basically contrived loop unrolling; any compiler nowadays can do it by itself. — Matteo Italia, Apr 06 '11 at 16:23
@Matteo Italia, and you'd hope that you get modern compilers with modern libraries for all of the platforms you program on. I'm just suggesting that sometimes that's not the case and you have to program like it's 1980 all over again. It's nitpicky, but shouldn't we, as programmers, be concerned with corner-cases? — mrduclaw, Apr 06 '11 at 16:27
@mrduclaw: True by the nonexistence of absolutes. But I prefer to be overzealous with condeming it (and its kin). So it leads to some inefficient code, so what? Can be fixed easily if you have a performance guru aboard - you should if it matters that much! OTOH, not "dropping the premature optimization bomb" propably leads to some more needlessly "optimized" (read: obscured) code - I definitely want to avoid that. — , Apr 06 '11 at 16:27
@Ronald: **was** *very relaxed* It looks like it's not relaxed anymore. But it is still relaxed, isn't it? — Benjamin, Apr 06 '11 at 16:28
@mrduclaw: point taken, although my position is still basically with @delnan. — Matteo Italia, Apr 06 '11 at 16:29
@delnan, You're right that if it matters that much you should have a performance guru on staff. But if we stop teaching these ideas, or claim that it's *never* useful, then who will fill the shoes after the current performance guru leaves? I guess I just don't like people removing the firing pin on my gun without telling me, I'd prefer to be able to fire it if I need to. — mrduclaw, Apr 06 '11 at 16:33
@Benjamin C grammar is still pretty relaxed and I think Duff's device will still compile on a compliant compiler, but as all the other commenters said, you really shouldn't try to optimize with that kind of construct.. — rlc, Apr 06 '11 at 16:39
@mrduclaw - But you don't want to keep blacksmiths around, do you, in case you need a new barrel for your musket? Sometimes technology advances and some knowledge is just not widely needed anymore. — Bo Persson, Apr 06 '11 at 16:42
@mrduclaw: Well, would you give a youngster a lighter (I was pondering to put "flame thrower" here instead, it might be more fitting in how common appropriate use cases are) "in case they need it", or teach them to be *very* careful with fire and wait until they're mature enough not to light their own hair before giving them one? — , Apr 06 '11 at 16:42
@delnan, yes. I grew up around fire and guns, as did all my friends, and with our proper instruction there were no incidents. — mrduclaw, Apr 06 '11 at 16:44
@Bo Persson, I think it's important to understand all that you can about the world. And btw, that knowledge does still exist, have you visited a Renaissance Fair lately? Also, if you'd like, I can order you a chain mail suit for verification. — mrduclaw, Apr 06 '11 at 16:46
IMO the issue isn't Duff's Device, it's *manual loop unrolling* of any kind that you should expect not to be bothered with. On those incredibly rare occasions when you do come to manually unroll a loop, to see if that beats the optimizer (which it sometimes does), you may have loops that aren't a multiple of the degree of unrolling. At that point, you might choose to use Duff's Device, rather than a bunch of gotos, precisely *because* it lets the compiler optimize the select-and-jump part of the operation. I don't see why people are so appalled by the idea of anyone getting that far, though. — Steve Jessop, Apr 06 '11 at 17:16
[How does Duff's device work?](http://stackoverflow.com/q/514118/995714) — phuclv, Feb 07 '17 at 02:08

score 11 · Answer 1 · 2011-04-06T16:38:21.083

The "how it works" is simple enough.

Both C and C++ are compiled languages, and normally compiled to the platforms machine code. Machine code has no concept of block structures - all block structures must be translated to a form that uses (in effect) some mix of unconditional and conditional gotos.

The C syntax rules allow the switch statement and loop to be combined in a way which is not a true hierarchical block structure, but which tangles control flow. So long as the compiler can cope with this (which any good compiler should) there is no problem in the underlying machine code. The result will be "spaghetti", but generated machine code that has been through an optimiser is always spaghetti - it's not meant to be human readable, so it's not an issue. The issue here is that the source code is spaghetti too, even though the gotos have been "hidden".

Note - although any good compiler should cope with Duffs device, as others have already commented, that doesn't mean it will cope well enough to optimise it properly - only well enough to generate correct executable code. This is one of these old strange idioms that once had a purpose, but which is more likely now to confuse your compiler and sabotage it's ability to generate efficient code.

EDIT

The following is related to Duffs device, and may help illustrate the basic idea...

switch (count & 1)
{
  case 0 : goto lbl0;
  case 1 : goto lbl1;
}

lbl0:

while (count != 0)
{
  handle_one ();
  count--;
lbl1:
  handle_one ();
  count--;
}

Having case clauses inside the loop is conceptually no different to having goto-target labels inside the loop, as above.

Warning - this is purely for illustration of an idea, and should not be copied in real life code.

Incidentally, not all compilers generate correct code from Duff's device. If you are trying to compile this with C++ Builder you will need to disable the optimization called "Enable loop induction variable and strength reduction" or you will get access violations at run-time. — Jeff Wilhite, Oct 19 '11 at 13:29

score 6 · Accepted Answer · edited May 23 '17 at 12:24

A simple explanation of why Duff's Device compiles is that the syntax of the switch statement isn't particularly specific about the form that the switch statement block might need to take. There are a few restrictions, and a couple of things permitted in the controlled statement that aren't permitted outside a switch (the case and default labels). But other than that, the controlled statement is just any other statement, with the likelihood that there are labels for the switch to target.

Here's the syntax from C99:

switch ( expression ) statement

Beyond the syntax, the standard imposes a few constraints:

the controlling expression must have an integer type
there are restrictions about where VLAs can occur in the switch statement
case label expressions must be integer constant expressions
there cannot be duplicate case label expressions or default labels

Other than that, any construct permitted in a statement block should be permitted in the controlled statement (with the addition that case and default labels are OK). Remember that case and default are just labels that the switch jumps to based on the controlling expression and the case label expressions. As Potatoswatter says, switch is just a computed goto. So just like goto can jump into the middle of a loop, so can switch.

Also, I think that the cases where you might see a benefit from Duff's Device are pretty rare today (I think they were rare even in the 1980's). Don't forget that Tom Duff himself said the following in his description:

"Disgusting, no?"
"I feel a combination of pride and revulsion at this discovery."

Even more than when it was initially described, Duff's Device should be considered more of a curiosity than a tool to use.

Tom duff's description is very interesting. I didn't read it. Thanks. — Benjamin, Apr 06 '11 at 20:06

score 4 · Answer 3 · edited Jun 20 '20 at 09:12

switch is just a computed goto. So, there are several labels inside the loop, and a switch statement outside the loop. The switch decides which label to go to, and gotos there, inside the loop.

Once execution is inside the loop, it goes on looping until the loop relinquishes control.

It's actually very straightforward… and shouldn't be used unless it is the most straightforward alternative.

Don't believe anyone who tells you it makes anything fast.

I would even say to stop listening to anything they say at all, even if it's your teacher.

As for compilers, they break things down to generic control-flow graphs and don't care about switch vs. if vs. while. They are all if ( … ) goto …; else goto …; to the compiler.

score 2 · Answer 4 · answered Apr 06 '11 at 18:30

While it's true that Duff's Device is outdated for its original purpose, it's still useful for special purposes, like a state machine that normally cycles repeatedly through N states, but sometimes needs to return to the caller and later be resumed at the state where it left off. Putting the switch statement outside the loop and the case labels inside the loop (I would take this as the definition of a "Duff's device") then makes a lot of sense.

With that said, do not use Duff's devices to "optimize by hand". Putting what are effectively "goto labels" all over the place will not help the compiler optimize.

score 2 · Answer 5 · answered Apr 07 '11 at 02:23

If we take the implementation from the Wikipedia article you link...

send(to, from, count)
register short *to, *from;
register count;
{
        register n=(count+7)/8;
        switch(count%8){
        case 0:       do{     *to = *from++;
        case 7:               *to = *from++;
        case 6:               *to = *from++;
        case 5:               *to = *from++;
        case 4:               *to = *from++;
        case 3:               *to = *from++;
        case 2:               *to = *from++;
        case 1:               *to = *from++;
                }while(--n>0);
        }
}

...and replace the "high-level" do / while loop with the assembly-level if / goto that the compiler really reduces it to...

send(to, from, count)
register short *to, *from;
register count;
{
        register n=(count+7)/8;
        switch(count%8){
        case 0:     do_label: *to = *from++;
        case 7:               *to = *from++;
        case 6:               *to = *from++;
        case 5:               *to = *from++;
        case 4:               *to = *from++;
        case 3:               *to = *from++;
        case 2:               *to = *from++;
        case 1:               *to = *from++;
                if (--n>0) goto do_label;
        }
}

...it may help you perceive that - in this usage where the do/while scope didn't introduce any local variables - there's really nothing more to the do-while than a jump back to case 0 that bypasses the switch and hence the need to evaluate count % 8 (% is a pretty expensive operation in the scheme of things).

Hope that helps it click, but may not...? :-)

Why did duff use 8? It could be 16, 65536 or whatever. Because of code size? Is there another reason? For example, cache or pipelining benefits.

Just a case of diminishing returns. Having to do the --n > 0 check and a jump after every 8 data copies isn't a big percentage overhead, but the size of the code (both in source and as compiled code in cache) is still pretty tight. Maybe it'd be 90 or 95% work vs overheads, which was evidently good enough. Further, to illustrate and share the concept with others Tom Duff may have prefered it to be about a typical 80x25 terminal's worth of code rather than a page or 10.

How can Duff's device code be compiled?

5 Answers5

Don't believe anyone who tells you it makes anything fast.

Linked

Related