Modern x86 CPUs, since at least the 486 ¹), have a tightly pipelined design, so conditional branches can cause "stalls" in which the pipeline has to be flushed and execution restarted on a different path through the program. That's why it makes sense to avoid conditional branches on these CPUs by using branchless programming techniques.

But what about older processors like the 8086, 8088, 80286 and 80386, which had a loosely coupled (buffered) pipeline at best? Does branchless programming make any sense on them, especially when the branchless version contains more instructions than the branching one?

¹) The 80486 has a 5-stage pipeline and was the first x86 CPU to use a tightly pipelined design.

  • You can count cycles (https://www2.math.uni-wuppertal.de/~fpf/Uebungen/GdR-SS02/opcode_i.html) for those early CPUs and see whether you come out ahead for a given problem. Taken `jcc` branches are somewhat expensive on 8088 / 186 (16 cycles), and about 10 cycles on 386 vs. 3 for not-taken. (For 8088, code fetch is often the major bottleneck, not execution cycle counts: 4 cycles per byte of memory loaded or stored. See [Why is LOOP faster than DEC,JNZ on 8086?](https://stackoverflow.com/q/71117163). The prefetch buffer typically hides execution times, except when it is drained by taken branches.) – Peter Cordes Jul 12 '23 at 16:15
  • Branchless programming is also going to be a lot more painful on these chips since you don't have `CMOVcc`. Before the 386 you don't have `SETcc` either. – Nate Eldredge Jul 12 '23 at 17:10
  • I'd actually be kind of curious to see an optimized branchless version of, say, `max` for the 8086. For unsigned, I guess you can get a mask of the carry flag with `sbb` from zero, so something like `xor dx, dx / cmp ax, bx / sbb dx, 0 / and bx, dx / not dx / and ax, dx / or ax, bx`. Signed is trickier, as the obvious `mov cl, 15 / sar dx, cl` for getting a mask of the sign bit is a microcoded loop and takes 68 cycles! But we could `shl dx, 1` to get the sign bit into the carry flag and then proceed with `sbb` as above. – Nate Eldredge Jul 12 '23 at 17:33
  • Oh, replace `sbb dx, 0` by `sbb dx, dx` and then you don't need to zero `dx` beforehand. – Nate Eldredge Jul 12 '23 at 17:50
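
For anyone who wants to check the mask logic of the unsigned `max` sequence from the comments without an 8086 at hand, here is a minimal C sketch that models each instruction on 16-bit values. The function name `umax16_branchless` and the `cf` variable standing in for the carry flag are my own, purely for illustration; this only models the data flow and says nothing about cycle counts or code-fetch cost on real hardware.

```c
#include <stdint.h>

/* Illustrative model (hypothetical name) of the sequence from the comments:
 *   cmp ax, bx / sbb dx, dx / and bx, dx / not dx / and ax, dx / or ax, bx
 * Each C statement stands in for one 8086 instruction. */
static uint16_t umax16_branchless(uint16_t ax, uint16_t bx)
{
    uint16_t cf = (uint16_t)(ax < bx);   /* cmp ax, bx : CF = 1 if ax < bx (unsigned) */
    uint16_t dx = (uint16_t)(0u - cf);   /* sbb dx, dx : dx = 0xFFFF if CF set, else 0 */
    bx &= dx;                            /* and bx, dx : keep bx only if ax < bx */
    dx = (uint16_t)~dx;                  /* not dx     : invert the mask */
    ax &= dx;                            /* and ax, dx : keep ax only if ax >= bx */
    return (uint16_t)(ax | bx);          /* or ax, bx  : exactly one operand survives */
}
```

Calling `umax16_branchless(3, 7)` walks the `ax < bx` path (mask all ones, `bx` survives), while `umax16_branchless(7, 3)` walks the other (mask zero, `ax` survives). Whether the six-instruction sequence actually beats `cmp` / `jcc` / `mov` on an 8088 still has to be decided by counting cycles and code bytes, as the first comment suggests.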
