1

I'm curious if there's a branchless way to do this, or perhaps just a generally better way:

uint16_t add_with_no_overflow(uint16_t num, uint16_t delta)
{
  if (UINT16_MAX - delta < num)
  {
    return UINT16_MAX;
  }

  return num + delta;
}

I tried thinking about it with min/max, but even if I try to use (well-defined) uint overflow, the best I can come up with is max(num + delta, num) which would return the original number a if overflown, or the result otherwise.

But I can't think of a way with min/max/clamp to get from this split to the desired behaviour (without branching)

Jonathan Levin
  • 594
  • 3
  • 13
  • Does calling `max` not count as branching for you? There will be an `if` or similar in it. `return min((uint32_t)num+delta, (uint32_t)UINT16_T_MAX);` probably? Do the calculation in a data type, which will not overflow and then take the smaller one. – mch Jul 20 '23 at 08:13
  • This is an interesting problem, but why do you need it? Is it just plain curiosity? That's okay, but please tell us. Otherwise, please ask us about the actual and underlying problem directly instead. – Some programmer dude Jul 20 '23 at 08:14
  • 2
    I'm writing an embedded Potentiometer driver for an Esp32 microcontroller, and writing a routine to update a value over time given a certain delta. I generally don't deal with optimization much in this space, and was curious if there's a branchless way to increment the value without accidentally overflowing. But It's not really need an important problem for my use case... which is why I asked it as more of a curiosity question – Jonathan Levin Jul 20 '23 at 08:16
  • 1
    @mch Are `min`/`max` implemented with branches? I somehow assumed on the register level you could do a compare and keep the number that is larger without conditionals. I didn't think through how that would be done so it might be a misconception – Jonathan Levin Jul 20 '23 at 08:27
  • `min` and `max` are not implemented at all in C, they are not standard functions. Which ones do you use? – mch Jul 20 '23 at 08:41
  • @mch Seems like a GNU ISO C++ library, part of the `xtensa-esp32` toolchain. Changed the question tags to C to reflect this, thanks – Jonathan Levin Jul 20 '23 at 09:21
  • 1
    @Jonathan Levin if the return type of the function can be changed, then just return as uint32_t – Abhinav Singh Jul 20 '23 at 10:45
  • The time required for this code is insignificant compared to the cost of talking to your potentiometer. You are guilty of "premature optimization" here. – Tim Roberts Jul 20 '23 at 17:35

4 Answers4

3

Stop fiddling around, keep it simple and rely on the optimizer.

Unless your optimizer is not up to sniff of course. Then by all means continue fiddling around.

n. m. could be an AI
  • 112,515
  • 14
  • 128
  • 243
  • Doing a bit of looking on Godbolt, it looks like the optimizer is not up to the task on either of the compilers they have for Xtensa. Doing a quick testing, it looks like they're not up to the task in around 90-95% of the other compilers on Godbolt either. Even those that technically produce branch-free code often use conditional moves that often carry penalties similar to a branch. https://godbolt.org/z/n6GWbbjGz – Jerry Coffin Jul 20 '23 at 17:57
1

EDIT 2: I checked my specific platform (Xtensa-Esp32). Seems that all versions contain a branch, even simple boolean comparisons (a > b)

So in my case the simplest if statement in the title might be the best choice, even if the different methods are interesting on their own.

EDIT 1: Played with ChatGPT on this - after coaxing out the bugs maybe a better implementation?

uint16_t add_no_overflow(uint16_t num, uint16_t delta) {
    uint16_t sum = num + delta;
    uint16_t did_overflow = (sum > num)-1; // 0xFFFF if overflown, 0x0000 otherwise
    return (sum & ~did_overflow) | (UINT16_MAX & did_overflow);
}

Thought about this immediately after posting. The magic of rubber ducking!

Implementation with min/max (not necessarily branchless)

uint16_t add_with_no_overflow(uint16_t num, uint16_t delta)
{
  uint16_t final_delta = min(UINT_16_T_MAX-num, delta);
  return num + final_delta;
}
Jonathan Levin
  • 594
  • 3
  • 13
  • I'm not so sure this is really branchless, as there's likely a condition and a branch in the `min` function or macro. – Some programmer dude Jul 20 '23 at 08:33
  • @Someprogrammerdude I don't see branching with the standard `MIN` macro implementation, depending on if `cmovg` is branching or not: https://godbolt.org/z/7xfPjn4P4 – mch Jul 20 '23 at 08:43
  • 2
    Still, I'd keep it [simple](https://godbolt.org/z/vfzceMKEr). Have you checked the assembly generated on your specific target? – Bob__ Jul 20 '23 at 09:06
  • 1
    @Bob__ Wow, surprising in your example the ternary does not produce jumps! Admittedly I haven't- I'm a high level programmer by day and my knowledge on this end of stack / ability to read assembly is limited. But I am learning a lot from this :) – Jonathan Levin Jul 20 '23 at 09:18
0

Here's my attempt at it:

uint16_t add(uint16_t a, uint16_t b) {
    uint32_t i = a + b;
    return i & 0xffff | -(i >> 16);
}

A single addition can only overflow by at most one bit. So, if we add them as 32-bit numbers and they overflow, the upper 16 bits of the result will contain the value 1.

So, we shift that right 16 places, to get either 0 or 1. Then we negate it to get 0 or -1 (which converts to the maximum value as an unsigned). Then we or the result with the 16 bits produced by the addition. If it's 0, that won't affect the result. If it was -1 converted to unsigned, that'll have all the bits set, so when we or it with the previous result, we still get all bits set (which is the maximum value for an unsigned).

For the ESP32, gcc 11.2 produces the following:

add(unsigned short, unsigned short):
        entry   sp, 32
        extui   a2, a2, 0, 16
        extui   a3, a3, 0, 16
        add.n   a8, a2, a3
        extui   a2, a8, 16, 16
        neg     a2, a2
        or      a2, a2, a8
        extui   a2, a2, 0, 16
        retw.n

The only branch in sight is the return statement...

https://godbolt.org/z/ezhxY56qx

Of course, with a different compiler, or possibly even a different set of flags to the same compiler, it could generate a branch. But at least in a quick test on a couple dozen or so different compilers for half a dozen or so processors, I didn't see any branches generated (though compilation did simply fail on the 6502 compiler, which apparently doesn't support an unsigned long at all).

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
0

Why not just the obvious

uint16_t add_no_over(uint16_t a, uint16_t b)
{
        uint32_t sum = a + b;
        return sum < 65535 ? sum : 65535;
}

It gives the following assembly code with gcc-11 for esp32 from godbolt:

        entry   sp, 32
        l32r    a8, .LC0
        extui   a3, a3, 0, 16
        extui   a2, a2, 0, 16
        add.n   a2, a2, a3
        minu    a2, a2, a8
        extui   a2, a2, 0, 16
        retw.n
jcmvbkbc
  • 723
  • 1
  • 4
  • 5