Here's my attempt at it:
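    #include <stdint.h>

    uint16_t add(uint16_t a, uint16_t b) {
        uint32_t i = (uint32_t)a + b;      // widen first, so the sum can't wrap where int is 16 bits
        return (i & 0xffff) | -(i >> 16);  // OR in all ones if bit 16 (the carry) is set
    }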
A single addition can only overflow by at most one bit. So, if we add them as 32-bit numbers and they overflow, the upper 16 bits of the result will contain the value 1.
So, we shift that right 16 places, to get either 0 or 1. Then we negate it to get 0 or -1 (which converts to the maximum value as an unsigned). Then we OR the result with the 16 bits produced by the addition. If it's 0, that won't affect the result. If it was -1 converted to unsigned, that'll have all the bits set, so when we OR it with the previous result, we still get all bits set (which is the maximum value for an unsigned).
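As a sanity check on the reasoning above, here's a rough sketch that spells out each step and compares it against a plain comparison-based version over a sample of inputs. The names add_sat, add_sat_ref and check_add_sat are made up for this illustration, and the strides are arbitrary; it's not part of the original answer's code.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free saturating add, one step per line (hypothetical name). */
static uint16_t add_sat(uint16_t a, uint16_t b) {
    uint32_t sum = (uint32_t)a + b;   /* at most 0x1FFFE, so bit 16 is the carry */
    uint32_t carry = sum >> 16;       /* 0 or 1 */
    uint32_t mask = -carry;           /* 0 or 0xFFFFFFFF (unsigned wraparound) */
    return (uint16_t)((sum & 0xffff) | mask);
}

/* Reference version with an explicit comparison, for cross-checking only. */
static uint16_t add_sat_ref(uint16_t a, uint16_t b) {
    uint32_t sum = (uint32_t)a + b;
    return sum > 0xffff ? 0xffff : (uint16_t)sum;
}

/* Compares both versions over a sample of inputs (hypothetical helper).
   The stride of 257 makes a hit both 0 and 0xffff. */
static void check_add_sat(void) {
    for (uint32_t a = 0; a <= 0xffff; a += 257)
        for (uint32_t b = 0; b <= 0xffff; b += 251)
            assert(add_sat((uint16_t)a, (uint16_t)b) ==
                   add_sat_ref((uint16_t)a, (uint16_t)b));
}
```

Calling check_add_sat() from a test program should pass silently. Whether the reference version also compiles to branch-free code depends on the compiler and flags.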
For the ESP32, gcc 11.2 produces the following:
    add(unsigned short, unsigned short):
            entry   sp, 32
            extui   a2, a2, 0, 16
            extui   a3, a3, 0, 16
            add.n   a8, a2, a3
            extui   a2, a8, 16, 16
            neg     a2, a2
            or      a2, a2, a8
            extui   a2, a2, 0, 16
            retw.n
The only branch in sight is the return statement...
https://godbolt.org/z/ezhxY56qx
Of course, with a different compiler, or possibly even a different set of flags to the same compiler, it could generate a branch. But at least in a quick test on a couple dozen or so different compilers for half a dozen or so processors, I didn't see any branches generated (though compilation did simply fail on the 6502 compiler, which apparently doesn't support an unsigned long at all).