Saturating subtract/add for unsigned bytes

Question

Imagine I have two unsigned bytes b and x. I need to calculate bsub as b - x and badd as b + x. However, I don't want underflow/overflow occur during these operations. For example (pseudo-code):

b = 3; x = 5;
bsub = b - x; // bsub must be 0, not 254

and

b = 250; x = 10;
badd = b + x; // badd must be 255, not 4

The obvious way to do this includes branching:

bsub = b - min(b, x);
badd = b + min(255 - b, x);

I just wonder if there are any better ways to do this, i.e. by some hacky bit manipulations?

`y ^ ((x ^ y) & -(x < y))` for `int` types evaluates `min(x, y)` without branching. This could form part of an eventual solution, based on what you have so far. — Bathsheba, Nov 02 '15 at 15:41
`y ^ ((x ^ y) & -(x < y));` is `min(x,y)` (usually) without a branch (where `x < y` might, depending on the machine still be a branch) but modern architecture has conditional move which is probably not much slower. — Pixelchemist, Nov 02 '15 at 15:42
Perhaps [Clamped Increment Integer?](http://stackoverflow.com/q/25615440/1708801) is helpful. — Shafik Yaghmour, Nov 02 '15 at 15:46
@ShafikYaghmour: I think that could be. And if they used a `char` type, then, from C++14 onwards that **must** be 2's complement. — Bathsheba, Nov 02 '15 at 15:48
`I just wonder if there are any better ways to do this, i.e. by some hacky bit manipulations?` Unless the code is in a very performance critical part, the hacky optimizations only end up harder to read. I suggest a more specific question like `...are there any more efficient ways...`. — eerorika, Nov 02 '15 at 15:50
@Bathsheba indeed, I added an answer since the code generated for the uinit8_t case based on that approach looks reasonable. — Shafik Yaghmour, Nov 02 '15 at 16:21
Just confirming: You'd prefer a wrong answer, just to stay within a byte? 254 + 254 = 255, and 1 - 254 = 0 ? — Alan Campbell, Nov 03 '15 at 02:44
@AlanCampbell it is called [Saturating Arithmetic](https://en.wikipedia.org/wiki/Saturation_arithmetic). — Shafik Yaghmour, Nov 03 '15 at 02:53
Do you need it to be portable? Because if you're looking at a specific architecture, then there's probably a nice single instruction. I know ARM has saturating vector addition and subtraction for bytes. On X86 the `_mm_adds_epi8` intrinsic will do a saturating add of 16 bytes in a single instruction. — porglezomp, Nov 03 '15 at 04:30
like @FUZxxl said, can you clarify why you tagged with both C and C++. On it's face it looks more like a C question and the vast majority of the answers are C focused. — Shafik Yaghmour, Nov 03 '15 at 18:55
Obviously it means that I use C++ compiler, so it is ok to use anything from standard library (e.g. std::min), but pure C solutions are also ok. — ovk, Nov 03 '15 at 19:46

Shafik Yaghmour · Accepted Answer · 2015-11-02T16:23:25.530

88

The article Branchfree Saturating Arithmetic provides strategies for this:

Their addition solution is as follows:

u32b sat_addu32b(u32b x, u32b y)
{
    u32b res = x + y;
    res |= -(res < x);

    return res;
}

modified for uint8_t:

uint8_t  sat_addu8b(uint8_t x, uint8_t y)
{
    uint8_t res = x + y;
    res |= -(res < x);

    return res;
}

and their subtraction solution is:

u32b sat_subu32b(u32b x, u32b y)
{
    u32b res = x - y;
    res &= -(res <= x);

    return res;
}

modified for uint8_t:

uint8_t sat_subu8b(uint8_t x, uint8_t y)
{
    uint8_t res = x - y;
    res &= -(res <= x);

    return res;
}

edited Nov 02 '15 at 16:23

answered Nov 02 '15 at 16:17

Shafik Yaghmour

154,301
39
440
740

1

Is this solution portable? I think it assumes that -1 is represented in 2's complement form (all bits set to 1). – user1969104 Nov 03 '15 at 03:28
2

@user1969104 that may be the case but as the comment in the article indicates, that is solved by casting to unsigned before applying unary minus. In practicality [it is unlikely you will have to deal with anything else but two's complement](http://stackoverflow.com/q/12276957/1708801). – Shafik Yaghmour Nov 03 '15 at 04:09
I realized that this may be portable because of using unsigned types. -1 should be the largest unsigned value similar to the overflow arithmetic. However I am not sure if the result of `(res < x)` is unsigned or requires typecast. – user1969104 Nov 03 '15 at 04:18
2

This may be a good C answer, but isn't a very good C++ answer. – Yakk - Adam Nevraumont Nov 03 '15 at 18:46
@Yakk I meant to think about this from a C++ perspective but I did not get the chance yet. – Shafik Yaghmour Nov 03 '15 at 18:56
6

@Yakk What makes this a "bad" C++ answer? These are basic mathematical operations, and I don't see how it would be interpreted as C only, or bad C++. – JPhi1618 Nov 03 '15 at 19:54
4

@JPhi1618 A better C++ answer might be `templatestruct sat{T t;};` with overloaded operators that saturate? Proper use of namespaces. Mostly sugar. – Yakk - Adam Nevraumont Nov 03 '15 at 19:59
6

@Yakk, Ah, ok. I just saw this as a minimal example that the OP could adapt as needed. I wouldn't expect to see that complete of an implementation. Thanks for clarifying. – JPhi1618 Nov 03 '15 at 20:02

user1969104 · Answer 2 · 2015-11-02T16:57:16.633

40

A simple method is to detect overflow and reset the value accordingly as below

bsub = b - x;
if (bsub > b)
{
    bsub = 0;
}

badd = b + x;
if (badd < b)
{
    badd = 255;
}

GCC can optimize the overflow check into a conditional assignment when compiling with -O2.

I measured how much optimization comparing with other solutions. With 1000000000+ operations on my PC, this solution and that of @ShafikYaghmour averaged 4.2 seconds, and that of @chux averaged 4.8 seconds. This solution is more readable as well.

edited Nov 02 '15 at 16:57

answered Nov 02 '15 at 15:50

user1969104

2,340
14
15

5

@user694733 It's not optimized away, it's optimized into a conditional assignment depending on the carry flag. – fuz Nov 02 '15 at 16:00
2

Yes user694733 is right. It is optimized into a conditional assignment. – user1969104 Nov 02 '15 at 16:02
This would not work for all cases, for example badd: b = 155 x =201, than badd = 156, and that is bigger than b. **You would need to compare the result to the min() or max() of the two variables, depending on the operation** – Cristian F Nov 17 '15 at 13:59
@CristianF How do you calculate 155+201 = 156? I think it needs to be 155+201 = 356%256 = 100. I do not think min(), max() is needed in any combination of b, x values. – user1969104 Nov 17 '15 at 14:36

score 16 · Answer 3 · edited May 23 '17 at 11:46

16

For subtraction:

diff = (a - b)*(a >= b);

Addition:

sum = (a + b) | -(a > (255 - b))

Evolution

// sum = (a + b)*(a <= (255-b)); this fails
// sum = (a + b) | -(a <= (255 - b)) falis too

Thanks to @R_Kapp

Thanks to @NathanOliver

This exercise shows the value of simply coding.

sum = b + min(255 - b, a);

edited May 23 '17 at 11:46

Community

1
1

answered Nov 02 '15 at 15:45

chux - Reinstate Monica

143,097
13
135
256

For `sum` perhaps `(a + b) | -(a <= (255 - b))`? – R_Kapp Nov 02 '15 at 15:58
You *could* do `sum = ((a + b) | (!!((a + b) & ~0xFF) * 0xFF)) & 0xFF`, assuming `sizeof(int) > sizeof(unsigned char)`, but this looks so complex that I don't know if you would gain anything with it (other than headache). – user694733 Nov 02 '15 at 16:17
@user694733 Yes and maybe even `(a+b+1)*(a <= (255-b)) - 1`. – chux - Reinstate Monica Nov 02 '15 at 16:20
@NathanOliver Thanks for the oversight - the telling aspect of this is that the `sub` was easy as the limit was `0`. But other limits pose complications and follow [user2079303](http://stackoverflow.com/questions/33481295/subtract-add-value-without-overflow-or-underflow/33481458#comment54747929_33481295) comment. – chux - Reinstate Monica Nov 02 '15 at 16:22
This solution generates much more ASM code than the solution by me with the -O2 flag to gcc. – user1969104 Nov 02 '15 at 16:29
1

@user1969104 OP was not clear on "better" (code space vs. speed performance) nor target platform nor compiler. Speed assessment makes most sense in the context of the un-posted larger problem. – chux - Reinstate Monica Nov 02 '15 at 16:51
@chux I understand. Just out of curiosity, I checked on an intel processor on 64-bit ubuntu machine. I measured it and posted results in my solution. – user1969104 Nov 02 '15 at 16:58
Seems to me that multiplying by `bool`s obscures the intent; probably would be better for future users to be more explicit with conditionals. – Kyle Kanos Nov 03 '15 at 13:06
@Kyle Kanos Detail: in C, the results of relational operators like `>=` is `int`, not `bool`. "better", unfortunately is unclear in OP's post concerning execution efficiency, code size or source code clarity. I assume OP was looking for efficiency - something highly machine/compiler dependent - and so offered code that may perform faster. YMMV. – chux - Reinstate Monica Nov 03 '15 at 14:11
@chux: I use C++ more than C nowadays & OP used both tags, hence `int` vs `bool` comment. My usage of 'better' is for future users' *understanding* of the code, not for the undefined improvements wanted by OP. – Kyle Kanos Nov 03 '15 at 14:30
@Kyle Kanos, Yes - I too now the post is dual language tagged. This is one of those cases where C and C++ slightly diverge. – chux - Reinstate Monica Nov 03 '15 at 14:36

score 13 · Answer 4 · answered Nov 03 '15 at 06:40

13

If you are using a recent enough version of gcc or clang (maybe also some others) you could use built-ins to detect overflow.

if (__builtin_add_overflow(a,b,&c))
{
  c = UINT_MAX;
}

answered Nov 03 '15 at 06:40

erebos

153
1
4

1

This is the best answer. Using compiler built-ins instead of bit magic is not only faster, it is also clearer and makes the code more maintainable. – Cephalopod Nov 03 '15 at 09:48
Thank you, @erebos. I'll definitely try this on platforms where it is available. – ovk Nov 03 '15 at 12:38
4

I can not get gcc to generate brachless code with this one, which is a bit disappointing. The especially unfortunate thing here is that [clang uses different names for these](http://stackoverflow.com/a/32317442/1708801). – Shafik Yaghmour Nov 03 '15 at 17:58
1

@Cephalopod And it's completely non-crossplatform, heck most likely doesn't even work on another compiler. Not a good solution for the 21st century. – Ela782 Nov 04 '15 at 14:38
1

@Ela782 It's exactly the other way around: built-ins are not a good solution for the 20th century. Welcome to the future! – Cephalopod Nov 04 '15 at 19:40
@ShafikYaghmour I made an answer using builtin/intrinsic functions without branching http://stackoverflow.com/a/33527635/1681678 – MichaelMitchell Nov 04 '15 at 22:33
@Cephalopod I don't see how using non-standard, compiler-specific things can be a good thing. If the built-ins were standardized I'd agree with you, but I highly doubt they are. – Ela782 Nov 05 '15 at 04:25

score 3 · Answer 5 · answered Nov 02 '15 at 20:33

For addition:

unsigned temp = a+b;  // temp>>8 will be 1 if overflow else 0
unsigned char c = temp | -(temp >> 8);

For subtraction:

unsigned temp = a-b;  // temp>>8 will be 0xFF if neg-overflow else 0
unsigned char c = temp & ~(temp >> 8);

No comparison operators or multiplies required.

score 2 · Answer 6 · edited Nov 03 '15 at 04:18

2

You could also use the safe numerics library at Boost Library Incubator. It provides drop-in replacements for int, long, etc... which guarantee that you'll never get an undetected overflow, underflow, etc.

edited Nov 03 '15 at 04:18

Shafik Yaghmour

154,301
39
440
740

answered Nov 02 '15 at 16:01

Robert Ramey

1,114
8
16

7

Providing an example of how to use the library would make this a better answer. In addition, do they provide a brachless guarantee? – Shafik Yaghmour Nov 02 '15 at 16:59
The library has extensive documentation and examples. But the end of the day it's as easy as including the appropriate header and substituting safe for int. – Robert Ramey Nov 03 '15 at 17:35
branchless? I guess you man branchless. The library uses template metaprogramming to include run time checks only when necessary. For example unsigned char times unsigned char will result in unsigned int. This can never overflow so no checking at all need be done. On the other hand, unsigned times unsigned can overflow so it has to be checked at runtime. – Robert Ramey Nov 03 '15 at 17:38

score 2 · Answer 7 · answered Nov 02 '15 at 16:44

2

All can be done in unsigned byte arithmetic

// Addition without overflow
return (b > 255 - a) ? 255 : a + b

// Subtraction without underflow
return (b > a) ? 0 : a - b;

answered Nov 02 '15 at 16:44

1

This is actually one of the best solutions. All the others making the substraction or addition before are actually creating an undefined behavior in C++, resulting in the compiler being able to do whatever it wants. In practice you can mostly predict what will happen, but still. – Adrien Hamelin Nov 07 '15 at 13:13

score 2 · Answer 8 · answered Nov 03 '15 at 01:44

If you want to do this with two bytes, use the simplest code possible.

If you want to do this with twenty billion bytes, check what vector instructions are available on your processor and whether they can be used. You might find that your processor can do 32 of these operations with a single instruction.

score 2 · Answer 9 · answered Nov 04 '15 at 16:57

If you are willing to use assembly or intrinsics, I think I have an optimal solution.

For subtraction:

We can use the sbb instruction

In MSVC we can use the intrinsic function _subborrow_u64 (also available in other bit sizes).

Here is how it is used:

// *c = a - (b + borrow)
// borrow_flag is set to 1 if (a < (b + borrow))
borrow_flag = _subborrow_u64(borrow_flag, a, b, c);

Here is how we could apply it to your situation

uint64_t sub_no_underflow(uint64_t a, uint64_t b){
    uint64_t result;
    borrow_flag = _subborrow_u64(0, a, b, &result);
    return result * !borrow_flag;
}

For addition:

We can use the adcx instruction

In MSVC we can use the intrinsic function _addcarry_u64 (also available in other bit sizes).

Here is how it is used:

// *c = a + b + carry
// carry_flag is set to 1 if there is a carry bit
carry_flag = _addcarry_u64(carry_flag, a, b, c);

Here is how we could apply it to your situation

uint64_t add_no_overflow(uint64_t a, uint64_t b){
    uint64_t result;
    carry_flag = _addcarry_u64(0, a, b, &result);
    return !carry_flag * result - carry_flag;
}

I don't like this one as much as the subtraction one, but I think it is pretty nifty.

If the add overflows, carry_flag = 1. Not-ing carry_flag yields 0, so !carry_flag * result = 0 when there is overflow. And since 0 - 1 will set the unsigned integral value to its max, the function will return the result of the addition if there is no carry and return the max of the chosen integral value if there is carry.

You might want to mention that this answer is for a specific instruction-set architecture (x86?) and will require reimplementing for each target architecture (SPARC, MIPS, ARM, etc) — Toby Speight, Mar 04 '19 at 18:43

score 1 · Answer 10 · 2015-11-02T15:48:40.303

1

what about this:

bsum = a + b;
bsum = (bsum < a || bsum < b) ? 255 : bsum;

bsub = a - b;
bsub = (bsub > a || bsub > b) ? 0 : bsub;

edited Nov 02 '15 at 15:48

answered Nov 02 '15 at 15:45

I fixed the (obvious?) typo, but I still don't think this is correct. – Bathsheba Nov 02 '15 at 15:46
This also includes branching. – fuz Nov 02 '15 at 15:49
I will delete this answer just a quick question in assembly without optimization what is difference between ternary operator and if/else statement? – Nov 02 '15 at 15:57
@GRC There is no difference. – fuz Nov 02 '15 at 15:59
@GRC FUZxxl is right, but, as always, try yourself. Even if you don't know assembly (you could make a question here on SO if something's not clear to you), just by checking the length / instructions you'll know. – edmz Nov 02 '15 at 17:59
Guys I did it, there is difference and unlike if/else version ternary operation does not include a single jump statement. – Nov 03 '15 at 05:47

DanielHsH · Answer 11 · 2015-11-02T17:54:36.183

1

If you will call those methods a lot, the fastest way would be not bit manipulation but probably a look-up table. Define an array of length 511 for each operation. Example for minus (subtraction)

static unsigned char   maxTable[511];
memset(maxTable, 0, 255);           // If smaller, emulates cutoff at zero
maxTable[255]=0;                    // If equal     - return zero
for (int i=0; i<256; i++)
    maxTable[255+i] = i;            // If greater   - return the difference

The array is static and initialized only once. Now your subtraction can be defined as inline method or using pre-compiler:

#define MINUS(A,B)    maxTable[A-B+255];

How it works? Well you want to pre-calculate all possible subtractions for unsigned chars. The results vary from -255 to +255, total of 511 different result. We define an array of all possible results but because in C we cannot access it from negative indices we use +255 (in [A-B+255]). You can remove this action by defining a pointer to the center of the array.

const unsigned char *result = maxTable+255;
#define MINUS(A,B)    result[A-B];

use it like:

bsub  = MINUS(13,15); // i.e 13-15 with zero cutoff as requested

Note that the execution is extremely fast. Only one subtraction and one pointer deference to get the result. No branching. The static arrays are very short so they will be fully loaded into CPU's cache to further speed up the calculation

Same would work for addition but with a bit different table (first 256 elements will be the indices and last 255 elements will be equal to 255 to emulate the cutoff beyond 255.

If you insist on bits operation, the answers that use (a>b) are wrong. This still might be implemented as branching. Use the sign-bit technique

// (num1>num2) ? 1 : 0
#define        is_int_biggerNotEqual( num1,num2) ((((__int32)((num2)-(num1)))&0x80000000)>>31)

Now you can use it for calculation of subtraction and addition.

If you want to emulate the functions max(), min() without branching use:

inline __int32 MIN_INT(__int32 x, __int32 y){   __int32 d=x-y; return y+(d&(d>>31)); }              

inline __int32 MAX_INT(__int32 x, __int32 y){   __int32 d=x-y; return x-(d&(d>>31)); }

My examples above use 32 bits integers. You can change it to 64, though I believe that 32 bits calculations run a bit faster. Up to you

edited Nov 02 '15 at 17:54

answered Nov 02 '15 at 17:31

DanielHsH

4,287
3
30
36

3

It likely will not, actually: first, of course, loading up the table is slow. Bit operations take 1 cycle, loading from memory takes approximately 80 ns; even from the L1 cache we're in the range of 20 ns, which is almost 7 cycles on a 3GHz CPU. – edmz Nov 02 '15 at 17:53
You are not entirely correct. LUT method will take a few sycles but bit manipulation is not a single cycle as well. There are a few sequential actions. For example, only calculating the MAX() requires 2 subtractions, and logical operation and one shift right. And don't forget the integer promotion/demotion – DanielHsH Nov 02 '15 at 17:59
1

I meant to say that single bitwise operations take 1 cycle, naturally assuming register operands. With the code that Shafik showed, clang outputs 4 elementary instructions. Also `(x > y)`, is branchless. – edmz Nov 02 '15 at 18:09
First, (x > y) might use branching. You don't know on which architecture you are running. I tend to agree that it is possibly branchless on Intel architecture. Most smartphones are not Intel. That is also the reason that you cannot know how many assembly instructions there will be. Try my solution on your PC. I am interested in hearing the results. – DanielHsH Nov 02 '15 at 18:20
How can't it be branchless? Branching takes in when you have to jump depending on the result of the comparison (as in `if-else/?`); in that case you just take the result of the operation (i.e. the flags affected). – edmz Nov 02 '15 at 19:51
1

L1 cache is much much faster than 20 ns, it's in the order of maybe 4 processor cycles. And will likely be using an otherwise unused execution unit, and be fully pipelined anyway. Measure it. And 20ns is 60 cycles in a 3 GHz CPU. – gnasher729 Nov 03 '15 at 01:42
On some assembly languages (x>y) is implemented as c statement (x>y) ? 1 : 0; It has branching. Regarding the processing time - My measurements were indecisive. I tested the LUT (look up table) against 'Shafiks' code and on some hardware the LUT wins, on other 'Shafiks' code. The up side of the LUT that its performance is less depended on specific assembly languages and compiler optimizations flags (and is also branch-less guaranteed on every architecture). – DanielHsH Nov 03 '15 at 06:08
@DanielHsH: For processors which can't do a branchless x>y, what do you think of my approach? If the original values are in registers and aren't needed afterward, and if the result needn't be masked, they'd probably be about 4 instructions: add, shift-and-move, negate, or, using just the two original registers. If the result needs to be masked, that would likely add one instruction. – supercat Nov 03 '15 at 20:58

Saturating subtract/add for unsigned bytes

11 Answers11

Linked

Related