
What is the most efficient way to add two scalars in C/C++ with overflow protection (i.e. saturation)? For example, adding two unsigned chars should yield 255 whenever a + b >= 255.

I have:

unsigned char inline add_o(unsigned char x, unsigned char y)
{
    const short int maxVal = 255;
    unsigned short int s_short = (unsigned short int) x + (unsigned short int) y;
    unsigned char s_char = (s_short <= maxVal) ? (unsigned char)s_short : maxVal;
    return s_char;
}

that can be driven by:

unsigned char x = 200;
unsigned char y = 129;
unsigned char mySum = add_o(x,y);

I see some ideas here but I am interested in the fastest way to perform this operation---or at least one that is highly palatable to an optimizing compiler.

Steve
  • The `(unsigned short)` casts have no effect; all types smaller than `int` are promoted to `int` when used as operands of `+`. – M.M May 15 '14 at 22:21

3 Answers


Most modern compilers will generate branch-free code for your current solution, which is already fairly good. A few optimisations that are very hardware-dependent (x86 in particular):

  1. replace the comparison with a masked AND, and
  2. try to turn the overflow protection into a conditional move.

This is how I would have done it:

unsigned char inline add_o(unsigned char x, unsigned char y) {
    unsigned short int s_short = (unsigned short int) x + (unsigned short int) y;
    if (s_short & 0xFF00)
        s_short = 0xFF;

    return s_short;
}
Sergey L.
  • Worth pointing out that the second one might in fact be a pessimization instead of an optimization. – user541686 May 15 '14 at 16:19
  • Does any compiler generate better code for `s_short & 0xFF00` than `s_short > UCHAR_MAX`? – M.M May 15 '14 at 22:28
  • @MattMcNabb This is less of a compiler thing than a hardware thing. `test %eax, $0xFF00` executes faster on x86 than `cmp %eax, $255`. – Sergey L. May 19 '14 at 12:04
  • The compiler's job is to emit the instructions that are best for the hardware it is targeting. – M.M May 19 '14 at 23:59

You mean unsigned saturating arithmetic?

unsigned char inline add_o(unsigned char x, unsigned char y) {
    unsigned char s = x + y;      /* wraps modulo 256 on overflow */
    s |= -(unsigned)(s < x);      /* all-ones mask iff the sum wrapped */
    return s;
}
ldav1s
  • Yes, that's very nice. For more see: http://locklessinc.com/articles/sat_arithmetic/ – Steve May 15 '14 at 16:25
  • Thanks. Some CPUs have direct support for saturating arithmetic also and that could be faster than this, depending on whether the compiler is smart enough to emit those instructions for this code. – ldav1s May 15 '14 at 16:30
  • Your code relies on the compilation platform being 2's complement. `s |= -(unsigned)(s < x);` instead works everywhere (except perhaps where sizeof(int)==sizeof(char)). – Pascal Cuoq May 15 '14 at 19:14
  • @PascalCuoq, I don't see how it'll work for sign & magnitude or ones' complement at all when `(s < x) == 1`. So yes, it is reliant on two's complement. – ldav1s May 15 '14 at 21:16
  • @ldav1s With the cast I suggest, it isn't. `- (unsigned) 1` is UINT_MAX, and since the standard forces “pure binary notation” (C99 6.2.6.1:3) for unsigned integers, all non-padding bits of `- (unsigned) 1` have to be set, portably, regardless of the platform's representation of negative integers. – Pascal Cuoq May 15 '14 at 21:27
  • The appropriate reference is actually C99 6.2.6.2:1 “the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to (2^N)−1 using a pure binary representation; this shall be known as the value representation.” – Pascal Cuoq May 15 '14 at 21:34
  • There's also C99: 6.3.1.3 which directly states the same thing. I've updated my code with your suggestion. Thanks! – ldav1s May 15 '14 at 22:17
  • It's much clearer to go `unsigned int s = x + y; return s > UCHAR_MAX ? UCHAR_MAX : s;` – M.M May 15 '14 at 22:27
  • @MattMcNabb The reason it is written this way is that this expression is easy to compile to a branchless function, whereas it is not easy for a compiler to compile your version to a branchless one. Anyway, why do you offer your proposal in a comment below an existing answer? If you think your version is the best answer to the question, offer it as an answer to the question! – Pascal Cuoq May 16 '14 at 20:05

The most efficient way is to pre-fill a table with all possible results, then use the addition of x and y to index into that table.

#include <iostream>
#include <algorithm>

unsigned char add_o_results[255 + 255 + 1];  // indices 0..510

void pre_fill() {
    for (int i = 0 ; i <= 255 + 255 ; ++i) {
        add_o_results[i] = std::min(i, 255);
    }
}

unsigned char inline add_o(unsigned char x, unsigned char y)
{
    return add_o_results[x+y];  // x+y promotes to int, so the sum is at most 510
}

using namespace std;

int main()
{
    pre_fill();

    cout << int(add_o(150, 151)) << endl;
    cout << int(add_o(10, 150)) << endl;

    return 0;
}
Richard Hodges
  • This involves a dereference (unless the whole table is optimized out of course), are you sure it is faster than a comparison? – M.M May 15 '14 at 22:30
  • 2
    The correct answer here is 'it depends' (on the state of the cache). To be honest, I would prefer to see the author simply writing `std::min(MAXVAL, x+y)` since this allows the compiler to work out what is most efficient and is patently obvious to a maintainer. Of course if this becomes a bottleneck (I can guarantee that it never will), the author can optimise it in some way. – Richard Hodges May 15 '14 at 22:37