21

I recently learned that integer overflow is an undefined behavior in C (side question - is it also UB in C++?)

Often in C programming you need to find the average of two values a and b. However doing (a+b)/2 can result in overflow and undefined behavior.

So my question is - what is the right way to find the average of two values a and b in C?

Trudbert
  • 3,178
  • 14
  • 14
bodacydo
  • 75,521
  • 93
  • 229
  • 319
  • 10
    Expand the expression. And yes, **signed** integer overflow is UB in both C and C++ (though I don't know about C, but according to the C++ standard, unsigned arithmetic does not overflow). – chris Jul 23 '14 at 20:40
  • 7
    `a/2 + b/2` for example won't lead to integer overflow. – BlackDwarf Jul 23 '14 at 20:41
  • 4
    If the signs are opposite, `(a + b)/2` is safe from overflow. If the signs are the same, then `a + (b - a) / 2` is safe. – Jonathan Leffler Jul 23 '14 at 20:41
  • 10
    @DeepBlackDwarf: but (1 + 3)/2 = 2 whereas 1/2 + 3/2 = 1. – Jonathan Leffler Jul 23 '14 at 20:42
  • 1
    Cert has a good reference on preventing signed overflow, I linked to it in this [answer](http://stackoverflow.com/a/23280701/1708801). – Shafik Yaghmour Jul 23 '14 at 20:43
  • 1
    @chris Could you explain why unsigned arithmetic doesn't overflow? After reaching the maximum value it would overflow to 0? Or is there another term for this when unsigned arithmetic is involved? – bodacydo Jul 23 '14 at 20:47
  • @DeepBlackDwarf @JonathanLeffler Interesting. Should `a/2 + b/2` be avoided? – bodacydo Jul 23 '14 at 20:48
  • @JonathanLeffler That is absolutely correct, my bad. Casting to float works then, but leads to a lot more complicated code... – BlackDwarf Jul 23 '14 at 20:49
  • 9
    @bodacydo: unless you want results like the average of 1 and 3 is 1, then `a/2 + b/2` is to be avoided. You could probably do something fancy like `a/2 + b/2 + (a%2 + b%2)/2` and I think you get the correct answer, but ... – Jonathan Leffler Jul 23 '14 at 20:50
  • If you don't mind dealing with branches, you could always use an `if` statement or conditional that uses different expressions based on whether the signs are the same. – Drew McGowen Jul 23 '14 at 20:53
  • @bodacydo, Unsigned arithmetic is modulo 2ⁿ, where n is the number of bits. Thus, 2³² for a 32-bit unsigned type is not an overflow, it's just 0. – chris Jul 23 '14 at 20:54
  • 1
    In many cases you can just do the calculation in a wider type like `long long` or `intmax_t`. This fails if the addition can overflow a 64-bit integer. – Keith Thompson Jul 23 '14 at 20:59
  • Calculating the average of two integer numbers rounded towards zero in a single instruction cycle: http://google.com/patents?id=eAIYAAAAEBAJ&dq=6007232 i.e. `(a >> 1) + (b >> 1) + (a & b & 0x1)` – manlio Jul 23 '14 at 21:18
  • @manlio: is the result of right shifting a negative value defined? – Jonathan Leffler Jul 23 '14 at 21:26
  • @JonathanLeffler It is implementation-defined. – dyp Jul 23 '14 at 21:27
  • @dyp: you're right; the C11 standard says the result is implementation defined, which in turn means that the result of `(a >> 1) + (b >> 1) + (a & b & 0x1)` (where `a` or `b` is negative) can legitimately vary across platforms, depending on whether the right shift is 'arithmetic' or 'logical' (sign preserving or shifts in zeros). – Jonathan Leffler Jul 23 '14 at 21:34
  • @JonathanLeffler it isn‘t. It should be changed to use math operators, but it‘s interesting how many answers contain something very similar. – manlio Jul 23 '14 at 21:36
  • @JonathanLeffler ...a case against software patents? – manlio Jul 23 '14 at 21:45
  • 1
    What's wrong with using floats? – Pharap Jul 24 '14 at 01:12
  • of course you could always use the avg() library function and let that function handle any potential problems. – user3629249 Jul 24 '14 at 04:44
  • 1
    Both signed and unsigned ints overflow. However in both C and C++ unsigned ints overflow predictably - the C and C++ standards both define what happens when the integer overflows. For signed int it's undefined (or implementation defined?) so it may behave differently depending on CPU architecture, compiler, OS or (if it's undefined) random luck. – slebetman Jul 24 '14 at 06:58
  • A similar question is http://stackoverflow.com/questions/5697500/take-the-average-of-two-signed-numbers-in-c – manlio Jul 24 '14 at 16:24
  • @JonathanLeffler True that "If the signs are the same, then `a + (b - a) / 2`" is _safe_ (no overflow), but it may result in a different answer (by 1) from simply using a wider integer type. – chux - Reinstate Monica Apr 24 '15 at 18:22

8 Answers8

15

With help from Secure Coding

if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
    ((si_b < 0) && (si_a < (INT_MIN - si_b))))
{
  /* will overflow, so use difference method */
  return si_b + (si_a - si_b) / 2;
} 
else
{
 /* the addition will not overflow */
  return (si_a + si_b) / 2;
}

ADDENDUM

Thanks to @chux for pointing out the rounding problem. Here's a version that's tested for correct rounding...

int avgnoov (int si_a, int si_b)
{
    if ((si_b > 0) && (si_a > (INT_MAX - si_b)))
    {
      /* will overflow, so use difference method */
      /* both si_a and si_b > 0; 
          we want difference also > 0
          so rounding works correctly */
      if (si_a >= si_b)
        return si_b + (si_a - si_b) / 2;
      else
        return si_a + (si_b - si_a) / 2;
    } 
    else if ((si_b < 0) && (si_a < (INT_MIN - si_b)))
    {
      /* will overflow, so use difference method */
      /* both si_a and si_b < 0; 
          we want difference also < 0
          so rounding works correctly */
      if (si_a <= si_b)
        return si_b + (si_a - si_b) / 2;
      else
        return si_a + (si_b - si_a) / 2;
    }
    else
    {
     /* the addition will not overflow */
      return (si_a + si_b) / 2;
    }
}
Doug Currie
  • 40,708
  • 1
  • 95
  • 119
  • What if `si_b` is `INT_MIN`? You have then an integer overflow in `(si_a < (INT_MIN - si_b))`. – ouah Jul 23 '14 at 21:45
  • @ouah `INT_MIN - INT_MIN` is an overflow? I think it's more like `0`. – dyp Jul 23 '14 at 21:47
  • @dyp Computing `-INT_MIN` is an overflow in C and invokes undefined behavior. – ouah Jul 23 '14 at 21:50
  • 1
    @ouah Yes, but unary `-` is not binary `-`. 6.5.6/6 "The result of the binary `-` operator is the difference resulting from the subtraction of the second operand from the first." – dyp Jul 23 '14 at 21:52
  • 3
    +1, this is a good solution, no integer overflow and it also has the advantage of not using the `%` operator. `%` is a remainder operator in c99 but in c90 it is either remainder or a modulo operator (remainder and modulo differ with respect to negative values). – ouah Jul 23 '14 at 22:54
  • `si_b + (si_a - si_b) / 2` is going to produce a wrong answer. 3 + (1-3) /2 = 3+(-2)/2 = 1/2 = 0,5. The answer for 3 & 1 should be 2 – EternalWulf Jul 24 '14 at 09:31
  • @HowlinWulf `si_b == 3 > 0`, `(INT_MAX - si_b < si_a) == false`, so the second branch will be used. – dyp Jul 24 '14 at 10:30
  • @HowlinWulf `3 + (1 - 3)/2 == 3 + -2/2 == 3 + -1 == 2`, no? – Tavian Barnes Aug 11 '14 at 15:36
  • So is this the right answer? I want to accept an answer yet I don't know which one is the right one and works in all situations... – bodacydo Oct 19 '14 at 20:50
  • I am confident that this answer is correct and works in all situations. – Doug Currie Oct 21 '14 at 00:11
  • Consider your average `-409634972 -1920670633 --> -1165152803` With `long long` math, answer `-1165152802`. This answer's approach rounds away from 0 when using `si_b + (si_a - si_b) / 2` but rounds toward 0 with `(si_a + si_b) / 2`. Suggest reviewing. – chux - Reinstate Monica Apr 23 '15 at 21:08
  • Thanks, @chux; I've added a tested version. – Doug Currie Apr 24 '15 at 17:35
  • Wow - just posted a similar answer. [ships that pass in the night](http://en.wiktionary.org/wiki/ships_that_pass_in_the_night) There are some subtle differences, but if you I like I'll take mine down. – chux - Reinstate Monica Apr 24 '15 at 17:44
  • 1
    @chux, no problem, it's nice to have alternatives. – Doug Currie Apr 24 '15 at 21:04
11
(a >> 1) + (b >> 1) + (((a & 1) + (b & 1)) >> 1)

The shift statement (x >> i) in c int mathematics is equivalent to a division by 2 to the power of i. So the statement (a >> 1) + (b >> 1) is the same as a/2 + b/2. However the mean of the truncated parts of the number need to be added as well. This value can be obtained by masking (a & 1), adding ((a & 1) + (b & 1)) and dividing (((a & 1) + (b & 1)) >> 1). The mean becomes (a >> 1) + (b >> 1) + (((a & 1) + (b & 1)) >> 1)

Note: the reason to use >> and & rather than / and % as the division and remainder operators is one of efficiency.

user1837841
  • 316
  • 1
  • 8
  • 3
    "Note: the reason to use >> and & rather than / and % as the division and remainder operators is one of efficiency." Your compiler is smarter than that. See [Is multiplication and division using shift operators in C actually faster?](http://stackoverflow.com/questions/6357038/is-multiplication-and-division-using-shift-operators-in-c-actually-faster) So please, for the readability of the code, use / and % when that is what you intend. – Frank Kusters Jul 24 '14 at 08:27
  • I try to write my code as I intend it to be executed, even in a situation where I know that all optimisations are done automatically. – user1837841 Jul 24 '14 at 09:27
  • I took too long to edit: I try to write my code as I intend it to be executed, even in a situation where I know that all optimisations are done automatically. (Although, there may be situation where it's more complicated than that eg: expanding ringbuffers.) – user1837841 Jul 24 '14 at 09:34
  • 3
    Right shifting a negative value has an implementation-defined result. You'd be better of with division, since that's guaranteed to yield a negative result or 0. I'm also not sure if `a & 1` is guaranteed to be equal to `abs(a % 2)`. – dyp Jul 24 '14 at 10:32
  • 3
    With `a = -2` and `b = 1`, this solution gives `-1` as a the result. But in C (c99/c11), `(-2 + 1) / 2` is `0`. This solution has the same problem as the one from Vlad; c99/c11 division truncates towards `0` and not towards `-inf`. – ouah Jul 24 '14 at 16:27
4

A simple approach is the following

int c = a / 2 + ( b + a % 2 ) / 2;

For example a and b can be represented as

a = 2 * n + r1;
b = 2 * m + r2;

Then

( a + b ) / 2 => ( 2 * n + r1 + 2 * m + r2 ) / 2 => 2 * n / 2 + ( b + r1 ) / 2

And the last expression gives you

=> a / 2 + ( b + a % 2 ) / 2

The more correct expression is the following

int c = a / 2 + b / 2 + ( a % 2 + b % 2 ) / 2;

For example if we have

int a = INT_MAX;
int b = INT_MAX;

then c calculated as

int c = a / 2 + b / 2 + ( a % 2 + b % 2 ) / 2;

will give c == INT_MAX

EDIT: there was found interesting difference between the effect of computer operators and the effect of mathematical operators. For example according to the mathematics -1 can be represented as

-1 = -1 * 2 + 1 

that is according to the formula

a = 2 * n + r1

2 * n shall be an integer number less than or equal tp a

So the number that is less -1 is -2. :)

I think that the general formula shown by me would work it is required that for odd negative numbers there would be considered even negative numbers that less than the odd negative number.

it seems that the correct formula looks as

int c = ( a < 0 ? a & ~1 : a ) / 2 + 
        ( b < 0 ? b & ~1 : b ) / 2 + 
        ( ( a & 1 ) + ( b & 1 ) ) / 2;

It is important to note that from the mathematical point of view the average of -1 and -2 shall be equal to -2 and the formula gives the correct result.:)

Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • Another quick question - how did you get modulus (`a % 2`) in the final expression? Where did that come from? – bodacydo Jul 23 '14 at 21:02
  • 1
    @@bodacydo As i presented a as 2 * n + r1 then r1 is a % 2. r1 is either 0 or 1. – Vlad from Moscow Jul 23 '14 at 21:06
  • 2
    So, what if `a == b == INT_MAX`? `INT_MAX % 2` is typically nonzero. The "simple approach" will still produce overflows. – dyp Jul 23 '14 at 21:20
  • @dyp What is the problem? Try a /2 + b / 2 + ( a % 2 + b % 2 ) / 2 – Vlad from Moscow Jul 23 '14 at 21:27
  • @dyp Have you gotten INT_MAX? :) – Vlad from Moscow Jul 23 '14 at 21:28
  • In the first formula, if `b == INT_MAX` and `a == 1`, the sub-expression `b + a % 2` becomes `INT_MAX + 1`. That doesn't happen in the "more correct expression". *"Have you gotten `INT_MAX`"* ??? – dyp Jul 23 '14 at 21:30
  • @dyp Please reread my comment above and my original post. They are clear enough. – Vlad from Moscow Jul 23 '14 at 21:34
  • 1
    As I said, the "more correct expression" is **not** affected by this problem. However, the first expression is: `int c = a / 2 + ( b + a % 2 ) / 2;` is affected by this problem. -- I don't understand why you post that first approach. – dyp Jul 23 '14 at 21:35
  • Is that first approach meant as a mathematical equation rather than a C statement? Or something like an ansatz? – dyp Jul 23 '14 at 21:36
  • @Vlad: It's pretty, but does it handle the case where `a` is negative? In that case `a%2` can be -1 (same for b)? – Mike Dunlavey Jul 23 '14 at 21:37
  • @Mike Dunlavey I think the result will be correct. – Vlad from Moscow Jul 23 '14 at 21:40
  • 2
    With `a = 2` and `b = -1`, your method gives `1`, but `(2 - 1) / 2` is `0`. – ouah Jul 23 '14 at 21:49
  • @ouah Right. The problem is `( 2 * n + r1 + 2 * m + r2 ) / 2 => 2 * n / 2 + ( b + r1 ) / 2` is wrong exactly for these two values. – dyp Jul 23 '14 at 21:59
  • @ouah From the mathematical point of view the remainder can be either 0 or 1 that is it shall be less than 2 and greater than or equal to 0. So it seems it is the case when mathematics and computer calculations do not coincide.:) – Vlad from Moscow Jul 23 '14 at 22:06
  • @VladfromMoscow agree but OP probably wants the same results as `(a + b) / 2` when no overflow occurs. – ouah Jul 23 '14 at 22:13
  • @VladfromMoscow your last expression with `a = -2` and `b = 1` gives `-1` instead of `0`. – ouah Jul 23 '14 at 22:31
  • @ouah It seems that -1 is the correct mathematical result. I described this case in my post. – Vlad from Moscow Jul 23 '14 at 22:34
  • @VladfromMoscow The representation via `a = 2 * n + r1` is not unique: `-1 = -1 * 2 + 1` but also `-1 = 0 * 2 - 1`. – dyp Jul 23 '14 at 22:44
  • @VladfromMoscow c99 does truncation towards `0`, it means `(-2 + 1) / 2` is `0` and not `-1` in C. – ouah Jul 23 '14 at 22:48
  • @dyp According to the mathematics in the representation of any integer number x = 2 * n + r, r shall be less than 2 and greater than or equal to 0. – Vlad from Moscow Jul 23 '14 at 22:48
  • @ouah I already wrote about this that there is a difference between the mathematics and computer operations.:) – Vlad from Moscow Jul 23 '14 at 22:51
  • @ouah For example in mathematics the average between -1 and -2 can not be equal to -1 because the acerage is always less than the maximum value. It can be equal to the maximum value only in the case when the two operands are equal. – Vlad from Moscow Jul 23 '14 at 22:55
  • @VladfromMoscow there is no difference for me between math and computer result. As `dyp` mentioned `%` has two solutions and c99 constrained `%` to have one. – ouah Jul 23 '14 at 22:57
  • The x&~1 terms in your third solution depend on twos-complement representation, which is all but universal today but not actually required by the C standard (in any version; C89 allowed ones-comp and sign-mag implicitly, C99 and C03 do so explicitly). Another (I think simpler) way to floor the result is to bias the known-small correction term: `a/2 + b/2 + ((a%2 + b%2 + 2)/2-1)`. – dave_thompson_085 Jul 24 '14 at 01:51
4

If you are concerned about overflow, you could cast the values to a larger type to perform the math, and then do the bounds checking.

user3100381
  • 464
  • 2
  • 5
3

This is from Calculating the average of two integer numbers rounded towards zero in a single instruction cycle:

(a >> 1) + (b >> 1) + (a & b & 0x1)

You must consider that:

  • it's implementation defined whether right shifting a negative integer shifts zeros or ones into the high order bits. Many CPUs often have two different instructions: an arithmetic shift right (preserves the sign bit) and a logical shift right (doesn't preserve the sign bit). The compiler is allowed to choose either (most compilers choose an arithmetic shift instruction).

    ISO/IEC 9899:2011 §6.5.7 Bitwise shift operators

    ¶5 The result of E1 >> E2is E1 right-shifted E2 bit positions. [CUT] If E1 has a signed type and a negative value, the resulting value is implementation-defined.

    Changing the expression to:

    a / 2 + b / 2 + (a & b & 0x1)
    

    isn't a solution since logical right shifts are equivalent to division by a power of 2 only for positive or unsigned numbers.

  • also (a & b & 0x1) isn't well defined. This term should be non-zero when both a and b are odd. But it fails with one's complement representation and ISO C, section 6.2.6.2/2, states that an implementation can choose one of three different representations for integral data types:

    • two's complement
    • one's complement
    • sign/magnitude

    (usually the two's complement far outweigh the others).

Community
  • 1
  • 1
manlio
  • 18,345
  • 14
  • 76
  • 126
  • Note `(a >> 1) + (b >> 1) + (a & b & 0x1)` has a different (by 1) answer than `(a+b)/2` about 1/4 of the time using various `int` on a typical 2's complement machine. IOWs, the answer sometimes rounds away from 0. The 2nd method `a / 2 + b / 2 + (a & b & 0x1)` often fails. example `-3, -3`. – chux - Reinstate Monica Apr 24 '15 at 19:06
  • @chux Thank you for your observations, I've changed part of the answer. – manlio Apr 28 '15 at 08:10
  • Although ISO C allows implementations to use any of those integer formats, it also requires support for a data type (an unsigned long long of at least 64 bits) which would be impractical on any platform that cannot efficiently handle two's-complement math unless its native integer type was 65 bits or longer. So far as I can tell, there has never been, and never will be, any production implementation of C99 or any future version of the language, which does not use two's-complement math. – supercat May 29 '18 at 17:23
  • @chux: I think `(a>>1)+(b>>1)+(a & b & 1)` will have consistent round-down (as opposed to round-smallest). Characterizing the answer that way seems more useful than saying its behavior differs about 1/4 of the time. While some applications might want round-smallest behavior, for many it's more important that (absent overflow) avg(x,y)+z == avg(x+z,y+z). If avg(-1,0)==-1 and avg(0,1)==0, that equality will hold. If avg(-1,0) and avg(0,1) are both zero, however, it won't. – supercat May 29 '18 at 17:35
2

The simplest (and usually fastest) way to average two int over the entire range [INT_MIN...INT_MAX] is to resort to a wider integer type. (Suggested by @user3100381.) Let us call that int2x.

int average_int(int a, int b) {
  return ((int2x) a + b)/2;
}

Of course this obliges a wider type to exist - so let's look at a solution that does not require a wider type.

Challenges:

Q: When one int is odd and the other is even, which way should rounding occur?
A: Follow average_int() above and round toward 0. (truncate).

Q: Can code use %?
A: With pre-C99 code, the result of a % 2 allows for different results when a < 0. So let us not use %.

Q: Does int need to have about symmetric range of positive and negative numbers?
A: Since C99 the number of negative numbers is the the same (or 1 more) than the number of positive numbers. Let us try not to require this.

SOLUTION:

Perform tests to determine is if overflow may occur. If not, simple use (a + b) / 2. Otherwise, add half the difference (signed same as answer) to the smaller value.

The following gives the same answer as average_int() without resorting to a wider integer type. It protects against int overflow and does not require INT_MIN + INT_MAX to be 0 or -1. It does not depends on encoding to be 2's complement, 1's complement or sign-magnitude.

int avgC2(int a, int b) {
  if (a >= 0) {
    if (b > (INT_MAX - a)) {
      // (a+b) > INT_MAX
      if (a >= b) {
        return (a - b) / 2 + b;
      } else {
        return (b - a) / 2 + a;
      }
    }
  } else {
    if (b < (INT_MIN - a)) {
      // (a+b) < INT_MIN
      if (a <= b) {
        return (a - b) / 2 + b;
      } else {
        return (b - a) / 2 + a;
      }
    }
  }
  return (a + b) / 2;
}

At most, 3 if()s occur with any pair of int.

Community
  • 1
  • 1
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
1

If you only have to deal with unsigned integer types (and can think in binary), you can decompose your addition into digit and carry. We can write a+b (in unlimited precision) as (a^b) + ((a&b)<<1)), so (a+b)/2 is simply ((a^b)>>1) + (a&b). This last expression fits within the common type of a and b, so you can use it in your code:

unsigned semisum(unsigned a, unsigned b)
{
    return ((a^b)>>1) + (a&b);
}
Toby Speight
  • 27,591
  • 48
  • 66
  • 103
-1

Simplest answer if there is only 2 elements to avoid overflow would be:

(a/2) + (b/2) = average

For more elements, you could use:

(a/x) + (b/x) + (c/x) + (d/x) ..... = average //x = amount of elements

From a mathematical point of view, this will never reach an overflow if none of the original values has done so previously, as you do not really add them all together, but rather dividing them before adding them together. Thus no result of any operation performed during the calculation, including the result, will ever be larger (to either side of 0) than the largest initial element (assuming you only work with Real Numbers).

So do the following:

  1. Determine the amount of elements in 'C', let's call it total.
  2. Declare a value to store the average, let's call it average.
  3. Declare a value to store the remainders, let's call it remainder.
  4. Iterate through them and:
    • Divide the current element by the total amount, total.
    • Add the result to the average, average.
    • Add the remainder of the divided values together, remainder.
  5. Divide the remainders as well and add it to the average, average.
  6. Do with the average what you need/intend to.

This will give you an answer off by a maximum of 1 (Decimal numeric system [base 10]). I don't know C++ yet, so I can only give you an example in C#.

Pseudo code in C# (just to provide an idea):

int[] C = new int[20];            //The array of elements.
int total = C.Length;             //The total amount of elements.
int average = 0;                  //The variable to contain the result.
int remainder = 0;                //The variable to contain all the smaller bits.
foreach (int element in C)        //Iteration
{
    int temp = (element / total); //Divide the current element by the total.
    average = average + temp;     //Add the result to the average.
    temp = (temp % total);        //Get the remainder (number that was not divided)
    remainder = remainder + temp; //Add remainder to the other remainders.
}
average = average + (remainder / total); // Adds the divided remainders to the average.

Compressed C# example:

int[] C = new int[20];               //The array of elements.
int total = C.Length;                //The total amount of elements.
int average = 0;                     //The variable to contain the result.
int remainder = 0;                   //The variable to contain all the smaller bits.
foreach (int element in C)           //Iteration
{
    average += (element / total);    //Add the current element divided by the total to the average.
    remainder += ( element % total); //Add the remainders together.
}
average += (remainder / total); //Adds the divided remainders to the total.
EternalWulf
  • 742
  • 9
  • 21
  • 2
    But (see Jonathan Leffler's comment): `(1 + 3)/2 = 2` whereas `1/2 + 3/2 = 1` – manlio Jul 24 '14 at 09:18
  • A weakness to this approach occurs when all `n` `int` have the values `value%n == n-1`. In that case, `remainder` becomes about `n*n`. So this method has trouble when `n > sqrt(INT_MAX) > sqrt(32767)`. To avoid this limitation, code could maybe use `remainder = remainder + temp; average += remainder/total; remainder %= total.` - but that seems expensive and not needed when `n` is small. – chux - Reinstate Monica Apr 24 '15 at 18:50