14

Let us say we have x and y and both are signed integers in C, how do we find the most accurate mean value between the two?

I would prefer a solution that does not take advantage of any machine/compiler/toolchain specific workings.

The best I have come up with is:(a / 2) + (b / 2) + !!(a % 2) * !!(b %2) Is there a solution that is more accurate? Faster? Simpler?

What if we know if one is larger than the other a priori?

Thanks.

D


Editor's Note: Please note that the OP expects answers that are not subject to integer overflow when input values are close to the maximum absolute bounds of the C int type. This was not stated in the original question, but is important when giving an answer.

Greg Hewgill
  • 951,095
  • 183
  • 1,149
  • 1,285
McCormick
  • 169
  • 1
  • 5
  • Check the assembly language that the compiler generates with different equations, as the compiler may find an optimization for a regular old sum divided by two (with appropriate casts) calculation. – jonsca Apr 18 '11 at 00:44
  • That would be useful if the compiler was not within its own right to produce assembly that only makes sense for a certain platform. Think about cases where signed numbers are not even 2-complement! With this in mind, the compiler output is not too useful to use... – McCormick Apr 18 '11 at 00:53
  • 1
    I am kinda linking the following: (a / 2) + (b / 2) + ((a % 2) + (b % 2)) / 2 What is nice is that it exactly complements the definition of modulo and thus is perfectly accurate in a mathematical sense.... But is it accurate in a _C_ sense? – McCormick Apr 18 '11 at 00:54
  • 1
    @jonsca: The compiler can ignore overflow since it's UB, and therefore will not give you a solution. – R.. GitHub STOP HELPING ICE Apr 18 '11 at 00:54
  • I believe the rounding of a/2 and b/2 (and the sign of a%2) is machine-specific if they are negative, this may influence what is a suitable answer. What the standard provides is "If the quotient `a/b` is representable, the expression `(a/b)*b + a%b` shall equal `a`.", so they'll at least be consistent, which means if you use the mathematical-based one without any !! tricks it _should_ be right (but it may round one way or the other if a negative number is involved) – Random832 Apr 18 '11 at 02:17
  • 1
    @Random832: It was implementation-specific 12 years ago. C99 imposed the requirement that it behave contrary to every mathematician's intuition and "round" towards zero. – R.. GitHub STOP HELPING ICE Apr 18 '11 at 02:36

7 Answers7

9

After accept answer (4 yr)

I would expect the function int average_int(int a, int b) to:
1. Work over the entire range of [INT_MIN..INT_MAX] for all combinations of a and b.
2. Have the same result as (a+b)/2, as if using wider math.

When int2x exists, @Santiago Alessandri approach works well.

int avgSS(int a, int b) {
  return (int) ( ((int2x) a + b) / 2);
}

Otherwise a variation on @AProgrammer:
Note: wider math is not needed.

int avgC(int a, int b) {
  if ((a < 0) == (b < 0)) {  // a,b same sign
    return a/2 + b/2 + (a%2 + b%2)/2;
  }
  return (a+b)/2;
}

A solution with more tests, but without %

All below solutions "worked" to within 1 of (a+b)/2 when overflow did not occur, but I was hoping to find one that matched (a+b)/2 for all int.


@Santiago Alessandri Solution works as long as the range of int is narrower than the range of long long - which is usually the case.

((long long)a + (long long)b) / 2

@AProgrammer, the accepted answer, fails about 1/4 of the time to match (a+b)/2. Example inputs like a == 1, b == -2

a/2 + b/2 + (a%2 + b%2)/2

@Guy Sirton, Solution fails about 1/8 of the time to match (a+b)/2. Example inputs like a == 1, b == 0

int sgeq = ((a<0)==(b<0));
int avg = ((!sgeq)*(a+b)+sgeq*(b-a))/2 + sgeq*a;

@R.., Solution fails about 1/4 of the time to match (a+b)/2. Example inputs like a == 1, b == 1

return (a-(a|b)+b)/2+(a|b)/2;

@MatthewD, now deleted solution fails about 5/6 of the time to match (a+b)/2. Example inputs like a == 1, b == -2

unsigned diff;
signed mean;
if (a > b) {
    diff = a - b;
    mean = b + (diff >> 1);
} else {
    diff = b - a;
    mean = a + (diff >> 1);
}
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
5

If (a^b)<=0 you can just use (a+b)/2 without fear of overflow.

Otherwise, try (a-(a|b)+b)/2+(a|b)/2. -(a|b) is at least as large in magnitude as both a and b and has the opposite sign, so this avoids the overflow.

I did this quickly off the top of my head so there might be some stupid errors. Note that there are no machine-specific hacks here. All behavior is completely determined by the C standard and the fact that it requires twos-complement, ones-complement, or sign-magnitude representation of signed values and specifies that the bitwise operators work on the bit-by-bit representation. Nope, the relative magnitude of a|b depends on the representation...

Edit: You could also use a+(b-a)/2 when they have the same sign. Note that this will give a bias towards a. You can reverse it and get a bias towards b. My solution above, on the other hand, gives bias towards zero if I'm not mistaken.

Another try: One standard approach is (a&b)+(a^b)/2. In twos complement it works regardless of the signs, but I believe it also works in ones complement or sign-magnitude if a and b have the same sign. Care to check it?

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
3

Edit: version fixed by @chux - Reinstate Monica:

if ((a < 0) == (b < 0)) {  // a,b same sign
  return a/2 + b/2 + (a%2 + b%2)/2;
} else {
  return (a+b)/2;
}

Original answer (I'd have deleted it if it hadn't been accepted).

a/2 + b/2 + (a%2 + b%2)/2

Seems the simplest one fitting the bill of no assumption on implementation characteristics (it has a dependency on C99 which specifying the result of / as "truncated toward 0" while it was implementation dependent for C90).

It has the advantage of having no test (and thus no costly jumps) and all divisions/remainder are by 2 so the use of bit twiddling techniques by the compiler is possible.

AProgrammer
  • 51,233
  • 8
  • 91
  • 143
  • 2
    Wow, the code gcc generates for this is utterly horrible. With `-Os` it uses `idiv` for everything, and with `-O2` or higher it switches to bitwise ops, but uses a *damn lot* of them... If you have a larger type available, I think you should just do the averaging in the larger type... – R.. GitHub STOP HELPING ICE Apr 19 '11 at 13:00
  • 1
    @R.., the horrible bitwise handling is needed to ensure truncation toward 0 for negative values. – AProgrammer Apr 19 '11 at 13:09
  • Yes but the whole thing could (and should) be simplified. I'm pretty sure gcc is just using its standard idiom for handling negative division and failing to simplify common subexpressions and whatnot afterwards. – R.. GitHub STOP HELPING ICE Apr 19 '11 at 13:20
  • 2
    a = -6 b= 3 then your expression gives -2 but the correct value should be -1. What am I missing here? – zjk Oct 23 '15 at 09:48
  • 5
    This is not the correct answer, please look into @zjk comment. I have verified it using a SAT solver. It looks like it fails for a negative even integer and a positive odd integer – Nishant Mar 03 '16 at 06:18
2

For unsigned integers the average is the floor of (x+y)/2. But the same fails for signed integers. This formula fails for integers whose sum is an odd -ve number as their floor is one less than their average.

You can read up more at Hacker's Delight in section 2.5

The code to calculate average of 2 signed integers without overflow is

int t = (a & b) + ((a ^ b) >> 1)
unsigned t_u = (unsigned)t
int avg = t + ( (t_u >> 31 ) & (a ^ b) )

I have checked it's correctness using Z3 SMT solver

Nishant
  • 2,571
  • 1
  • 17
  • 29
1

Just a few observations that may help:

"Most accurate" isn't necessarily unique with integers. E.g. for 1 and 4, 2 and 3 are an equally "most accurate" answer. Mathematically (not C integers):

(a+b)/2 = a+(b-a)/2 = b+(a-b)/2

Let's try breaking this down:

  • If sign(a)!=sign(b) then a+b will will not overflow. This case can be determined by comparing the most significant bit in a two's complement representation.
  • If sign(a)==sign(b) then if a is greater than b, (a-b) will not overflow. Otherwise (b-a) will not overflow. EDIT: Actually neither will overflow.

What are you trying to optimize exactly? Different processor architectures may have different optimal solutions. For example, in your code replacing the multiplication with an AND may improve performance. Also in a two's complement architecture you can simply (a & b & 1).

I'm just going to throw some code out, not looking too fast but perhaps someone can use and improve:

int sgeq = ((a<0)==(b<0));
int avg = ((!sgeq)*(a+b)+sgeq*(b-a))/2 + sgeq*a
Guy Sirton
  • 8,331
  • 2
  • 26
  • 36
  • All 3 possible signed integer representations involve a "sign bit", and you can test if `a` and `b` have opposite sign simply by checking that `a^b` is less that zero (since the sign bits will cancel each other out in the bit representation if both are negative). – R.. GitHub STOP HELPING ICE Apr 18 '11 at 19:15
-1

I would do this, convert both to long long(64 bit signed integers) add them up, this won't overflow and then divide the result by 2:

((long long)a + (long long)b) / 2

If you want the decimal part, store it as a double.

It is important to note that the result will fit in a 32 bit integer.

If you are using the highest-rank integer, then you can use:

((double)a + (double)b) / 2
Santiago Alessandri
  • 6,630
  • 30
  • 46
  • 2
    What if `a` and `b` are already the highest-rank signed integer type? – R.. GitHub STOP HELPING ICE Apr 18 '11 at 00:56
  • In that case cast to doubles a and b before doing the division. – Santiago Alessandri Apr 18 '11 at 01:04
  • Note that in C the result of `>>` when the left operand is negative is implementation-defined. In other words, depending on the compiler and platform it may or may not actually sign-extend, and may in fact even refuse to compile or crash the program (as long as the implementation defines this behavior). – Anomie Apr 18 '11 at 01:10
  • No, it has an implementation-defined *value*, not behavior. It's safe, but the value might just be strange. :-) – R.. GitHub STOP HELPING ICE Apr 18 '11 at 01:15
  • 3
    Any sane optimizer would optimize division by two to a shift anyways, and would obviously not do it when it doesn't work on the target platform. So best leave it as division, it's also easier to read. – SoapBox Apr 18 '11 at 01:26
  • @SoapBox: Only for an unsigned type! (Which, in this case, it is...) – Arafangion Apr 18 '11 at 07:23
  • 4
    @SanSS casting to double will lose more precision than a/2 + b/2 for 64 bits integers. – AProgrammer Apr 18 '11 at 07:45
-2

This answer fits to any number of integers:

    int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    decimal avg = 0;
    for (int i = 0; i < array.Length; i++){
        avg = (array[i] - avg) / (i+1) + avg;
    }

expects avg == 5.0 for this test