45

Using integer math alone, I'd like to "safely" average two unsigned ints in C++.

What I mean by "safely" is avoiding overflows (and anything else that can be thought of).

For instance, averaging 200 and 5000 is easy:

unsigned int a = 200;
unsigned int b = 5000;
unsigned int average = (a + b) / 2; // Equals: 2600 as intended

But in the case of 4294967295 and 5000 then:

unsigned int a = 4294967295;
unsigned int b = 5000;
unsigned int average = (a + b) / 2; // Equals: 2499 instead of 2147486147

The best I've come up with is:

unsigned int a = 4294967295;
unsigned int b = 5000;
unsigned int average = (a / 2) + (b / 2); // Equals: 2147486147 as expected

Are there better ways?

phuclv
  • 37,963
  • 15
  • 156
  • 475
Tim
  • 997
  • 2
  • 11
  • 17
  • Can't you cast the sum to `long long`? – IVlad Sep 28 '10 at 19:47
  • 8
    The third option will give the wrong answer if both a and b are odd (since it will round down both halves). – Jackson Pope Sep 28 '10 at 19:48
  • 23
    US patent number 6,007,232. Calculating the average of two integer numbers rounded towards zero in a single instruction cycle: http://www.google.com/patents?id=eAIYAAAAEBAJ&dq=6007232 essentially uses `return (a >> 1) + (b >> 1) + (a & b & 0x1);` – Arun Sep 28 '10 at 20:01
  • 10
    ...wow. I'm saving that link for the next time someone complains about software patents. – Stephen Canon Sep 28 '10 at 20:23
  • 7
    it's interesting how many of the answers below contain this patented solution. I'm sure most/all of them developed it independently, perhaps even on the spot for their answer. That would seem to indicate the patent doesn't meet the standard of non-obviousness. – rmeador Sep 28 '10 at 21:28
  • 4
    this is a hardware patent (notice that the result is produced in one clock cycle) – pm100 Sep 28 '10 at 21:31
  • 1
    @pm100: I'm not sure that's a true distinction. The code @ArunSaha wrote will make the CPU become the circuit described in the patent. It may even work in one instruction cycle on a modern x86, but I'm not certain. Regardless, that C++ code could be trivially changed into VHDL code, and then it _is_ hardware... – rmeador Sep 28 '10 at 21:51
  • @Martin York: Are you saying ops answer doesnt work? he knows. If your talking about ArunSaha comment or sellibitze answer then you forgot the `+ (a & b & 0x1)` part. –  Sep 29 '10 at 05:38

10 Answers10

56

Your last approach seems promising. You can improve on that by manually considering the lowest bits of a and b:

unsigned int average = (a / 2) + (b / 2) + (a & b & 1);

This gives the correct results in case both a and b are odd.

Jay Bazuzi
  • 45,157
  • 15
  • 111
  • 168
sellibitze
  • 27,611
  • 3
  • 75
  • 95
  • Awesome, this is exactly the kind of consideration I was looking for. – Tim Sep 28 '10 at 19:59
  • 2
    Speaking of software patents it appears that patent application: 20090249356 is trying to patent what is well known folklore in the computer industry. CAS-less single producer single consumer circular queues have been known for almost 30 years. (I wrote my first one in the early 80's) I wrote to complain but they said it was too late. I think the patent office should be inundated with "technical hate emails" on this one. – nbourbaki Sep 29 '10 at 02:41
  • 5
    There's a slight problem with using this one: Samsung has a patent for it. http://www.google.com/patents?id=eAIYAAAAEBAJ&dq=6007232 – George Oct 04 '10 at 11:08
  • Only works for positive integers as the last part ignores the sign bit. – crazypeter Feb 23 '19 at 02:58
29

If you know ahead of time which one is higher, a very efficient way is possible. Otherwise you're better off using one of the other strategies, instead of conditionally swapping to use this.

unsigned int average = low + ((high - low) / 2);

Here's a related article: http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Sheldon L. Cooper
  • 3,230
  • 19
  • 13
  • i like this, but what if there's an error due to integer division? – ianmac45 Sep 28 '10 at 19:49
  • Why would there be? You're never dividing by 0, which is the only integer division that'd produce an error. – cHao Sep 28 '10 at 19:51
  • 3
    This is the classic answer to this problem, especially when you already know which value is high and which is low - choosing a midpoint, for example. – Mark Ransom Sep 28 '10 at 19:56
  • ordering them would be too expensive – ruslik Sep 28 '10 at 20:00
  • 2
    @ruslik: unless you know the ordering *a priori*, as in the linked article (which is probably the single most common use case for integer averaging). – Stephen Canon Sep 28 '10 at 20:19
  • @Stephen not really. while technicaly it's a bug, it's almost impossible to actually happen in a binary search. I'm sure that in this particular case it was quite common for sum to overflow. – ruslik Sep 28 '10 at 20:38
  • @ruslik, not really. Maybe it was almost impossible ten years ago, but not nowadays. – Sheldon L. Cooper Sep 28 '10 at 22:13
  • 3
    @ArunSaha: wrong! the original problem was about overflow. in this case you allow `high - low` to be signed, so this can easily overlow in the same way as in the original problem. you can avoid it only by considering this difference unsigned, so you have to know which one is larger. – ruslik Sep 29 '10 at 01:42
  • @Sheldon: wrong again :) on most machines the default size of `int` is the same as the size of pointer, so you need a special machine for that kind of overflow, with huge address space and small ints. – ruslik Sep 29 '10 at 01:44
  • @ruslik: no, you were referring to Java code, in which the size of `int` will still be 32 bits. Please read the article carefully before making strong remarks about it. – Sheldon L. Cooper Sep 29 '10 at 04:32
  • This is *very* efficient if you already know which one is higher, fewer / faster asm instructions than the top answer. Especially on non-ARM, where right shifts cost aren't free as part of other instructions. https://godbolt.org/g/bSZHdE has asm output for x86 and ARM for both versions. – Peter Cordes Apr 10 '18 at 23:24
18

Your method is not correct if both numbers are odd eg 5 and 7, average is 6 but your method #3 returns 5.

Try this:

average = (a>>1) + (b>>1) + (a & b & 1)

with math operators only:

average = a/2 + b/2 + (a%2) * (b%2)
iniju
  • 1,084
  • 7
  • 11
10

And the correct answer is...

(A&B)+((A^B)>>1)
mipe34
  • 5,596
  • 3
  • 26
  • 38
Jonathan Olson
  • 101
  • 1
  • 2
9

If you don't mind a little x86 inline assembly (GNU C syntax), you can take advantage of supercat's suggestion to use rotate-with-carry after an add to put the high 32 bits of the full 33-bit result into a register.

Of course, you usually should mind using inline-asm, because it defeats some optimizations (https://gcc.gnu.org/wiki/DontUseInlineAsm). But here we go anyway:

// works for 64-bit long as well on x86-64, and doesn't depend on calling convention
unsigned average(unsigned x, unsigned y)
{
    unsigned result;
    asm("add   %[x], %[res]\n\t"
        "rcr   %[res]"
        : [res] "=r" (result)   // output
        : [y] "%0"(y),  // input: in the same reg as results output.  Commutative with next operand
          [x] "rme"(x)  // input: reg, mem, or immediate
        :               // no clobbers.  ("cc" is implicit on x86)
    );
    return result;
}

The % modifier to tell the compiler the args are commutative doesn't actually help make better asm in the case I tried, calling the function with y being a constant or pointer-deref (memory operand). Probably using a matching constraint for an output operand defeats that, since you can't use it with read-write operands.

As you can see on the Godbolt compiler explorer, this compiles correctly, and so does a version where we change the operands to unsigned long, with the same inline asm. clang3.9 makes a mess of it, though, and decides to use the "m" option for the "rme" constraint, so it stores to memory and uses a memory operand.


RCR-by-one is not too slow, but it's still 3 uops on Skylake, with 2 cycle latency. It's great on AMD CPUs, where RCR has single-cycle latency. (Source: Agner Fog's instruction tables, see also the tag wiki for x86 performance links). It's still better than @sellibitze's version, but worse than @Sheldon's order-dependent version. (See code on Godbolt)

But remember that inline-asm defeats optimizations like constant-propagation, so any pure-C++ version will be better in that case.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
fredoverflow
  • 256,549
  • 94
  • 388
  • 662
  • +1: I have never written inline assembly :(, can you please comment and explain the three lines, specially how the values of `x` and `y` are picked up. – Arun Sep 28 '10 at 23:38
  • At the start of the inline assembly, there are four 4-byte values on the stack, starting at EBP: EBP+0 (the previous EBP, before the function call), EBP+4 (the previous instruction counter EIP), EBP+8 (x), and EBP+12 (y). The function is expected to place its result in EAX, so the assembly starts by moving x there. It then adds y, and an overflow from this operation will set the carry bit (lack of an overflow will clear the bit). RCR is a rotate-right-with-carry, which rotates EAX one bit to the right (dividing it by two) and shifts the carry bit into the most-sigificant bit of EAX. – Josh Townzen Sep 29 '10 at 06:11
  • 1
    Reference: http://www.cse.nd.edu/~dthain/courses/cse40243/fall2008/ia32-intro.html (under "Defining Functions"). Also, the calling convention used is `cdecl` (the default for C and non-member C++ functions), which you might want to look up if you want more information. – Josh Townzen Sep 29 '10 at 06:15
  • The addition leaves carry bit set when overflow occurs (and one one more bit is needed to hold the result). Then you rotate through carry right (making eax and carry flag effectively 33 bit register) which effectively divides by 2. Then you discard carry flag (which now contains original lowest significand bit from eax) and return eax as the result. Brilliant. – Tomek Sep 29 '10 at 06:54
  • 1
    @Josh @Tomek: There is no such thing as an *overflow* in unsigned arithmetic, it is called a *carry* (hence the name *carry flag*). – fredoverflow Sep 29 '10 at 11:38
  • 5
    This is not valid inline assembly because it does not code the operand dependency. A compiler might optimize it out or access bogus data when the function gets inlined. – R.. GitHub STOP HELPING ICE Sep 07 '11 at 04:26
  • 1
    **You can't just write GNU C basic asm inside a function and leave a value in %eax**. As far as the the compiler is concerned, you've just written a function that reaches the end of a non-void function without returning a value. That fails as soon as you enable optimization, and maybe even before that. Always use extended-asm syntax with input and output operands. (See the [inline assembly tag wiki](http://stackoverflow.com/tags/inline-assembly/info)). And as R. says, of course all three asm instructions should be part of the same asm statement. – Peter Cordes Nov 30 '16 at 17:10
  • @PeterCordes Please feel free to improve my answer, you have my blessings! – fredoverflow Nov 30 '16 at 17:24
  • @RossRidge: xD, should have commented I was working on an edit. Mine was almost ready when you posted. And \@fredoverflow: there you go, a version that is about as free of suckage as possible for inline asm. I still wouldn't generally recommend it, though. It's usually best if compiler "understand" what's happening, so they can prove more about the values of variables. – Peter Cordes Nov 30 '16 at 18:04
  • @PeterCordes Well, overall I think your edit is better, but for reference my edit gives an example of how you can use the commutative modifier here. – Ross Ridge Nov 30 '16 at 18:06
  • @PeterCordes Also the output operand isn't early clobber, as all the input operands consumed at the start. (eg. `add %eax, %eax; rcr %eax` would be valid). – Ross Ridge Nov 30 '16 at 18:16
  • @RossRidge: good points. I tried your commutative idea, but it doesn't change the asm. Probably it can't do it because of the matching constraint with an output operand. Anyway, updated. – Peter Cordes Nov 30 '16 at 18:26
  • @PeterCordes Put a line like `x = foo();` before the asm statement, compile for 32-bits, and optimize with -O3 and you should see it using the `x` in already in EAX as the `[y]`/`[res]` operand. – Ross Ridge Nov 30 '16 at 18:40
  • @PeterCordes And apparently you need to use GCC 4.8 as well. Dunno why this broke in later compilers. – Ross Ridge Nov 30 '16 at 18:51
  • @RossRidge: ahhh, I was just going to ask you to take a look at https://godbolt.org/g/WPNxLB (gcc6.2 and clang3.9), since I'm sleepy and figured I was missing something. But yeah, gcc4.8.5 (https://godbolt.org/g/QJhyI6) does benefit from %. But even worse without it: two extra insns instead of just one. – Peter Cordes Nov 30 '16 at 18:58
5

What you have is fine, with the minor detail that it will claim that the average of 3 and 3 is 2. I'm guessing that you don't want that; fortunately, there's an easy fix:

unsigned int average = a/2 + b/2 + (a & b & 1);

This just bumps the average back up in the case that both divisions were truncated.

Stephen Canon
  • 103,815
  • 19
  • 183
  • 269
3

In C++20, you can use std::midpoint:

template <class T>
constexpr T midpoint(T a, T b) noexcept;

The paper P0811R3 that introduced std::midpoint recommended this snippet (slightly adopted to work with C++11):

#include <type_traits>

template <typename Integer>
constexpr Integer midpoint(Integer a, Integer b) noexcept {
  using U = std::make_unsigned<Integer>::type;
  return a>b ? a-(U(a)-b)/2 : a+(U(b)-a)/2;
}

For completeness, here is the unmodified C++20 implementation from the paper:

constexpr Integer midpoint(Integer a, Integer b) noexcept {
  using U = make_unsigned_t<Integer>;
  return a>b ? a-(U(a)-b)/2 : a+(U(b)-a)/2;
}
Philipp Claßen
  • 41,306
  • 31
  • 146
  • 239
2

If the code is for an embedded micro, and if speed is critical, assembly language may be helpful. On many microcontrollers, the result of the add would naturally go into the carry flag, and instructions exist to shift it back into a register. On an ARM, the average operation (source and dest. in registers) could be done in two instructions; any C-language equivalent would likely yield at least 5, and probably a fair bit more than that.

Incidentally, on machines with shorter word sizes, the differences can be even more substantial. On an 8-bit PIC-18 series, averaging two 32-bit numbers would take twelve instructions. Doing the shifts, add, and correction, would take 5 instructions for each shift, eight for the add, and eight for the correction, so 26 (not quite a 2.5x difference, but probably more significant in absolute terms).

supercat
  • 77,689
  • 9
  • 166
  • 211
  • A recent 8051 asm question ([Average of 2 registers, avoiding overflow in assembly?](https://stackoverflow.com/q/66551243)) shows this method. When I answered, I hadn't realized there were rotate-through-carry answers on this Q&A (including this or the x86 asm inline asm answer), or I would have left it closed as a duplicate. – Peter Cordes Mar 13 '21 at 02:02
-3
    int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    decimal avg = 0;
    for (int i = 0; i < array.Length; i++){
        avg = (array[i] - avg) / (i+1) + avg;
    }

expects avg == 5.0 for this test

-4

(((a&b << 1) + (a^b)) >> 1) is also a nice way.

Courtesy: http://www.ragestorm.net/blogs/?p=29

Peter O.
  • 32,158
  • 14
  • 82
  • 96
shubhros
  • 7
  • 1
  • 2
    This is wrong because there can be an overflow. Consider 8-bit ints and you want to find the average of 0xff and 0x01. It should be 0x80, right? Calculating: 0xff&0x01=0x01, 0x01<<1=0x02, 0xff^0x01=0xfe, 0x02+0xfe=0x00 (because ints are 8-bit, the 1 in 0x02+0xfe=0x100 is lost), 0x00>>1=0x00. 0x00!=0x80. – Alexey Frunze Mar 12 '12 at 07:58
  • 2
    This is just wrong, not because of overflow. It will compute that the average of 3 and 7 is 8. It should be `(a&b)+((a^b)>>1)`. – Joni Oct 03 '13 at 08:26