Overflow When Calculating Average?

Question

Given 2 integer numbers we can calculate their average like this:

return (a+b)/2;

which isn't safe since (a+b) can cause overflow (Side Note: can someone tell me the correct term for this case maybe memory overflow?)

So we write:

return a+(b-a)/2;

can the same trick be implemented over n numbers and how?

https://stackoverflow.com/a/1934266/14215102 <- here's a trick for N numbers — , Jun 19 '21 at 21:41
@dratenik However, it is important to pay attention to the comment: `this algorithm will have severe underflow issues` — eerorika, Jun 19 '21 at 21:43
@dratenik With integers, you have the same overflow problem as the two valued examples in the question. — eerorika, Jun 19 '21 at 21:55

eerorika · Answer 1 · 2021-06-19T22:14:27.387

3

Note that there are several different averages. I assume that you're asking about the arithmetic mean.

overflow (Side Note: can someone tell me the correct term for this case maybe memory overflow?)

The correct term is arithmetic overflow, or just overflow. Not memory overflow.

a+(b-a)/2;

b-a can also overflow. This isn't quite as easy to solve as it may seem.

Standard library has a function template to do this correctly without overflow: std::midpoint.

I checked an implementation of std::midpoint, and they do what you suggested for integers, except the operands are first converted to the corresponding unsigned type. Then the result is converted back. A mathematician may explain how that works, but I guess that it has something to do with the magic of modular arithmetic.

For floats, they do a / 2 + b / 2 (if the inputs are normal).

can the same trick be implemented over n numbers and how?

Simplest solution that works with all inputs without overflow and without imprecision is probably to use arbitrary precision arithmetic.

edited Jun 19 '21 at 22:14

answered Jun 19 '21 at 21:36

eerorika

232,697
12
197
326

`std::midpoint` doesn't work for more than 2 arguments though right? – mattlangford Jun 19 '21 at 21:37
@mattlangford `midpoint` works for only 2 arguments – Ranoiaetep Jun 19 '21 at 21:40
So `a / 2 + b / 2` can't overflow for integers? – Jun 19 '21 at 22:08
1

@john It can't overflow, but you'll get a wrong result for many values if you're looking for arithmetic mean. Consider for example `1 / 2 + 1 / 2 == 0`. Is 0 the average of array `[1, 1]`? – eerorika Jun 19 '21 at 22:10
This here is a solution res = a/2.0 + b/2.0 @eerorika – Jun 19 '21 at 22:15
@john doubles cannot precisely represent all integers. That would likely be imprecise if the inputs are 64 bit integers with large absolute values (Edit: actually it's worse: behaviour of converting unrepresentable integer to float is undefined). Also, there is no simple solution for precise average of many doubles (see the discussion about underflow in the comments under the question). – eerorika Jun 19 '21 at 22:19
can you give an example @eerorika – Jun 19 '21 at 22:31
@eerorika according to [conv.fpint] integer->floating may be imprecise, no UB; now FP->int is another matter – Jun 19 '21 at 22:50
@dratenik That's not how I would interpret `[conv.fpint] A prvalue of an integer type or of an unscoped enumeration type can be converted to a prvalue of a floating-point type. ... If the value being converted is outside the range of values that can be represented, the behavior is undefined.` – eerorika Jun 19 '21 at 22:52
@dratenik Although, I suppose that there might not be integer values that are outside the representable range (assuming IEEE-754). – eerorika Jun 19 '21 at 22:55
@eerorika sure, if you have int128 and float32, you might hit that one, not with int64 and double – Jun 19 '21 at 22:57

Ranoiaetep · Answer 2 · 2021-06-24T06:07:57.113

One way of getting average number for multiple numbers is to find the Cumulative Moving Average, or CMA:

$\mathit{CMA_{n+1}}=\mathit{CMA_{n}}+\frac{x_{n+1}-\mathit{CMA_{n}}}{n+1}$

Your code a + (b - a) / 2 can also be derived from this equation for n + 1 == 2.

Translating above equation to code, you would get something similar to:

std::vector<int> vec{10, 5, 8, 3, 2, 8}; // average is 6

double average = 0.0;

for(auto n = 0; n < vec.size(); ++n)
{
    average += (vec[n] - average) / (n + 1);
}

std::cout << average; // prints 6

Alternatively, you can also use the std::accumulate:

std::cout << std::accumulate(vec.begin(), vec.end(), 0.0, 
                             [n = 0](auto cma, auto i) mutable {
                                 return cma + (i - cma) / ++n;
                             });

Do note any time you are using floating division can result into imprecise result, especially when you attempt to do that for numerous times. For more regarding impreciseness, you can look at: Is floating point math broken?

I think you should explain how CMA is initialised, and the type of CMA -- if it is integral the above won't always work. Indeed a snippet of code that could be compiled woud be best. — dmuir, Jun 20 '21 at 08:12

Overflow When Calculating Average?

2 Answers2