Rounding error in computing average

Question

I have some problems with rounding errors in c++. If I have to compute the average of two floats a and b, then why is it better to do a+0.5*(b-a) than (a+b)/2? I can't understand why there should be any difference in the two ways of computing it.

Please show a [MCVE] as well as the expected and actual output. — Jabberwocky, Feb 12 '18 at 05:50
Floating point numbers (i.e. `float` and `double`) are not like mathematical Real numbers (at best similar in certain limited ranges). To understand this better, these links may help: [SO: Floating point inaccuracy examples](https://stackoverflow.com/q/2100490/7478597), [SO: C++ floating point precision](https://stackoverflow.com/q/2909427/7478597). — Scheff's Cat, Feb 12 '18 at 05:51

Mark Dickinson · Answer 1 · 2018-02-12T20:56:03.127

[Disclaimer: this answer assumes IEEE 754 format and semantics. Specifically, we assume that float is the IEEE 754 binary32 format, that we're using the default round-ties-to-even rounding mode, and that intermediate expressions are not computed with extended precision - e.g., because FLT_EVAL_METHOD is 0.]

Here's one possible reason to prefer a + 0.5 * (b-a): if a and b are very large and have the same sign, then the intermediate quantity a + b in the expression 0.5 * (a + b) might overflow, giving either an infinite result or a floating-point exception. In contrast, a + 0.5 * (b - a) will not overflow in that situation.

However, that small advantage should be weighed against the following:

a + 0.5 * (b - a) requires three floating-point operations; 0.5 * (a + b) requires only two.
in cases where a + b does not overflow, 0.5 * (a + b) always provides a correctly-rounded answer: that is, it gives the best possible approximation to the actual mean, given the representability constraints of the target type. (That's not entirely obvious, but not hard to prove: either a + b is greater in magnitude than twice the smallest normal, in which case the sum is correctly rounded and the multiplication by 0.5 is exact, or a + b is itself computed exactly, and then the multiplication by 0.5 is correctly rounded. Either way, at most one of the two arithmetic operations can introduce an error.) But a + 0.5 * (b - a) will not always give a correctly-rounded mean, and in fact can be many millions of ulps in error. Consider the case where a = -1.0 and b = 1.0 + 2^-23. Then a + 0.5 * (b - a) gives 0.0. The correct mean is 2^-24.
the expression a + 0.5 * (b - a) can also overflow, if a and b are very large with opposite signs rather than the same sign. In that situation, 0.5 * (a + b) will not overflow.
a + 0.5 * (b - a) is (very slightly) less readable than 0.5 * (a + b); it requires a moment's more thought on the part of the reader to figure out what it's doing.

Given the above, it's hard to support a general recommendation that a + 0.5 * (b - a) should be used in preference to 0.5 * (a + b).

Yola · Accepted Answer · 2018-02-12T07:36:11.690

Your formula is correct in case if you are computing average of many numbers. In this case you can do the following:

μ_n = 1/nΣx_i

but here while adding 101st number you will need to add x₁₀₁ to μ₁₀₀, where μ₁₀₀ could be pretty big compared to x₁₀₁, and so you will lose some precision. To avoid this problem you can to like this:

μ₁₀₁ = μ₁₀₀ + 1/n(x₁₀₁ - μ₁₀₀)

This formula is much better if you x_i is of the same order of magnitude, as you avoid dealing with arithmetic operations between two big number and x_i.

You might want to read article Numerically stable computation of arithmetic means

Let's look into how numbers are represented in IEEE floating point. Consider C++ float:

Interval [1,2] goes with step 2^-23, so you can represent numbers 1+n*2^-23, where n belongs to {0, ..., 2²³}.

Interval [2^j, 2^j+1] goes with like [1,2] but multiplied by 2^j.

To, see how precision is lost you can run this program:

#include <iostream>
#include <iomanip>
int main() {
    float d = pow(2,-23);
    std::cout << d << std::endl;
    std::cout << std::setprecision(8) << d + 1 << std::endl;
    std::cout << std::setprecision(8) << d + 2 << std::endl; // the precision has been lost
    system("pause");
}

The output is

1.19209e-07
1.0000001
2

Rounding error in computing average

2 Answers2