I have some problems with rounding errors in c++. If I have to compute the average of two floats a
and b
, then why is it better to do a+0.5*(b-a)
than (a+b)/2
? I can't understand why there should be any difference in the two ways of computing it.

- 29,088
- 9
- 83
- 120

- 43
- 3
-
1Please show a [MCVE] as well as the expected and actual output. – Jabberwocky Feb 12 '18 at 05:50
-
2Floating point numbers (i.e. `float` and `double`) are not like mathematical Real numbers (at best similar in certain limited ranges). To understand this better, these links may help: [SO: Floating point inaccuracy examples](https://stackoverflow.com/q/2100490/7478597), [SO: C++ floating point precision](https://stackoverflow.com/q/2909427/7478597). – Scheff's Cat Feb 12 '18 at 05:51
-
1Sometime that is true, sometime not – Yola Feb 12 '18 at 06:36
2 Answers
[Disclaimer: this answer assumes IEEE 754 format and semantics. Specifically, we assume that float
is the IEEE 754 binary32 format, that we're using the default round-ties-to-even rounding mode, and that intermediate expressions are not computed with extended precision - e.g., because FLT_EVAL_METHOD
is 0
.]
Here's one possible reason to prefer a + 0.5 * (b-a)
: if a
and b
are very large and have the same sign, then the intermediate quantity a + b
in the expression 0.5 * (a + b)
might overflow, giving either an infinite result or a floating-point exception. In contrast, a + 0.5 * (b - a)
will not overflow in that situation.
However, that small advantage should be weighed against the following:
a + 0.5 * (b - a)
requires three floating-point operations;0.5 * (a + b)
requires only two.- in cases where
a + b
does not overflow,0.5 * (a + b)
always provides a correctly-rounded answer: that is, it gives the best possible approximation to the actual mean, given the representability constraints of the target type. (That's not entirely obvious, but not hard to prove: eithera + b
is greater in magnitude than twice the smallest normal, in which case the sum is correctly rounded and the multiplication by0.5
is exact, ora + b
is itself computed exactly, and then the multiplication by0.5
is correctly rounded. Either way, at most one of the two arithmetic operations can introduce an error.) Buta + 0.5 * (b - a)
will not always give a correctly-rounded mean, and in fact can be many millions of ulps in error. Consider the case wherea = -1.0
andb = 1.0 + 2^-23
. Thena + 0.5 * (b - a)
gives0.0
. The correct mean is2^-24
. - the expression
a + 0.5 * (b - a)
can also overflow, ifa
andb
are very large with opposite signs rather than the same sign. In that situation,0.5 * (a + b)
will not overflow. a + 0.5 * (b - a)
is (very slightly) less readable than0.5 * (a + b)
; it requires a moment's more thought on the part of the reader to figure out what it's doing.
Given the above, it's hard to support a general recommendation that a + 0.5 * (b - a)
should be used in preference to 0.5 * (a + b)
.

- 29,088
- 9
- 83
- 120
Your formula is correct in case if you are computing average of many numbers. In this case you can do the following:
μn = 1/nΣxi
but here while adding 101st number you will need to add x101 to μ100, where μ100 could be pretty big compared to x101, and so you will lose some precision. To avoid this problem you can to like this:
μ101 = μ100 + 1/n(x101 - μ100)
This formula is much better if you xi is of the same order of magnitude, as you avoid dealing with arithmetic operations between two big number and xi.
You might want to read article Numerically stable computation of arithmetic means
Let's look into how numbers are represented in IEEE floating point. Consider C++ float
:
Interval [1,2] goes with step 2-23, so you can represent numbers 1+n*2-23, where n belongs to {0, ..., 223}.
Interval [2j, 2j+1] goes with like [1,2] but multiplied by 2j.
To, see how precision is lost you can run this program:
#include <iostream>
#include <iomanip>
int main() {
float d = pow(2,-23);
std::cout << d << std::endl;
std::cout << std::setprecision(8) << d + 1 << std::endl;
std::cout << std::setprecision(8) << d + 2 << std::endl; // the precision has been lost
system("pause");
}
The output is
1.19209e-07
1.0000001
2

- 18,496
- 11
- 65
- 106