4

I have some problems with rounding errors in c++. If I have to compute the average of two floats a and b, then why is it better to do a+0.5*(b-a) than (a+b)/2? I can't understand why there should be any difference in the two ways of computing it.

Mark Dickinson
  • 29,088
  • 9
  • 83
  • 120
Mars
  • 43
  • 3
  • 1
    Please show a [MCVE] as well as the expected and actual output. – Jabberwocky Feb 12 '18 at 05:50
  • 2
    Floating point numbers (i.e. `float` and `double`) are not like mathematical Real numbers (at best similar in certain limited ranges). To understand this better, these links may help: [SO: Floating point inaccuracy examples](https://stackoverflow.com/q/2100490/7478597), [SO: C++ floating point precision](https://stackoverflow.com/q/2909427/7478597). – Scheff's Cat Feb 12 '18 at 05:51
  • 1
    Sometime that is true, sometime not – Yola Feb 12 '18 at 06:36

2 Answers2

3

[Disclaimer: this answer assumes IEEE 754 format and semantics. Specifically, we assume that float is the IEEE 754 binary32 format, that we're using the default round-ties-to-even rounding mode, and that intermediate expressions are not computed with extended precision - e.g., because FLT_EVAL_METHOD is 0.]

Here's one possible reason to prefer a + 0.5 * (b-a): if a and b are very large and have the same sign, then the intermediate quantity a + b in the expression 0.5 * (a + b) might overflow, giving either an infinite result or a floating-point exception. In contrast, a + 0.5 * (b - a) will not overflow in that situation.

However, that small advantage should be weighed against the following:

  • a + 0.5 * (b - a) requires three floating-point operations; 0.5 * (a + b) requires only two.
  • in cases where a + b does not overflow, 0.5 * (a + b) always provides a correctly-rounded answer: that is, it gives the best possible approximation to the actual mean, given the representability constraints of the target type. (That's not entirely obvious, but not hard to prove: either a + b is greater in magnitude than twice the smallest normal, in which case the sum is correctly rounded and the multiplication by 0.5 is exact, or a + b is itself computed exactly, and then the multiplication by 0.5 is correctly rounded. Either way, at most one of the two arithmetic operations can introduce an error.) But a + 0.5 * (b - a) will not always give a correctly-rounded mean, and in fact can be many millions of ulps in error. Consider the case where a = -1.0 and b = 1.0 + 2^-23. Then a + 0.5 * (b - a) gives 0.0. The correct mean is 2^-24.
  • the expression a + 0.5 * (b - a) can also overflow, if a and b are very large with opposite signs rather than the same sign. In that situation, 0.5 * (a + b) will not overflow.
  • a + 0.5 * (b - a) is (very slightly) less readable than 0.5 * (a + b); it requires a moment's more thought on the part of the reader to figure out what it's doing.

Given the above, it's hard to support a general recommendation that a + 0.5 * (b - a) should be used in preference to 0.5 * (a + b).

Mark Dickinson
  • 29,088
  • 9
  • 83
  • 120
0

Your formula is correct in case if you are computing average of many numbers. In this case you can do the following:

μn = 1/nΣxi

but here while adding 101st number you will need to add x101 to μ100, where μ100 could be pretty big compared to x101, and so you will lose some precision. To avoid this problem you can to like this:

μ101 = μ100 + 1/n(x101 - μ100)

This formula is much better if you xi is of the same order of magnitude, as you avoid dealing with arithmetic operations between two big number and xi.

You might want to read article Numerically stable computation of arithmetic means

Let's look into how numbers are represented in IEEE floating point. Consider C++ float:

Interval [1,2] goes with step 2-23, so you can represent numbers 1+n*2-23, where n belongs to {0, ..., 223}.

Interval [2j, 2j+1] goes with like [1,2] but multiplied by 2j.

To, see how precision is lost you can run this program:

#include <iostream>
#include <iomanip>
int main() {
    float d = pow(2,-23);
    std::cout << d << std::endl;
    std::cout << std::setprecision(8) << d + 1 << std::endl;
    std::cout << std::setprecision(8) << d + 2 << std::endl; // the precision has been lost
    system("pause");
}

The output is

1.19209e-07
1.0000001
2

Yola
  • 18,496
  • 11
  • 65
  • 106