Numpy float mean calculation precision

Question

I happen to have a numpy array of floats:

a.dtype, a.shape
#(dtype('float64'), (32769,))

The values are:

a[0]
#3.699822718929953
all(a == a[0])
True

However:

a.mean()
3.6998227189299517

The mean is off by 15th and 16th figure.

Can anybody show how this difference is accumulated over 30K mean and if there is a way to avoid it?

In case it matters my OS is 64 bit.

I don't get it. Do you expect `a[0] == a.mean()`? Are **all** `a[i] = 3.69...`? Then, why testing `any(a == a[0])` and not `all(a == a[0])`? Even then, `mean` is still some calculation including summation and division, so floating point differences are kind of expected!? — HansHirse, Nov 24 '20 at 11:14
@HansHirse as far as I understand all array items have to same value, so the mean should be the same as each item. It isn't due to some rounding error or whatever, which is basically the OP: why does that happen and how to avoid it. — Roland Deschain, Nov 24 '20 at 11:16
I did some tests and it seems with `math.fsum(a)` you get the correct summation instead of `np.sum` or just `sum()`. With that you could simply implement your own mean function. source: https://stackoverflow.com/questions/33004029/is-numpy-sum-implemented-in-such-a-way-that-numerical-errors-are-avoided — Roland Deschain, Nov 24 '20 at 11:18
@RolandDeschain Thank you! I am sure there are some other ways to get the "exact" results. I am interested in how this change gets accumulated over 30K array and are there insignificant figures with floating point numpy representation. How many if so? — Sergey Bushmanov, Nov 24 '20 at 11:21
@RolandDeschain: It is due to rounding error; the fact that all array elements are the same does not mean the partial sums produced as they are added sequentially do not have rounding errors. — Eric Postpischil, Nov 24 '20 at 12:03
@SergeyBushmanov: What specifically does your question “How it gets accumulated” mean? Consider a floating-point format in which the significand has three bits, and all the numbers in the array are 5, which is 1.01•2^2 in floating-point. Adding two of them gives 1.01•2^3. But then adding a third would give 1.111•2^3, which cannot be represented with a three-bit significand, so it is rounded to 10.000•2^3 = 1.00•2^4. Such roundings occur repeatedly as the partial sums progress. So of course the final sum is often different from the real-number sum. What more do you want to know about that? — Eric Postpischil, Nov 24 '20 at 12:06
It is not likely your goal is actually to calculate the mean of an array of identical values, as the mean is the sole value. Are you trying to calculate the mean of some other arrays? Or do you just want to understand how floating-point arithmetic works generally? — Eric Postpischil, Nov 24 '20 at 12:10
@EricPostpischil "How it gets accumulated" means I would like to see a progressive example of how the rounding error gets accumulated, with explanation of source of error and how many figures are significant. All the answers so far state the obvious facts about float representation. Yours is closest. Am I getting it right that rounding errors accumulate and do not cancel each other? Is there a way to estimate number of significant figures? — Sergey Bushmanov, Nov 24 '20 at 12:56
@SergeyBushmanov: To some extent, rounding errors may be modeled as random. They may accumulate or cancel, so the accumulation of partial sums is a random walk. The maximum error resulting from n operations each with a maximum error of e is of course n•e, but the standard deviation is around sqrt(n)•e. However, accumulating a sum of n elements of the same or similar magnitude does not have a constant maximum error of e in each operation because the sum grows, and the maximum error grows along with it; it increase in exponent is an increasing in the point where rounding has to occur… — Eric Postpischil, Nov 24 '20 at 13:06
… So creating a precise model of the rounding errors in such a partial sum is difficult, but bounds and estimates could be done. — Eric Postpischil, Nov 24 '20 at 13:07
@EricPostpischil I would be grateful if you put your thoughts in an answer, with some examples both in binary and floats, showing the reasons for accumulation or cancelling errors.Certainly, I will accept it. — Sergey Bushmanov, Nov 24 '20 at 13:12
@SergeyBushmanov: It would help if you describe the real problem you are trying to solve. What one can say about the sum of an array of elements of the same sign and approximately the same magnitude is different from what one can say about the sum of an array of elements of the same sign and varying magnitudes and about the sum of an array of elements of varying signs and magnitudes. — Eric Postpischil, Nov 24 '20 at 13:19
@EricPostpischil The problem at hand is very clear (if it's not feel free to edit the title): "What one can say about the sum of an array of elements of the same sign and approximately the same magnitude". I would perhaps add how to estimate number of significant digits for a sum of N figures (100, 1000, 1000000). If you have time and desire to generalize, it would be great — Sergey Bushmanov, Nov 24 '20 at 13:24
Does this answer your question? [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) — EOF, Nov 24 '20 at 21:43

Mazdak · Answer 1 · 2020-11-24T12:16:38.873

This is due to incapability of float64 type to store the sum of your float numbers with correct precision. In order to get around this problem you need to use a larger data type of course^*. Numpy has a longdouble dtype that you can use in such cases:

In [23]: np.mean(a, dtype=np.longdouble)                                                                                                                                                                    
Out[23]: 3.6998227189299530693

Also, note:

In [25]: print(np.longdouble.__doc__)                                                                                                                                                                       
Extended-precision floating-point number type, compatible with C
    ``long double`` but not necessarily with IEEE 754 quadruple-precision.
    Character code: ``'g'``.
    Canonical name: ``np.longdouble``.
    Alias: ``np.longfloat``.
    Alias *on this platform*: ``np.float128``: 128-bit extended-precision floating-point number type.

_{* read the comments for more details.}

Re ”In order to get around this problem you need to use a larger data type of course”: There is no “of course” about it; this is not true, because there are several alternatives available. One can use `math.fsum` to calculate the sum with more accuracy. One can use Kahan summation to calculate the sum. One can first calculate the mean approximately, then subtract the approximation from all array elements and then use the mean of the results to adjust the approximation. — Eric Postpischil, Nov 24 '20 at 12:01
@EricPostpischil One other way is to look at all the items regardless of their types and then return one of them if they're all equal. So programmatically yes, there are other ways and since we're dealing with finite-precision floating-points maybe we don't need to worry about the mathematical or philosophical aspects of it. — Mazdak, Nov 24 '20 at 12:15

Daniel F · Answer 2 · 2020-11-24T12:22:46.440

The mean is (by definition):

a.sum()/a.size

Unfortunately, adding all those values up and dividing accumulates floating point errors. They are usually around the magnitude of:

np.finfo(np.float).eps
Out[]: 2.220446049250313e-16

Yeah, e-16, about where you get them. You can make the error smaller by using higher-accuracy floats like float128 (if your system supports it) but they'll always accumulate whenever you're summing a large number of float together. If you truly want the identity, you'll have to hardcode it:

def mean_(arr):
    if np.all(arr == arr[0]):
        return arr[0]
    else:
        return arr.mean()

In practice, you never really want to use == between floats. Generally in numpy we use np.isclose or np.allclose to compare floats for exactly this reason. There are ways around it using other packages and leveraging arcane machine-level methods of calculating numbers to get (closer to) exact equality, but it's rarely worth the performance and clarity hit.

Eric Postpischil · Accepted Answer · 2020-11-25T00:47:15.907

Here is a rough approximation of a bound on the maximum error. This will not be representative of average error, and it could be improved with more analysis.

Consider calculating a sum using floating-point arithmetic with round-to-nearest ties-to-even:

sum = 0;
for (i = 0; i < n; ++n)
    sum += a[i];

where each a[i] is in [0, m).

Let ULP(x) denote the unit of least precision in the floating-point number x. (For example, in the IEEE-754 binary64 format with 53-bit significands, if the largest power of 2 not greater than |x| is 2^p, then ULP(x) = 2^p−52. With round-to-nearest, the maximum error in any operation with result x is ½ULP(x).

If we neglect rounding errors, the maximum value of sum after i iterations is i•m. Therefore, a bound on the error in the addition in iteration i is ½ULP(i•m). (Actually zero for i=1, since that case adds to zero, which has no error, but we neglect that for this approximation.) Then the total of the bounds on all the additions is the sum of ½ULP(i•m) for i from 1 to n. This is approximately ½•n•(n+1)/2•ULP(m) = ¼•n•(n+1)•ULP(m). (This is an approximation because it moves i outside the ULP function, but ULP is a discontinuous function. It is “approximately linear,“ but there are jumps. Since the jumps are by factors of two, the approximation can be off by at most a factor of two.)

So, with 32,769 elements, we can say the total rounding error will be at most about ¼•32,769•32,770•ULP(m), about 2.7•10⁸ times the ULP of the maximum element value. The ULP is 2⁻⁵² times the greatest power of two not less than m, so that is about 2.7•10⁸•2⁻⁵² = 6•10⁻⁸ times m.

Of course, the likelihood that 32,768 sums (not 32,769 because the first necessarily has no error) all round in the same direction by chance is vanishingly small but I conjecture one might engineer a sequence of values that gets close to that.

An Experiment

Here is a chart of (in blue) the mean error over 10,000 samples of summing arrays with sizes 100 to 32,800 by 100s and elements drawn randomly from a uniform distribution over [0, 1). The error was calculated by comparing the sum calculated with float (IEEE-754 binary32) to that calculated with double (IEEE-754 binary64). (The samples were all multiples of 2⁻²⁴, and double has enough precision so that the sum for up to 2²⁹ such values is exact.)

The green line is c n √n with c set to match the last point of the blue line. We see it tracks the blue line over the long term. At points where the average sum crosses a power of two, the mean error increases faster for a time. At these points, the sum has entered a new binade, and further additions have higher average errors due to the increased ULP. Over the course of the binade, this fixed ULP decreases relative to n, bringing the blue line back to the green line.

Thanks! You provided an upper bound. Is there a chance for a logic to reason about *expected* rounding error? — Sergey Bushmanov, Nov 24 '20 at 13:53
@SergeyBushmanov: [The expected distance of a random walk of n uniform steps is about sqrt(2n/π).](https://math.stackexchange.com/questions/103142/expected-value-of-random-walk) Multiplying that by ½ULP(n•m) gives the expectation if all errors were drawn from the same distribution. But the earlier ones are smaller. So the expected distance is smaller. But I do not know if it would reduce by half; the later steps may dominate the expected value, not accumulate/average with them as in the calculation of the upper bound. So ½sqrt(2n/π)ULP(n•m) is an approximate upper bound on the expected value. — Eric Postpischil, Nov 24 '20 at 14:06
I would expect an expected value of a process that can go both directions to be lower than upper bound (going always up). But you say its a multiple of `sqrt(2n/π)` which is more than 1 for bigger n and monotonically increasing. Is my intuition wrong? — Sergey Bushmanov, Nov 24 '20 at 14:18
@SergeyBushmanov: sqrt(2n/π) is the expected distance in units of steps. When the steps are ½ULP(n•m), the expected distance is ½sqrt(2n/π)ULP(n•m), about ½sqrt(2n/π)nULP(m), which is less than the maximum distance, ¼n(n+1)ULP(n). Although I did not account for the fact that the steps are also drawn from a distribution; they are a drawn from a uniform distribution over some interval [−½u, +½u], but that average distance for a random walk is for when they are drawn from just the two points −½u and +½u. So the correct average will be even less. As you can see, this is complicated. — Eric Postpischil, Nov 24 '20 at 15:13

Numpy float mean calculation precision

3 Answers3

An Experiment