How to get precise summation of an array with floating point elements?

Question

I have a one dimensional numpy.array:

a = array([352305, 506833,  35384,  32278,  22453])

By dividing all elements in this array by an integer t = 949253, I get another array b:

b = a/t

Then after setting b[0] = 0, I let b[0] = -sum(b).

Now the new array b should sum to zero when I call sum(b); however, I only get a very small number such as -2.0816681711721685e-17. Is there any direct or indirect way to get exactly zero (without doing something like sum(b).round(5))?

I've tried something I learned from other answers, using the Decimal module. This is what I did:

for bi in b:
    bi = Decimal(bi).quantize(Decimal('.01'), rounding=ROUND_UP)

But I still get nonzero sum(b).

Additionally, when I did b[0]+sum(b[1:len(b)]), I got 0.0. Why is it different from the result of sum(b)?

Finally, if more precision doesn't help, is there any way to throw away some precision, such as truncating the floating-point numbers and adjusting them so that they sum to exactly zero?

You may choose a threshold, generally called as `epsilon`, close to `0.00001` and compare if the `abs(output - 0) < epsilon`. In floating point operations `epsilon` serves great purposes and it seems to be standard way of handling such scenarios — ZdaR, Oct 21 '17 at 06:05
https://stackoverflow.com/questions/588004/is-floating-point-math-broken. Each floating point number is represented with only finitely many bits of precision. Therefore when dealing with floating point, you generally have to accept some loss of accuracy. Not all numerical algorithms are created equally -- some are more stable than others. The trick is to understand your data and limits of stability. — MFisherKDX, Oct 21 '17 at 06:08
probably numpy rounding values. So you data while storing it back. — not_python, Oct 21 '17 at 06:30
Funny fact: sum(b[::-1]) is exactly zero. Precision gets lost mainly during calculation -0.628... + 0.533... as a few bits just cancel out, the mantissa bits then are shifted, and the exponent is decreased. The new bits that appear on the right in the mantissa contribute nothing to precision, of course. — lomereiter, Oct 21 '17 at 08:52
To @ZdaR , I know something like `numpy.allclose()` can do what you said but I think that is the same as doing `sum(b).round(5)`. To @MFisherKDX , I totally agree that, so it would be fine even I need to throw away the precision. However, it would require me to adjust some of the values to get the elements cancel out nicely. — Sun-Ting Tsai, Oct 21 '17 at 15:44

Jake · Accepted Answer · 2017-10-21T09:01:20.673

Decimal can offer more precision than regular Python floats, but it still rounds when dealing with fractions like 1/3.

However, there is the Fraction class, which is exactly what you need. It stores each number as a numerator and a denominator, so it does not suffer from precision loss or rounding issues.
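As a small illustration of the difference (a sketch that assumes the default decimal context of 28 significant digits), compare what happens to 1/3 in each type:

```python
from decimal import Decimal
from fractions import Fraction

# Decimal rounds 1/3 to the context precision (28 digits by default).
third_dec = Decimal(1) / Decimal(3)

# Fraction stores 1/3 exactly as numerator 1 and denominator 3.
third_frac = Fraction(1, 3)

print(third_dec * 3 == 1)   # False: the rounded digits do not add back up to 1
print(third_frac * 3 == 1)  # True: rational arithmetic is exact
```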

Let's say we have this code:

t = 949253
a = [352305, 506833,  35384,  32278,  22453]
b = [x/t for x in a]

b[0] = 0
b[0] = -sum(b)

print(sum(b)) # prints -2.0816681711721685e-17

Modifying it to use Fraction is as simple as this:

from fractions import Fraction

t = 949253
a = [352305, 506833,  35384,  32278,  22453]
b = [Fraction(x, t) for x in a]

b[0] = 0
b[0] = -sum(b)

print(sum(b)) # prints 0
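If floats are needed afterwards, one possible pattern is to do the exact check on the Fraction values and convert only at the end, comparing any float total against a tolerance rather than against exact zero. The sketch below uses the same data; the tolerance `1e-12` is an arbitrary assumption for illustration, not a recommended value.

```python
import math
from fractions import Fraction

t = 949253
a = [352305, 506833, 35384, 32278, 22453]
b = [Fraction(x, t) for x in a]
b[0] = -sum(b[1:])

# Exact check: Fraction arithmetic involves no rounding.
exact_total = sum(b)
print(exact_total == 0)  # True

# Each float() call rounds to the nearest double, so the float total
# is only guaranteed to be close to zero, not equal to it.
b_float = [float(x) for x in b]
float_total = math.fsum(b_float)
print(math.isclose(float_total, 0.0, abs_tol=1e-12))  # True
```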

edited Oct 21 '17 at 09:01

answered Oct 21 '17 at 08:19


It is good that sum(b) = 0 when working with Fraction instances. However, the problem would come back (or even more problems would appear) when manipulating this array. For example, assume a 2-dim array `d=np.array([[Fraction(-8,35), Fraction(3,7), Fraction(-1,5)],[Fraction(1,3), Fraction(-2,3), Fraction(1,3)],[Fraction(1,3), Fraction(2,3), Fraction(-1,1)]])`. Then `sum(d,1)=array([Fraction(0, 1), Fraction(0, 1), Fraction(0, 1)], dtype=object)`, which is fine because I sum up the row elements and the results should be zero. – Sun-Ting Tsai Oct 21 '17 at 16:13
However, when I convert the elements to float I get `array([[-0.22857142857142856, 0.42857142857142855, -0.2], [0.3333333333333333, -0.6666666666666666, 0.3333333333333333], [0.3333333333333333, 0.6666666666666666, -1.0]], dtype=object)`, and sum(d,1) still gives a very small nonzero number for the first row. Also, this array then has problems with operations like diagonalization (TypeError). – Sun-Ting Tsai Oct 21 '17 at 16:17
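If discarding some precision is acceptable, as the question suggests, one option that is not part of the answer above is to work in scaled integers (fixed point), where addition is exact. This is only a sketch: the name `SCALE` and the choice of six decimal places are assumptions made for illustration.

```python
t = 949253
a = [352305, 506833, 35384, 32278, 22453]

SCALE = 10**6  # hypothetical choice: keep six decimal places

# Round each ratio to a whole number of 1/SCALE units.
units = [round(x * SCALE / t) for x in a]

# Integer arithmetic is exact, so this forces the total to be exactly zero.
units[0] = -sum(units[1:])
print(sum(units) == 0)  # True

# Convert to float only for display or further float-based work;
# sums of these floats can again pick up round-off.
b = [u / SCALE for u in units]
```

The exact-zero guarantee holds for the integer `units` only, so any test for "sums to zero" should be made on the integers rather than on the converted floats.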
