Float64 fields having floating point errors on Pandas [Python]

Question

I am aware of Python having floating point errors when using the normal types. That is why I am using Pandas instead.

I suddenly started having some issues with data I input (not calculation) and cannot explain the following behavior:

In [600]: df = pd.DataFrame([[0.05], [0.05], [0.05], [0.05]], columns = ['a'])

In [601]: df.dtypes
Out[601]:
a    float64
dtype: object

In [602]: df['a'].sum()
Out[602]: 0.20000000000000001

In [603]: df['a'].round(2).sum()
Out[603]: 0.20000000000000001

In [604]: (df['a'] * 1000000).round(0).sum()
Out[604]: 200000.0

In [605]: (df['a'] * 1000000).round(0).sum() / 1000000
Out[605]: 0.20000000000000001

Hopefully somebody can help me either fix this or figure out how to correctly sum to 0.2. I don't mind if the result is 20 or 2000, but as you can see, when I then divide I get to the same point where the sum is incorrect!

(To run my code, remember to do `import pandas as pd`.)

That's the nature of floating point numbers. It's just a representation, so you can ignore it. Why is it a problem for you? BTW try this: `print(0.1 + 0.2)` — MaxU - stand with Ukraine, Jan 24 '17 at 14:57
@MaxU I get that, Max. That is why I tried to solve it by doing 0.05 * 100 = 5, summing those 4 and getting 20. But my problem is that the system sees 20.00000..001, and even rounding doesn't seem to do the trick. — Yona, Jan 24 '17 at 15:45
Possible duplicate of [Is floating point math broken?](http://stackoverflow.com/questions/588004/is-floating-point-math-broken) — Martin Valgur, Jan 24 '17 at 15:56

score 0 · Answer 1 · answered Jan 24 '17 at 15:53

Ok so this works:

In [642]: (((df * 1000000).round(0)) / 1000000).sum()
Out[642]:
a    0.2
dtype: float64

But this doesn't:

In [643]: (((df * 1000000).round(0))).sum() * 1000000
Out[643]:
a    2.000000e+11
dtype: float64

So you have to do all calculations inside the pandas object or risk breaking things.

score 0 · Answer 2 · answered Jan 24 '17 at 16:08

"I get to the same point where the sum is incorrect!" By your definition of incorrect nearly all floating point operations would be incorrect. Only powers of 2 are perfectly represented by floating points, everything else has a rounding error of about 15–17 decimal digits (for double precision floats). Some applications just hide this error better than others when displaying these values. That precision is far more than sufficient for the data you are using.

If you are bothered by the ugly-looking output, then you can do `"{:.1f}".format(value)` to round the output string to 1 decimal digit after the point, or `"{:g}".format(value)` to automatically select a reasonable number of digits for display.
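
As a short illustration of those two format strings, assuming `value` holds the sum from the question (recomputed here with plain Python floats rather than pandas):

```python
value = sum([0.05, 0.05, 0.05, 0.05])

# Fixed notation with one digit after the decimal point.
one_decimal = "{:.1f}".format(value)

# General format: up to 6 significant digits, trailing zeros dropped.
general = "{:g}".format(value)

print(one_decimal)  # 0.2
print(general)      # 0.2
```

Both calls change only the displayed string; the float stored in `value` is untouched.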

"Only powers of 2" is understating it a bit: there are approximately 18 billion billion real numbers that can be exactly represented in the usual IEEE 754 binary64 floating-point format. Only 2098 out of those 18 billion billion are powers of 2. — Mark Dickinson, Jan 24 '17 at 16:53
