I hope someone can help explain the following behavior and how to fix it.
I have a DataFrame like this:
import pandas as pd

l = [[1, 50, 3, 4.003], [1, 50, 4, 5.002], [2, 10, 3, 5.003], [2, 50, 2, 5.004]]
df = pd.DataFrame(l, columns=["a", "b", "c", "d"])
df # Output:
   a   b  c      d
0  1  50  3  4.003
1  1  50  4  5.002
2  2  10  3  5.003
3  2  50  2  5.004
Now, if I sum column "d" for the first two rows, I would expect a result of 9.005, right?
Well, wrong: instead I get 9.004999999999999. This is what I do:
total = df.loc[0, "d"] + df.loc[1, "d"]  # avoid shadowing the built-in sum()
total # Output:
9.004999999999999
This affects the outcome when I round the sum to two decimal places.
From what I found online, this seems to happen because float64 stores values in binary, so most decimal fractions cannot be represented exactly. Is there a way to overcome this issue?
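To illustrate the binary-representation point, Python's decimal module can show the exact value a float64 actually stores (a quick sketch using the same numbers as above):

```python
from decimal import Decimal

# Decimal(x) for a float x exposes the exact binary value stored in a
# float64; it is not exactly 4.003, which is why the sum below does not
# come out as exactly 9.005.
print(Decimal(4.003))   # the true stored value, not exactly 4.003
print(Decimal(5.002))   # likewise, not exactly 5.002
print(4.003 + 5.002)    # 9.004999999999999
```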
By the way, the same thing happens with plain Python floats:
d0 = 4.003
d1 = 5.002
d0 + d1 # Output:
9.004999999999999
The challenge is that when I round the result to two decimals I want to get 9.01, not 9.0:
round(d0 + d1, 2) # Output:
9.0 # I expected 9.01
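One workaround I have been experimenting with is Python's decimal module: constructing Decimal values from strings keeps them exact, so half-up rounding behaves as I expect. A sketch (though I am not sure this scales well to a whole DataFrame):

```python
from decimal import Decimal, ROUND_HALF_UP

# Building Decimals from strings avoids the inexact float64 representation.
d0 = Decimal("4.003")
d1 = Decimal("5.002")
total = d0 + d1  # exactly 9.005

# quantize rounds to two decimal places; ROUND_HALF_UP gives the
# "round half away from zero" behavior I want here.
rounded = total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(rounded)  # 9.01
```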
So with thousands or even millions of records, would one have to identify which values suffer from this precision error before summing and rounding? That would be very impractical. Any other suggestions?