Pandas calculation giving incorrect decimal points

Question

I have the pandas DataFrame df below:

|      clm1  |      clm2|
|     79.02  |     80.98|
|     78.55  |     81.47|
|     98.99  |    101.01|
|    999.54  |    999.55|
|    999.55  |    999.55|

I perform the following calculation on it:

df['avg'] = (df['clm1']+df['clm2'])/2

print(df)

| clm1   |    clm2   |  avg   |
|79.02   |    80.98  | 80.000 |
|78.55   |    81.47  | 80.010 |
|98.99   |   101.01  |100.000 |
|999.54  |   999.55  |999.545 |
|999.55  |   999.55  |999.550 |

When I write the above dataframe to CSV, I get an incorrect result.

df.to_csv('myfile.csv')

clm1  , clm2  , avg
79.02 , 80.98 , 80.0
78.55 , 81.47 , 80.00999999999999  # This should be 80.01
98.99 , 101.01, 100.0
999.54, 999.55, 999.545
999.55, 999.55, 999.55

I understand the issues with floating point, and I have gone through the answers below:

Python float - str - float weirdness
Is floating point math broken?

These suggest using Decimal instead of float, but I have not been able to find out how to do that. Note: I do not want to use any rounding method. I need the exact result.

related: https://stackoverflow.com/questions/588004/is-floating-point-math-broken — hiro protagonist, Sep 08 '20 at 11:20
user812142 You do want rounded output. Recall 78.55, 81.47, 80.010 are not exactly representable, best to see it as a rounded value to the precision desired. Else 80.00999999999999 is closer to the _exact_ value than 80.010 — chux - Reinstate Monica, Sep 08 '20 at 12:08
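
The point made in the comment above can be checked with the standard library alone. This is only a sketch: it uses fractions.Fraction to read back the exact binary values stored for the inputs, then compares the computed average and the float literal 80.01 against the exact mean of those stored values. The variable names are illustrative and not part of the question.

```python
from fractions import Fraction

a, b = 78.55, 81.47
avg = (a + b) / 2

# Fraction(float) converts the stored binary value exactly, with no rounding
exact_mean = (Fraction(a) + Fraction(b)) / 2

# Distance of each candidate float from the exact mean of the stored inputs
err_avg = abs(Fraction(avg) - exact_mean)
err_literal = abs(Fraction(80.01) - exact_mean)

print(repr(avg))                          # the value that ends up in the CSV
print(Fraction(a) == Fraction("78.55"))   # False: 78.55 is not stored exactly
print(err_avg <= err_literal)             # True: the computed float is at least as close
print(format(avg, ".2f"))                 # 80.01 once rounded for display
```

In other words, the float result is the best binary approximation available; the "expected" 80.01 only appears after rounding for output, or by avoiding binary floats altogether.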

B. Bogart · Answer 1 · 2020-09-09T13:36:52.660

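Here is an option that scales the values to integers so the arithmetic itself is exact. This works when taking the average of two columns.

# recreate data
import pandas as pd

df = pd.DataFrame([[79.02,80.98],
                   [78.55,81.47],
                   [98.99,101.01],
                   [999.54,999.55],
                   [999.55,999.55]], columns = ['clm1','clm2'])

# scale to thousandths, then cast all values to integers
# (round first: a product such as 78.55*1000 may not be an exact integer in floating point,
#  and astype(int) alone would truncate the decimals)
df = (df*1000).round().astype(int)
df['avg'] = ((df['clm1']+df['clm2'])/2).astype(int)

# return to floating point
df = (df/1000)
df.to_csv('pandasfile.csv')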

The output of the above is:

,clm1,clm2,avg
0,79.02,80.98,80.0
1,78.55,81.47,80.01
2,98.99,101.01,100.0
3,999.54,999.55,999.545
4,999.55,999.55,999.55
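
For readers who want to see the scaled-integer idea outside pandas, here is a minimal plain-Python sketch. It assumes a scale factor of 1000 (three decimal places) and inputs with at most two decimals; SCALE and to_scaled_int are hypothetical names introduced for illustration. The round() call only removes floating-point noise from the scaling step; it does not change the intended decimal value.

```python
SCALE = 1000  # assumed scale factor: keeps three decimal places

rows = [(79.02, 80.98), (78.55, 81.47), (98.99, 101.01),
        (999.54, 999.55), (999.55, 999.55)]

def to_scaled_int(value, scale=SCALE):
    """Scale a float and round to the nearest integer (hypothetical helper)."""
    return round(value * scale)

averages = []
for a, b in rows:
    total = to_scaled_int(a) + to_scaled_int(b)  # exact integer addition
    # Two-decimal inputs give totals that are multiples of 10, so halving stays exact
    averages.append(total / 2 / SCALE)

print(averages)
```

The final division by SCALE goes back to float, so the stored result is still a binary approximation, but it is the float closest to the intended decimal and prints in its short form.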

Another option: use the Decimal class with pandas. This is tedious and slow if you have to cast a large number of floats from your dataframe to Decimal. Assuming you import everything as Decimal, the process would be as follows.

from decimal import Decimal

df = pd.DataFrame([[Decimal("79.02"),Decimal("80.98")],
                   [Decimal("78.55"),Decimal("81.47")],
                   [Decimal("98.99"),Decimal("101.01")],
                   [Decimal("999.54"),Decimal("999.55")],
                   [Decimal("999.55"),Decimal("999.55")]], columns = ['clm1','clm2'])

df['avg'] = (df['clm1']+df['clm2'])/2
df.to_csv('pandasfile.csv')

This gives the following in the csv file:

,clm1,clm2,avg
0,79.02,80.98,80.00
1,78.55,81.47,80.01
2,98.99,101.01,100.00
3,999.54,999.55,999.545
4,999.55,999.55,999.55
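
If the data comes from a file, one way to "import everything as Decimal" is the converters argument of pandas.read_csv, which hands each cell to the converter as text, so the values never pass through float. The sketch below is an assumption about how this could be wired up, not part of the original answer; the in-memory CSV stands in for the real file and its contents are illustrative.

```python
import io
from decimal import Decimal

import pandas as pd

# Hypothetical stand-in for the existing file on disk
raw = io.StringIO("clm1,clm2\n79.02,80.98\n78.55,81.47\n999.54,999.55\n")

# Each cell reaches the converter as a string, so Decimal parses the text exactly
df = pd.read_csv(raw, converters={"clm1": Decimal, "clm2": Decimal})
df["avg"] = (df["clm1"] + df["clm2"]) / 2

# With no path given, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text)
```

The columns end up with object dtype, so vectorised float speed is lost; that is the trade-off for exact decimal arithmetic.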

Original answer: You can use the float_format argument of the to_csv method.

df['avg'] = (df['clm1']+df['clm2'])/2

Specify the number of decimals with float_format:

df.to_csv('pandasfile.csv', float_format='%.3f')

This writes the following to the csv file:

,clm1,clm2,avg
0,79.020,80.980,80.000
1,78.550,81.470,80.010
2,98.990,101.010,100.000
3,999.540,999.550,999.545
4,999.550,999.550,999.550
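
To see what the '%.3f' pattern does to individual values, the same printf-style format string can be applied in plain Python. This sketch assumes, as far as I understand it, that float_format is applied to each float cell in the same way; it does not call pandas itself.

```python
avg = (78.55 + 81.47) / 2  # the value that showed up as 80.00999999999999

values = [80.0, avg, 100.0, 999.545, 999.55]

# printf-style formatting rounds each value to three decimal places for output only
formatted = ['%.3f' % v for v in values]
print(formatted)
```

Note that this is output rounding: the values held in the dataframe are unchanged, and every float column in the file is padded to three decimals.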

I think the easiest way is to cast as int then back to float64 before writing to the file. You can also use the Decimal class but converting an existing pandas dataframe to Decimal is slow and cumbersome. -- I've updated the answer above. — B. Bogart, Sep 09 '20 at 13:38
I am reading these numbers from a huge, already existing file. I tried the 1st method and I am getting the result below:
clm1,clm2,avg
0.079,0.08,0.079
0.078,0.081,0.079
0.098,0.101,0.099
0.999,0.999,0.999
0.999,0.999,0.999
— user812142, Sep 18 '20 at 17:46
When I convert the columns to string and then to Decimal type, it works fine. — user812142, Sep 18 '20 at 17:51

score 1 · Accepted Answer · answered Sep 18 '20 at 17:50

I found the solution.

First convert the columns to string and then to Decimal. This works perfectly, and I get the correct result without any rounding.

from decimal import Decimal

def getAvg(x,y):
    # x and y are Series of strings; Decimal parses the decimal text exactly
    return (x.apply(Decimal)+y.apply(Decimal))/Decimal(2)

df['avg'] = getAvg(df['clm1'].astype('str'),df['clm2'].astype('str'))
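
A small sketch of why the string step matters, using only the standard library: passing a float straight to Decimal copies its binary approximation digit for digit, while going through str() first yields the short decimal text that was originally typed. The variable names are illustrative.

```python
from decimal import Decimal

x, y = 78.55, 81.47

direct = Decimal(x)         # exact copy of the binary float, with a long tail of digits
via_str = Decimal(str(x))   # str(78.55) is '78.55', so this is exactly 78.55

print(direct == Decimal("78.55"))    # False
print(via_str == Decimal("78.55"))   # True

avg = (Decimal(str(x)) + Decimal(str(y))) / Decimal(2)
print(avg)                           # 80.01
```

This relies on str() of a float giving the shortest text that round-trips, which holds for values that were read from short decimal text in the first place.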

score 0 · Answer 3 · answered Sep 08 '20 at 14:48

Here is a small example of using the Decimal class (though not with pandas):

from decimal import Decimal

xs = [Decimal("79.02"), Decimal("78.55"), Decimal("98.99"),
     Decimal("999.54"), Decimal("999.55")]

ys = [Decimal("80.98"), Decimal("81.47"), Decimal("101.01"), 
      Decimal("999.55"), Decimal("999.55")]

# conversion with str() is to align columns
for x, y in zip(xs, ys):
    print(f'{str(x):>8s} {str(y):>8s} {str((x + y) / 2):>8s}')

   79.02    80.98    80.00
   78.55    81.47    80.01
   98.99   101.01   100.00
  999.54   999.55  999.545
  999.55   999.55   999.55

The Python built-in decimal module has several rounding options; see its documentation.
'What Every Computer Scientist Should Know About Floating-Point Arithmetic' has an accessible overview of the IEEE floating-point standard.
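
As a brief illustration of the rounding options mentioned above, the sketch below quantizes one of the averages to two decimal places under three of the decimal module's rounding modes. It is only needed if a fixed number of places is wanted; the unrounded Decimal result is already exact.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN, ROUND_DOWN

avg = (Decimal("999.54") + Decimal("999.55")) / 2   # Decimal('999.545')
cent = Decimal("0.01")                              # target precision: two places

half_up = avg.quantize(cent, rounding=ROUND_HALF_UP)      # ties go away from zero
half_even = avg.quantize(cent, rounding=ROUND_HALF_EVEN)  # ties go to the even digit
down = avg.quantize(cent, rounding=ROUND_DOWN)            # always truncates

print(half_up, half_even, down)   # 999.55 999.54 999.54
```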
