I have the following pandas dataframe df:

|      clm1  |      clm2|
|     79.02  |     80.98|
|     78.55  |     81.47|
|     98.99  |    101.01|
|    999.54  |    999.55|
|    999.55  |    999.55|

I am performing the following calculation on it:

df['avg'] = (df['clm1']+df['clm2'])/2

print(df)

| clm1   |    clm2   |  avg   |
|79.02   |    80.98  | 80.000 |
|78.55   |    81.47  | 80.010 |
|98.99   |   101.01  |100.000 |
|999.54  |   999.55  |999.545 |
|999.55  |   999.55  |999.550 |

When I write the above dataframe to CSV, I get an incorrect result.

df.to_csv('myfile.csv')

clm1  , clm2  , avg
79.02 , 80.98 , 80.0
78.55 , 81.47 , 80.00999999999999  # This should be 80.01
98.99 , 101.01, 100.0
999.54, 999.55, 999.545
999.55, 999.55, 999.55

I understand the issues with floating point and I have gone through the answers below:

"Python float - str - float weirdness" and "Is floating point math broken?". These suggest using Decimal instead of float, but I have not been able to find out how to do that. Note: I do not want to use any rounding method. I need the exact result.

user812142

3 Answers

2

Here is an option that casts to int to avoid rounding. This works when taking the average of two columns.

#recreate data
import pandas as pd

df = pd.DataFrame([[79.02,80.98],
                   [78.55,81.47],
                   [98.99,101.01],
                   [999.54,999.55],
                   [999.55,999.55]], columns = ['clm1','clm2'])

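#scale by 1000 and cast all values to integers
#(round() only snaps off float noise before the cast; assumes at most 3 decimal places)
df = (df*1000).round().astype(int)
df['avg'] = ((df['clm1']+df['clm2'])/2).astype(int)

#return to floating point
df = (df/1000)
df.to_csv('pandasfile.csv')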

The output of the above is:

,clm1,clm2,avg
0,79.02,80.98,80.0
1,78.55,81.47,80.01
2,98.99,101.01,100.0
3,999.54,999.55,999.545
4,999.55,999.55,999.55

Another option: use the Decimal class with pandas. This is tedious and slow if you have to cast a large number of floats from your dataframe to Decimal. Assuming you import everything as a Decimal, the process would be as follows.

from decimal import Decimal

df = pd.DataFrame([[Decimal("79.02"),Decimal("80.98")],
                   [Decimal("78.55"),Decimal("81.47")],
                   [Decimal("98.99"),Decimal("101.01")],
                   [Decimal("999.54"),Decimal("999.55")],
                   [Decimal("999.55"),Decimal("999.55")]], columns = ['clm1','clm2'])

df['avg'] = (df['clm1']+df['clm2'])/2
df.to_csv('pandasfile.csv')

This gives the following in the csv file:

,clm1,clm2,avg
0,79.02,80.98,80.00
1,78.55,81.47,80.01
2,98.99,101.01,100.00
3,999.54,999.55,999.545
4,999.55,999.55,999.55
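
If the numbers come from a file, one way to get Decimal values without going through float is to parse them during the read. This is a sketch that assumes the file has the two columns shown above. The inline `csv_text` is a stand-in for the real file, and `converters` is the pandas `read_csv` parameter that applies a function to each raw cell.

```python
import io
from decimal import Decimal

import pandas as pd

# stand-in for the real file (assumption: same two columns as above)
csv_text = "clm1,clm2\n79.02,80.98\n78.55,81.47\n999.54,999.55\n"

# converters receive each cell as text, so Decimal never sees a binary float
df = pd.read_csv(io.StringIO(csv_text),
                 converters={"clm1": Decimal, "clm2": Decimal})

df["avg"] = (df["clm1"] + df["clm2"]) / 2
out = df.to_csv(index=False)
print(out)
```

The columns end up with object dtype, so arithmetic runs element by element in Python and will be slower than float64 on large files.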

Original answer: You can use the float_format argument of the to_csv method.

df['avg'] = (df['clm1']+df['clm2'])/2

Specify the number of decimals with float_format:

df.to_csv('pandasfile.csv', float_format='%.3f')

This writes the following to the csv file:

,clm1,clm2,avg
0,79.020,80.980,80.000
1,78.550,81.470,80.010
2,98.990,101.010,100.000
3,999.540,999.550,999.545
4,999.550,999.550,999.550
B. Bogart
  • But I do not want to perform any rounding. – user812142 Sep 08 '20 at 11:56
  • I think the easiest way is to cast as int then back to float64 before writing to the file. You can also use the Decimal class but converting an existing pandas dataframe to Decimal is slow and cumbersome. -- I've updated the answer above. – B. Bogart Sep 09 '20 at 13:38
  • I am reading these numbers from a huge, already existing file. I tried the first method and I am getting the result below: clm1,clm2,avg 0.079,0.08,0.079 0.078,0.081,0.079 0.098,0.101,0.099 0.999,0.999,0.999 0.999,0.999,0.999 – user812142 Sep 18 '20 at 17:46
  • When I convert the columns to string and then to Decimal, it works fine. – user812142 Sep 18 '20 at 17:51
1

I found the solution.

First convert the columns to string and then to Decimal. This works perfectly and I get the correct result without rounding.

from decimal import Decimal

def getAvg(x, y):
    # x and y are Series of numeric strings; Decimal parses each string exactly
    return (x.apply(Decimal) + y.apply(Decimal)) / Decimal(2)

df['avg'] = getAvg(df['clm1'].astype('str'), df['clm2'].astype('str'))
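
The conversion to string matters because a Decimal built directly from a float copies the float's binary approximation, while a Decimal built from the string keeps the digits as written. Here is a small sketch in plain Python illustrating the difference:

```python
from decimal import Decimal

x = 78.55

from_float = Decimal(x)        # exact value of the binary float, not 78.55
from_text = Decimal(str(x))    # str(x) is '78.55', parsed digit by digit

print(from_float == Decimal("78.55"))
print(from_text == Decimal("78.55"))

# averaging the string-based Decimals gives the expected result
print((Decimal(str(78.55)) + Decimal(str(81.47))) / 2)
```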
user812142
0

Here is a small example of using the Decimal class (though not with pandas):

from decimal import Decimal

xs = [Decimal("79.02"), Decimal("78.55"), Decimal("98.99"),
     Decimal("999.54"), Decimal("999.55")]

ys = [Decimal("80.98"), Decimal("81.47"), Decimal("101.01"), 
      Decimal("999.55"), Decimal("999.55")]

# conversion with str() is to align columns
for x, y in zip(xs, ys):
    print(f'{str(x):>8s} {str(y):>8s} {str((x + y) / 2):>8s}')

   79.02    80.98    80.00
   78.55    81.47    80.01
   98.99   101.01   100.00
  999.54   999.55  999.545
  999.55   999.55   999.55
  • The Python built-in decimal module has several rounding options; see the decimal module documentation.
  • 'What Every Computer Scientist Should Know About Floating-Point Arithmetic' gives an accessible overview of the IEEE floating-point standard.
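
As an illustration of the rounding options in the decimal module, `quantize` takes a `rounding` argument. This is a brief sketch; the two-decimal target precision is an assumption chosen for the example, and none of it is needed when the exact result is wanted.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

avg = (Decimal("999.54") + Decimal("999.55")) / 2   # exact: 999.545
cent = Decimal("0.01")                              # target precision

half_up = avg.quantize(cent, rounding=ROUND_HALF_UP)
half_even = avg.quantize(cent, rounding=ROUND_HALF_EVEN)
print(half_up, half_even)
```

The two modes differ only on exact ties such as 999.545, which is why the choice has to be made explicitly.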
jsmart