1

This scenario is a simplification of an ETL scenario involving multiple sets of data pulled from MySQL tables. I have a merged dataframe where one price column is type float64 and the other is type object.

import pandas as pd

df = pd.DataFrame({
    'price1': [0.066055],
    'price2': ['0.066055'],
})

>>> df.dtypes
price1    float64
price2     object
dtype: object

When these two columns are converted to float64, the column price1 is rounded incorrectly when rounded to 5 digits.

float64_df = df[price_cols].apply(lambda x: pd.to_numeric(x))

>>> float64_df.dtypes
price1    float64
price2    float64
dtype: object

>>> float64_df[price_cols].apply(lambda x: x.round(5))
    price1   price2
0  0.06606  0.06605

However, when the columns are converted to float32 using downcast='float', the rounding works as expected.

float32_df = df[price_cols].apply(lambda x: pd.to_numeric(x, downcast='float'))

>>> float32_df.dtypes
price1    float32
price2    float32
dtype: object

>>> float32_df[price_cols].apply(lambda x: x.round(5))
    price1   price2
0  0.06606  0.06606

Any ideas why the rounding doesn't work properly when both columns are of type float64?

Kurt Kline
  • 1,724
  • 1
  • 10
  • 23

3 Answers3

2

Printing the floats with higher precision shows that pd.to_numeric converted '.066055' to 0.06605499999999998872.

with pd.option_context('display.float_format', '{:0.20f}'.format):
    print(float64_df)

Output:

                  price1                 price2
0 0.06605500000000000260 0.06605499999999998872
dannyadam
  • 3,950
  • 2
  • 22
  • 19
1

The short answer is pd.to_numeric outputs different values for the two:

pd.to_numeric(0.066055)
pd.to_numeric('0.066055')

# 0.066055
# 0.06605499999999999

In the case of 0.066055, it simply returns the value.

In the case of '0.066055', I believe it uses this function for converting the string to a float.

This answer may also be helpful.

Anne
  • 253
  • 3
  • 10
0

Getting exact numbers with floats is somewhat impossible and floats are always somewhat unpredictable. My guess is that the object results in a float64 a little bit smaller than the original number e.g. 0.066054999999999999 or something similar, resulting in the unexpected rounding result.

Python has some documentation about this.

w7sbc
  • 63
  • 7