0

I am trying to change a column (price) which is an object data type to use in a groupby. Changing this column to float sometimes keeps the same number of decimal places as the original, sometimes rounds to one decimal place, and sometimes gets rid of all decimal places altogether. I would like to keep the float values the same as the original due to eventual reconciliation that needs to be accurate to the decimal place. I have tried changing the column type using astye, and also pd.to_numeric. Ideally, price_3 and price_4 should apples should be 93927.82. Any help would be greatly appreciated.

import pandas as pd

d = {'product': ['apples', 'pears', 'grapes', 'oranges'], 
     'price': ['$93,927.83' , '$9,868.23', '$110,838.10', '$10,093.88']}

df = pd.DataFrame(data=d)


df['price_2'] = df['price'].str.replace('$', '').str.replace(',', '').str.replace('(', '').str.replace(')', '')
df['price_3'] = df['price_2'].astype(float)
df['price_4'] = pd.to_numeric(df['price_2'])

enter image description here

Taos
  • 23
  • 4
  • https://stackoverflow.com/questions/588004/is-floating-point-math-broken/588014 – slothrop Feb 16 '23 at 17:01
  • Your requirement of "eventual reconciliation that needs to be accurate to the decimal place" is just the kind of situation where floats can be problematic. Two potential solutions are (1) use ints, and store the amount as cents (so $93,927.83 becomes integer 9392783) or (2) use Python's inbuilt `Decimal` class. – slothrop Feb 16 '23 at 17:04
  • 2
    *price_3 and price_4 should apples should be 93927.82.*...How do you get `93927.82` from `'$93,927.83'`? Also, no repro: https://i.stack.imgur.com/N2Tf6.png – BigBen Feb 16 '23 at 17:04
  • 2
    Try: `pandas.set_option('display.precision', 2)` and see if that helps with the presentation – JonSG Feb 16 '23 at 17:07
  • 1
    @JonSG Thank so much for the comment. Yes, I saw the results are wanted when I printed out that column! – Taos Feb 16 '23 at 19:22
  • @slothrop Thanks for the comment. Both good suggestions that I would keep in mind! – Taos Feb 16 '23 at 19:23

2 Answers2

1

You likely want to set the display precision:

import pandas
pandas.set_option('display.precision', 2)

df = pandas.DataFrame({
    'product': ['apples', 'pears', 'grapes', 'oranges'], 
    'price': ['$93,927.83' , '$9,868.23', '$110,838.10', '$10,093.88']
})
df['price_2'] = df['price'].str.replace('$', '').str.replace(',', '').str.replace('(', '').str.replace(')', '')
df['price_3'] = df['price_2'].astype(float)
df['price_4'] = pandas.to_numeric(df['price_2'])

print(df['price_4'])

Giving you:

0     93927.83
1      9868.23
2    110838.10
3     10093.88
JonSG
  • 10,542
  • 2
  • 25
  • 36
1

Try changing df['price_3'] = df['price_2'].astype(float) to df['price_3'] = df['price_2'].apply(Decimal).

Timus
  • 10,974
  • 5
  • 14
  • 28