0

Please, I need to change the column "Sale" to a numeric format.

import pandas as pd
data = {'Sale':['9.455', '34,65%', '234.12', '32.6%','3,5%']}
df = pd.DataFrame(data) 
df.dtypes

I need to get the following result

[9455, 0.3465, 23412, 0.326,0.035]

Thank you very much Hugo,

Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
Hugo Coras
  • 35
  • 6

2 Answers2

2

You could use pandas.to_numeric + numpy.where:

values = pd.to_numeric(df.Sale.str.replace("[.%]", "").str.replace(",", "."))
result = np.where(df.Sale.str.contains("%"), values / 100, values)
print(pd.Series(result))

Output

0     9455.0000
1        0.3465
2    23412.0000
3        3.2600
4        0.0350
dtype: float64

The line:

values = pd.to_numeric(df.Sale.str.replace("[.%]", "").str.replace(",", "."))

converts the strings to numeric values after some pre-processing, then:

result = np.where(df.Sale.str.contains("%"), values / 100, values)

basically divides the numbers by 100 if it was a percentage value.

Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
1

With custom conversion:

import pandas as pd


def cast_floats(n):
    if n.endswith('%'):
        n = float(n.strip('%').replace(',', '.')) / 100
    else:
        dec_places = len(n[n.index('.') + 1:])
        n = float(n) * (10 ** dec_places)
    return n


data = {'Sale': ['9.455', '34,65%', '234.12', '32.6%', '3,5%']}
df = pd.DataFrame(data)
df['Sale'] = df['Sale'].apply(cast_floats)
print(df)

The output:

         Sale
0   9455.0000
1      0.3465
2  23412.0000
3      0.3260
4      0.0350
RomanPerekhrest
  • 88,541
  • 4
  • 65
  • 105