
When I import an Excel file, some numbers in a column are floats and some are not. How can I convert them all to float? The space in 3 000,00 is causing me problems.

  df['column']:
             column
0          3 000,00
1            156.00
2                 0

I am trying:

df['column'] = df['column'].str.replace(' ','')

but it's not working. I would then call .astype(float), but I can't get that far. Any solutions? Row 1 is already a float, but row 0 is a string.

Alexander L. Hayes
OhMikeGod
  • I think this should help: https://stackoverflow.com/questions/15891038/change-data-type-of-columns-in-pandas – Newskooler Dec 18 '18 at 20:49
  • 'but it's not working', what isn't working? – mckuok Dec 18 '18 at 20:51
  • I already tried to_numeric(), but I get: Unable to parse string "3 000,00" at position 0, and I can't remove the space with the replace above. The column values don't change and stay 3 000,00 – OhMikeGod Dec 18 '18 at 20:55
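A small sketch of why the `str.replace(' ', '')` attempt appears to do nothing, assuming the column looks like the sample above and that the "space" is really the `\xa0` (non-breaking space) reported later in the thread. Two things happen: the `.str` accessor only works on string elements, so the float and the int come back as NaN, and `\xa0` is not an ASCII space, so the string value is returned unchanged.

```python
import pandas as pd

# Hypothetical sample mirroring the question; the first value contains a
# non-breaking space (\xa0) rather than a regular space.
s = pd.Series(["3\xa0000,00", 156.0, 0])

# Same call as in the question
attempt = s.str.replace(" ", "", regex=False)

# The \xa0 survives, and the non-string elements (156.0 and 0) become NaN.
print(attempt.tolist())
```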

2 Answers


Just cast them all as a string first:

df['column'] = [float(str(val).replace(' ','').replace(',','.')) for val in df['column'].values]

Example:

>>> import pandas as pd
>>> df = pd.DataFrame({'column':['3 000,00', 156.00, 0]})
>>> df['column2'] = [float(str(val).replace(' ','').replace(',','.')) for val in df['column'].values]
>>> df
     column  column2
0  3 000,00   3000.0
1       156    156.0
2         0      0.0
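A vectorized variant of the same idea, offered as a sketch rather than the answerer's code: cast to string, strip all whitespace with the regex `\s` (which in Python 3 also matches non-breaking spaces like `\xa0`), swap the decimal comma for a point, then convert. The helper name `column_to_float` is hypothetical, and it assumes no value uses `.` as a thousands separator (e.g. `3.000,00` would not parse).

```python
import pandas as pd

def column_to_float(series: pd.Series) -> pd.Series:
    """Hypothetical helper: stringify, strip whitespace, swap decimal comma, cast."""
    cleaned = (
        series.astype(str)                          # 156.0 -> "156.0", 0 -> "0"
        .str.replace(r"\s", "", regex=True)         # removes spaces, including \xa0
        .str.replace(",", ".", regex=False)         # decimal comma -> decimal point
    )
    return cleaned.astype(float)

df = pd.DataFrame({"column": ["3\xa0000,00", 156.00, 0, "-9 000,00"]})
df["column2"] = column_to_float(df["column"])
print(df)
```

Stripping every whitespace character avoids having to chain a separate `.replace` for each odd space character Excel might produce.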
Tim
  • Your example works, but when I try it with my data, I get: could not convert string to float: '3\xa0000'. Maybe my import is wrong: df = pd.read_excel('test.xlsx') – OhMikeGod Dec 18 '18 at 21:08
  • yeah, that sounds like you have a weird character or unexpected line break – Yuca Dec 18 '18 at 21:16
  • Looks like either Excel or your source has worked odd characters into your data (`\xa0` is a non-breaking space). Remove those characters as well before converting to float: `.replace(u'\xa0', '')` – Tim Dec 18 '18 at 21:17
  • The only weird thing I see is that I also have values like -9 000,00, but those work in the example above. I'm not sure what the problem is... – OhMikeGod Dec 18 '18 at 21:20
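One way to find out exactly which unexpected characters are in the column, sketched under the assumption that valid values only contain digits, `.`, `,` and `-`. The helper `unexpected_chars` is hypothetical; it uses `unicodedata.name` to label each offending character so a `\xa0` is easy to tell apart from a normal space.

```python
import unicodedata

import pandas as pd

def unexpected_chars(series: pd.Series, allowed: str = "0123456789.,-") -> dict:
    """Hypothetical helper: map each character outside `allowed` to its Unicode name."""
    seen = set("".join(series.astype(str)))
    return {ch: unicodedata.name(ch, "UNKNOWN") for ch in sorted(seen - set(allowed))}

# Sample data; in practice pass the column read with pd.read_excel
s = pd.Series(["3\xa0000,00", 156.0, 0, "-9 000,00"])
for ch, name in unexpected_chars(s).items():
    print(repr(ch), name)
```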
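import re

# Swap the decimal comma, then drop everything except digits, '.' and '-' (keeps negatives like -9 000,00)
df['column'] = df['column'].apply(lambda x: re.sub(r"[^0-9.\-]", "", str(x).replace(',', '.'))).astype(float)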
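A sketch of the same regex approach that also tolerates empty cells, assuming blank or NaN cells should stay NaN instead of raising. A NaN becomes the string `"nan"`, which the regex reduces to `""`; `astype(float)` would fail on that, so this version uses `pd.to_numeric(..., errors="coerce")`. The name `regex_to_float` is hypothetical.

```python
import re

import pandas as pd

def regex_to_float(series: pd.Series) -> pd.Series:
    """Hypothetical helper: regex-clean each value, then parse, coercing failures to NaN."""
    def clean(value) -> str:
        # Decimal comma -> point, then keep only digits, '.', and '-'
        return re.sub(r"[^0-9.\-]", "", str(value).replace(",", "."))
    # errors="coerce" turns anything still unparseable (e.g. "") into NaN
    return pd.to_numeric(series.map(clean), errors="coerce")

s = pd.Series(["3\xa0000,00", 156.0, 0, "-9 000,00", float("nan")])
result = regex_to_float(s)
print(result.tolist())
```

Coercing to NaN keeps one bad cell from stopping the whole conversion; check `result.isna()` afterwards if silent NaNs are a concern.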
Pavel Fedotov