146

I have a DataFrame that contains numbers stored as strings with commas as the thousands separator. I need to convert them to floats.

import locale
import pandas

a = [['1,200', '4,200'], ['7,000', '-0.03'], ['5', '0']]
df = pandas.DataFrame(a)

I am guessing I need to use locale.atof. Indeed

df[0].apply(locale.atof)

works as expected. I get a Series of floats.

But when I apply it to the DataFrame, I get an error.

df.apply(locale.atof)

TypeError: ("cannot convert the series to ", u'occurred at index 0')

and

df[0:1].apply(locale.atof)

gives another error:

ValueError: ('invalid literal for float(): 1,200', u'occurred at index 0')

So, how do I convert this DataFrame of strings to a DataFrame of floats?

pheon
  • Old question, but the OP is getting that error because `apply` on a DataFrame passes a whole **column** to the function as a series (in this case `locale.atof`, which expects a string). If you use the `applymap` method that @AndyHayden does in the answer below, you should be able to do this just fine. – T.C. Proctor Mar 02 '18 at 00:36

4 Answers

226

If you're reading in from CSV, you can use the thousands arg:

pd.read_csv('foo.tsv', sep='\t', thousands=',')

This method is likely to be more efficient than performing the operation as a separate step.
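
For illustration, here is a minimal, self-contained sketch of the thousands= approach, using io.StringIO in place of an actual file and made-up column names:

import io
import pandas as pd

# Inline stand-in for 'foo.tsv'; real usage would pass a file path instead.
data = "a\tb\n1,200\t4,200\n7,000\t-0.03\n5\t0\n"

df = pd.read_csv(io.StringIO(data), sep='\t', thousands=',')
print(df.dtypes)  # both columns are parsed as numeric (int64 / float64)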


You need to set the locale first:

In [ 9]: import locale

In [10]: from locale import atof

In [11]: locale.setlocale(locale.LC_NUMERIC, '')
Out[11]: 'en_GB.UTF-8'

In [12]: df.applymap(atof)
Out[12]:
      0        1
0  1200  4200.00
1  7000    -0.03
2     5     0.00
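
A hedged aside, not part of the original answer: newer pandas releases (2.1+) deprecate DataFrame.applymap in favour of DataFrame.map, so the element-wise call above becomes:

df = df.map(atof)  # pandas >= 2.1 equivalent of df.applymap(atof)
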
Andy Hayden
  • I should have said that I did set the locale. I still get the error. – pheon Mar 03 '14 at 14:33
  • But I am using df.read_fwf, and that has the " thousands=',' " option too, which works. Thanks. – pheon Mar 03 '14 at 14:38
  • Then again, why does df.applymap(atof) work for you but not me? My locale is 'en_US.UTF-8'. – pheon Mar 03 '14 at 14:38
  • @pheon that should work too, running the setlocale function should get atof working (in the referenced answer they are in en_US locale and it works)... – Andy Hayden Mar 03 '14 at 17:22
  • @pheon if it doesn't that's weird. Presumably it doesn't work for `atof('3,000')` by itself / in vanilla python? (You could post a separate question about just that, without the pandas references.) – Andy Hayden Mar 03 '14 at 17:37
  • I voted this up for the 'thousands' argument tip for the read_csv function. That worked great for me. – rockfakie Apr 28 '16 at 23:09
  • I wanted to add that you can also use "decimal=',' " if you're dealing with floats. – VessoVit Mar 23 '17 at 11:36
  • I find that (for csv files) it worked when doing this: `df = pd.read_csv('foo.csv', sep=',', thousands=',')`. The difference being the `sep=` argument as a `,` and not `\t`. – D.L Jan 31 '22 at 15:44
  • Should be `pd.read_csv` not `df.read_csv`. – Bill Apr 01 '23 at 21:03
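
Following up on that comment thread, a quick check outside pandas can confirm whether atof itself is the problem; the locale name below is an assumption and must be installed on the system:

import locale
from locale import atof

# Any locale whose LC_NUMERIC rules use ',' as the thousands separator should work here.
locale.setlocale(locale.LC_NUMERIC, 'en_US.UTF-8')
print(atof('3,000'))  # expected: 3000.0
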
62

You can convert one column at a time like this:

df['colname'] = df['colname'].str.replace(',', '').astype(float)
ghollah kioko
  • With this, I get a warning: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True. No idea why it assumes that regex=True. – Cristian Avendaño Feb 16 '21 at 19:40
  • That's a *horrible* idea. It will convert `0,2` to `2` instead of `0.2`. There's simply no way one can use replacements to parse localized number literals. What about `10,000.0`? What about `10.000,00`? – Panagiotis Kanavos Jun 18 '21 at 08:56
  • Thank you, @PanagiotisKanavos. Your comment prevented me from tumbling into this major pitfall and continuing to work with heavily messed up data. `pd.Series('0,5').str.replace(',', '').astype(float)` returns 5! – jlplenio Jul 09 '21 at 12:50
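
Tying those comments together, a hedged per-column sketch ('colname' is a placeholder) that passes regex=False to avoid the FutureWarning, and that is only safe when the comma is strictly a thousands separator:

# Unsafe for European-style decimals such as '0,5' (see the comments above).
df['colname'] = (
    df['colname']
    .str.replace(',', '', regex=False)  # regex=False: treat ',' literally, no FutureWarning
    .astype(float)
)
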
43

You may use the pandas.Series.str.replace method. Note that the `.str` accessor only exists on a Series, not on a DataFrame, so apply it column by column:

df = df.apply(lambda col: col.str.replace(',', '').astype(float))

This removes the commas from every cell and converts each column to float.

shen ke
1

This will work for strings such as '-55,00' or '5.500,00' and convert them to floats -55.00 and 5500.00, respectively. Note that regex=False is required: with regex=True, the '.' pattern matches every character and empties the string.

df['colname'] = df['colname'].str.replace('.', '', regex=False).str.replace(',', '.', regex=False).astype(float)
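
If data in that European format comes from a file, a hedged alternative (the file name and ';' separator are assumptions) is to let read_csv do the parsing, along the lines of VessoVit's decimal=',' tip above:

import pandas as pd

# Assumes '.' as the thousands separator and ',' as the decimal mark in the file.
df = pd.read_csv('foo_eu.csv', sep=';', thousands='.', decimal=',')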