I have a dataframe with many (>= 50) columns. Some of them use a comma as decimal separator, some use a dot, and a few even have a bit of both. A few are supposed to be strings.

| colA | colB | colC | colD |
|------|------|------|------|
| 12.4 |  9,4 | 17.8 | eaui |
| 12.4 | 17,3 |  9,4 | euia |
| 13.2 | 20,7 |  9,4 | eaea |
| 10.0 |  1,8 |  2.3 | uiae |

When reading the CSV, some columns get recognized as float while most are read as string. I now want to make sure both kinds (comma decimal and dot decimal) end up recognized as float. I tried:

df2 = df.apply(lambda x: x.str.replace(',','.'))

But I get an error saying this operation only works on string values.

I also tried the following, but unfortunately also without success:

df2 = df.apply(lambda x: x.str.replace(',','.') if type(x) == str)

I found several instructions that work by choosing specific columns one at a time, but I have too many columns to justify this.

I'm guessing there is a very easy one-liner that solves my problem, but I could not find it. Any help and tips are appreciated!

Tom S

1 Answer


You have to perform str.replace on a pd.Series object, i.e. a single column. You can first select the columns that are not numeric and then use apply on this sub-frame to replace the comma in each column:

string_columns = df.select_dtypes(include='object').columns
df[string_columns] = df[string_columns].apply(lambda c: c.str.replace(',', '.').astype(float))

(Note that apply is column-wise by default; passing axis=1 would hand the lambda rows instead of columns.)
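A minimal self-contained sketch of this approach, using hypothetical data mirroring the question. The genuinely-string column (colD) is left out here, since .astype(float) would fail on it; the comment below discusses that case:

```python
import pandas as pd

# Hypothetical frame: one column already float, two read in as strings
# because of comma decimals / mixed separators.
df = pd.DataFrame({
    "colA": [12.4, 12.4, 13.2, 10.0],          # already float64
    "colB": ["9,4", "17,3", "20,7", "1,8"],    # comma decimals -> object
    "colC": ["17.8", "9,4", "9,4", "2.3"],     # mixed separators -> object
})

# Select only the non-numeric columns, fix the separator, convert to float,
# and write the result back into the frame.
string_columns = df.select_dtypes(include="object").columns
df[string_columns] = df[string_columns].apply(
    lambda c: c.str.replace(",", ".").astype(float)
)
print(df.dtypes)
```

After this, all three columns are float64 while the untouched columns keep their original dtype.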
gofvonx
  • I used the string_columns approach, but skipped the .astype(float) for the moment. Since I also have columns that are supposed to be strings, the conversion would fail on those as far as I can tell. Instead I added a .apply(to_numeric, errors="ignore") in a later step. – Tom S Apr 16 '21 at 11:49
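A sketch of that follow-up step, on hypothetical columns. Note that errors="ignore" is deprecated in recent pandas versions, so a small try/except wrapper gives the same "convert if possible, otherwise keep as string" behaviour per column:

```python
import pandas as pd

df = pd.DataFrame({
    "colB": ["9,4", "17,3"],
    "colD": ["eaui", "euia"],   # genuinely a string column
})

def to_numeric_or_keep(col):
    """Convert a column to a numeric dtype; if that fails, keep it as-is."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col

# Fix the separator everywhere, then let each column decide its own dtype.
df2 = df.apply(lambda c: c.str.replace(",", ".")).apply(to_numeric_or_keep)
print(df2.dtypes)
```

colB ends up float64 while colD stays an object (string) column.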