I have a dataframe with many (>= 50) columns. Some of them use a comma as decimal separator, some use a dot, and a few even have a bit of both. A few are supposed to be strings.

| colA | colB | colC | colD |
|------|------|------|------|
| 12.4 |  9,4 | 17.8 | eaui |
| 12.4 | 17,3 |  9,4 | euia |
| 13.2 | 20,7 |  9,4 | eaea |
| 10.0 |  1,8 |  2.3 | uiae |

When reading the CSV, some columns get recognized as float while most are read as string. I now want to make sure both kinds (comma decimal and dot decimal) end up recognized as float. I tried:

df2 = df.apply(lambda x: x.str.replace(',','.'))

But I get an error saying this operation only works on string values.

I also tried the following, but unfortunately also without success:

df2 = df.apply(lambda x: x.str.replace(',','.') if type(x) == str)

I found several instructions that work by choosing specific columns one at a time, but I have too many columns to justify this.

I'm guessing there is a very easy one-liner that solves my problem, but I could not find it. Any help and tips are appreciated!

Tom S

1 Answer


You have to perform str.replace on a pd.Series object, i.e. a single column. You can first select the columns that are not numeric and then use apply on this sub-frame to replace the comma in each column:

string_columns = df.select_dtypes(include='object').columns
df[string_columns] = df[string_columns].apply(lambda c: c.str.replace(',', '.').astype(float))

(Note that apply is column-wise by default; passing axis=1 would hand the lambda rows instead of columns.)
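A minimal self-contained sketch of this approach, using hypothetical data mirroring the question. The genuinely-string column (colD) is left out here, since .astype(float) would fail on it; the comment below discusses that case:

```python
import pandas as pd

# Hypothetical frame: one column already float, two read in as strings
# because of comma decimals / mixed separators.
df = pd.DataFrame({
    "colA": [12.4, 12.4, 13.2, 10.0],          # already float64
    "colB": ["9,4", "17,3", "20,7", "1,8"],    # comma decimals -> object
    "colC": ["17.8", "9,4", "9,4", "2.3"],     # mixed separators -> object
})

# Select only the non-numeric columns, fix the separator, convert to float,
# and write the result back into the frame.
string_columns = df.select_dtypes(include="object").columns
df[string_columns] = df[string_columns].apply(
    lambda c: c.str.replace(",", ".").astype(float)
)
print(df.dtypes)
```

After this, all three columns are float64 while the untouched columns keep their original dtype.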
gofvonx
  • I used the string_columns approach, but skipped the .astype(float) for the moment. Since I also have columns that are supposed to be strings, the conversion would fail on those as far as I can tell. Instead I added a .apply(to_numeric, errors="ignore") in a later step. – Tom S Apr 16 '21 at 11:49
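A sketch of that follow-up step, on hypothetical columns. Note that errors="ignore" is deprecated in recent pandas versions, so a small try/except wrapper gives the same "convert if possible, otherwise keep as string" behaviour per column:

```python
import pandas as pd

df = pd.DataFrame({
    "colB": ["9,4", "17,3"],
    "colD": ["eaui", "euia"],   # genuinely a string column
})

def to_numeric_or_keep(col):
    """Convert a column to a numeric dtype; if that fails, keep it as-is."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col

# Fix the separator everywhere, then let each column decide its own dtype.
df2 = df.apply(lambda c: c.str.replace(",", ".")).apply(to_numeric_or_keep)
print(df2.dtypes)
```

colB ends up float64 while colD stays an object (string) column.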