Consider this sample data frame df_1:

index    value_1
1        -3.570,00
2        +552,76
3        -1,01
4        -100.234,01

where the float values are signed and European delimiters/separators have been used:

  • comma , for decimal
  • dot/point/period . for thousands

I want to convert the values of this column to float. If I try the instruction from here

# tag 1
df_1['value_1'] = df_1['value_1'].apply(pd.to_numeric)

I get the error message

ValueError: Unable to parse string "<...>" at position <...>

I could use the instruction from here

# tag 2
df_1['value_1'] = df_1['value_1'].apply(lambda x: x.replace('.',''))
df_1['value_1'] = df_1['value_1'].apply(lambda x: x.replace(',','.'))

prior to # tag 1, however, I get the message:

C:\Users\userName\AppData\Local\Temp\ipykernel_11992\3059588848.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
...

Even though this workaround does the job, is there a more canonical way to achieve what I want without getting any warnings?

Foad S. Farimani
  • @wjandrea any way to remove the `SettingWithCopyWarning` warning too? – Foad S. Farimani May 28 '22 at 20:41
  • There's an existing question about that: [How to deal with SettingWithCopyWarning in Pandas](/q/20625582/4518341). I don't have any experience with it myself, but I just googled "pandas SettingWithCopyWarning" and that looks promising. – wjandrea May 28 '22 at 20:45
  • @wjandrea yeah, had seen that page. couldn't comprehend it to a concise answer to my question though. – Foad S. Farimani May 28 '22 at 20:50
  • Oh actually, I can't reproduce the issue. Instead I get `AttributeError: 'str' object has no attribute 'str'` at `x.str`. If I switch `.apply(lambda x: x.str.replace(...))` for `.str.replace(...)`, then I don't get any warning, but maybe my kernel is set up differently, IDK. – wjandrea May 28 '22 at 21:00
  • @wjandrea true. There was a mistake in the [original answer](https://stackoverflow.com/a/40083822/4999991). please check the latest edit above. – Foad S. Farimani May 28 '22 at 21:04
  • I still don't get a warning. Even with `pd.options.mode.chained_assignment = 'raise'`, nothing. I'm using Pandas 1.4.2 if that's relevant. – wjandrea May 28 '22 at 21:08
  • @wjandrea hmmm are you on Windows OS? I am also using Jupyter. – Foad S. Farimani May 28 '22 at 21:12
  • Oh wait, I'm trying a chained assignment and it's failing silently. This seems to be an issue on my side. – wjandrea May 28 '22 at 21:21

2 Answers


Check out the locale module in the Python standard library (see its documentation).

Example:

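import locale

# Locale names are platform-dependent: 'eu' may not exist on every system
# (on Linux/macOS a name such as 'de_DE.UTF-8' may be needed, if installed).
locale.setlocale(locale.LC_NUMERIC, 'eu')

# locale.atof parses each string using the active locale's separators.
df.value_1 = df.value_1.apply(locale.atof)
print(df)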

Output:

     value_1
0   -3570.00
1     552.76
2      -1.01
3 -100234.01
BeRT2me

If you're reading from CSV, you can use the decimal and thousands parameters:

df = pd.read_csv(..., decimal=',', thousands='.')

From the documentation:

thousands : str, optional

Thousands separator.

decimal : str, default ‘.’

Character to recognize as decimal point (e.g. use ‘,’ for European data).
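
A minimal, self-contained sketch of the above, using an in-memory file in place of a real CSV. The sample text, the `;` field delimiter, and the `index` column name are assumptions made for illustration; a comma-delimited file would clash with the decimal comma unless the values were quoted.

```python
import io

import pandas as pd

# Hypothetical CSV content; ';' separates fields because ',' is the decimal mark.
csv_text = "index;value_1\n1;-3.570,00\n2;+552,76\n3;-1,01\n4;-100.234,01\n"

# decimal and thousands tell the parser how to read European-formatted numbers.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    decimal=",",
    thousands=".",
    index_col="index",
)
print(df.dtypes)
print(df)
```

This way the column is parsed as a float when the file is loaded, so no string clean-up step is needed afterwards.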

Due credit to atomh33ls for posting almost exactly this on another question.

wjandrea