0

I have a DataFrame with a column containing currency symbol, I'm looking for an efficient way stripping those symbols so I can convert the column's data type to floats.

Data Frame:

          Date     Money(ILS)   
0   2020-05-02     ₪77,000.00   
1   2020-04-30     ₪80,600.00   
2   2020-07-29     ₪86,600.00     
3   2020-10-27    ₪113,963.00     
4   2021-01-25    ₪134,963.00    
5   2021-04-25    ₪155,963.00     
6   2021-07-24    ₪176,963.00     
7   2021-10-22    ₪197,963.00     
8   2022-01-20    ₪218,963.00     
9   2022-04-20    ₪239,963.00     
10  2022-07-19    ₪260,963.00  

I'm trying to strip the ₪ symbol from the money column.

my function:

@classmethod
    def strip_symbols(cls):
        cls.df = cls.df.apply(lambda x: x.str.strip('₪') if x.dtype == TypesConsts.OBJECT else x)
        return cls.df

with this method I get the following error:

AttributeError: Can only use .str accessor with string values!
  • 1
    As a side note, you probably shouldn't use a float for money. https://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency – Holden Rohrer May 17 '20 at 06:47

2 Answers2

0

Use DataFrame.select_dtypes for get columns only of object dtype:

cols = cls.df.select_dtypes(TypesConsts.OBJECT).columns
cls.df[cols] = cls.df[cols].apply(lambda x: x.str.strip('₪'))
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

Apparently, you applied your lambda function to the whole DataFrame (all columns), so in case of any column of type other than object you have an exception.

Actually you should apply this function only to the column (or columns) holding currency content.

But there is other potential source of problem: If any column has object type, then usually all its elements contain string data, but there can be "malicious" cases when some elements contain e.g. float data (float is still a descendant of object).

Try such an example: Set one element of your DataFrame (Money(ILS) column) to a float value:

df.iloc[2,1] = 86600.15

Then run the code that I suggest (to strip only):

df['Money(ILS)'] = df['Money(ILS)'].apply(
    lambda x: x.strip('$') if type(x).__name__ == 'str' else x)

(my test data contains $ as the currency symbol, so I used just '$').

The result is:

        Date  Money(ILS)  Other
0 2020-05-02   77,000.00   Abcd
1 2020-04-30   80,600.00  23.16
2 2020-07-29     86600.1   Efgh
3 2020-10-27  113,963.00   Xxxx
4 2021-01-25  134,963.00     35
5 2021-04-25  155,963.00   Yyyy

​(I added an additional column and used less rows).

Now:

  • all other columns have their original values,
  • the currency symbol has been stripped only from cells in Money(ILS) column,
  • but only from those which are of string type,
  • the only float value has been printed as 86600.1 - without the final "5", but it is only printing inaccuracy (when you retrieve this cell with iloc, you will get the proper value).

But to convert this column to float you have also to drop commas (beforehand), also only in actual strings:

df['Money(ILS)'] = df['Money(ILS)'].apply(
    lambda x: x.replace(',', '') if type(x).__name__ == 'str' else x)

The final solution

You don't need to do these conversions stepwise. Just use the following code, doing all in one go:

df['Money(ILS)'] = df['Money(ILS)'].apply(lambda x: float(
    x.strip('$').replace(',', '') if type(x).__name__ == 'str' else x))

Now when you print the DataFrame:

  • Money(ILS) is of float type (check it with df.info()),
  • even the float value present before has .15 ending, just as it should be.

Of course, apply this code also to another "currency" columns, if any.

Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41