4

I have a column that has entries that look like this

df
      price
0  42000SEK
1   1200EUR
2   2200EUR

I want to convert the entires that have Euros to SEK(swedish krona) and remove both SEK and EUR so i end up with this. (lets for simplicity say 1 EUR = 10 SEK)

['42000']
['12000']
['22000']

With this line of code i can find the rows that has Euros.

df[df.price.str.contains('EUR')] 

but i dont know how to continue from here.

I could use this answer

Pandas DataFrame currency conversion

if i could split on first letter in the cell but i cant figure out how to do that.

cs95
  • 379,657
  • 97
  • 704
  • 746
JKnecht
  • 231
  • 2
  • 16
  • I don't think the assumption of the currency conversion is sensible here. so, the first part would be to split out the amount, and then do a second pass to get the actual ammount – roganjosh Dec 19 '20 at 22:54
  • see [this](https://stackoverflow.com/q/37683558/4799172) as the starting point for part A – roganjosh Dec 19 '20 at 22:56

4 Answers4

3

Let's try replace and pd.eval:

df["price"].replace({"SEK": "*1", "EUR": "*10"}, regex=True).map(pd.eval)

Output:

0    42000
1    12000
2    22000
Name: price, dtype: int64

This works nicely assuming you have no NaNs, and that there are only two currencies with one of them needing conversion. If you do have NaNs, fill them first. Finally, assign this back to the column to update the DataFrame.

cs95
  • 379,657
  • 97
  • 704
  • 746
  • I get this weird error: UndefinedVariableError: name 'price' is not defined – JKnecht Dec 19 '20 at 23:43
  • @JKnecht I have no idea what you copy pasted but you won't get that error if you run this code correctly and assign the result back. Please just copy the first line and assign the result back to `df['price'] = ...` and report back to me when you've tried that successfully. – cs95 Dec 19 '20 at 23:45
  • Yes your code should work, there is something mysterious with the data. I need to clean it some more. Ill get back to you. Thanks btw. – JKnecht Dec 20 '20 at 00:06
  • more on `pd.eval`: https://stackoverflow.com/questions/53779986/dynamic-expression-evaluation-in-pandas-using-pd-eval – David Erickson Dec 20 '20 at 00:32
  • 1
    @DavidErickson thanks for sharing! Oh wow, I think I know the dude who wrote that – cs95 Dec 20 '20 at 00:38
  • This worked perfectly after doing some data cleaning, getting rid of strings and other garbage that shouldnt be in the column. – JKnecht Dec 20 '20 at 14:03
3

Original dataframe

df = pd.DataFrame(data=['42000SEK','1200EUR','2200EUR'],columns=['price'])
price
42000SEK
1200EUR
2200EUR

Currency conversion dataframe

cc = pd.DataFrame({'from':['EUR','SEK'],'to':['SEK','EUR'],'rate':[10,.1]})
from to rate
EUR SEK 10
SEK EUR .1

(may have dozens of currency pairs)

Input fields separation

df['value'] = df['price'].str.slice(stop=-3).astype(int)

df['currency'] = df['price'].str.slice(start=-3)
price value currency
42000SEK 42000 SEK
1200EUR 1200 EUR
2200EUR 2200 EUR

Applying conversion to EUR:

df['price in EUR'] = df['value']*df.merge(cc[cc.to=='EUR'].append({'from':'EUR','to':'EUR','rate':1},ignore_index=True),left_on='currency',right_on='from',how='left')['rate']
price value currency price in EUR
42000SEK 42000 SEK 4200
1200EUR 1200 EUR 1200
2200EUR 2200 EUR 2200

Explanation:

  • take only the conversion pairs to EUR: cc[cc.to=='EUR']

  • ensure that there will be self-conversion, otherwise we won't have the lines that were EUR already: .append({'from':'EUR','to':'EUR','rate':1},ignore_index=True)

  • join prices dataframe with conversion rates in a temporary column, note that the column currency is the same as the temporary column "from": df.merge(...,left_on='currency',right_on='from',how='left')['rate']

  • multiply this temporary rate column with the prices column and append as a new column named 'price in EUR': df['price in EUR'] = df['value']*...

Now it is possible to convert many currencies to EUR, all you need is to feed the currency conversion dataframe (cc).

brunoff
  • 4,161
  • 9
  • 10
  • I really like this approach, it's smart and adaptable to other scenarios. The only thing I would change: You append the conversion from EUR to EUR (1:1) inside the conversion-formula - I would just add it to the initial data frame to keep the conversion formula clean. – n.r. Nov 21 '22 at 14:10
1

You could do:

import pandas as pd

# setup
df = pd.DataFrame(data=['42000SEK', '1200EUR', '2200EUR'], columns=['price'])

# create multiplier
mult = df['price'].str.contains('EUR$').map({ True : 10, False : 1})

# extract values
values = df['price'].str.replace(r'\D+', '').astype(int)

df['price'] = values * mult
print(df)

Output

   price
0  42000
1  12000
2  22000

For creating the multiplier column, you could use np.where:

import numpy as np

# create multiplier
mult = np.where(df['price'].str.contains('EUR$'), 10, 1)

If you want to keep all the info I suggest you use str.extract with a regex for extracting the value and the currency, for example:

# extract multiplier and value, concat with existing df
res = pd.concat((df, df['price'].str.extract(r'(?P<value>\d+)(?P<currency>\D+)')), 1)

# create result and put it a column named total
res['total'] = res['value'].astype(int) * np.where(res['currency'].eq('EUR'), 10, 1)
print(res)

Output

      price  value currency  total
0  42000SEK  42000      SEK  42000
1   1200EUR   1200      EUR  12000
2   2200EUR   2200      EUR  22000
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
  • On values = df['price'].str.replace(r'\D+', '').astype(int) i get the error ValueError: cannot convert float NaN to integer - but df['price'].isna().sum() is equal to zero? – JKnecht Dec 19 '20 at 23:39
  • @JKnecht Try: `values = df['price'].str.replace(r'\D+', '').fillna(0).astype(int)` – Dani Mesejo Dec 19 '20 at 23:41
  • Now i did not get an error but now df"price"].value_counts() gives something very confusing. Your code should work but value_counts() now give me 3.000000e+12 265 5.000000e+12 252 etc, they are wrong by very much. (to not confuse you here df["price"] does not have three entries as in the question) But these values are 100.000.000 larger than they should be. – JKnecht Dec 19 '20 at 23:57
  • Could you add more meaningful examples to your question? I don't think there's anything wrong with the code ... – Dani Mesejo Dec 20 '20 at 00:01
  • 1
    Yes your code should work, there is something mysterious with the data. I need to clean it some more. Ill get back to you. Thanks btw. – JKnecht Dec 20 '20 at 00:06
0

If you want to retain the amount and currency, you can use regular expressions to extract the number part and non-number part separately:

df = pd.DataFrame({'price': ['42000SEK', '1200EUR', '2200EUR']})
df['amount'] = df.price.str.extract(r'(\d+)')
df['currency'] = df.price.str.extract(r'([^\d]+)')
print(df)

Output

      price amount currency
0  42000SEK  42000      SEK
1   1200EUR   1200      EUR
2   2200EUR   2200      EUR
    
Matthew Cox
  • 1,047
  • 10
  • 23