2

I have this dataframe

   Col1              Col2

0  A (1000 EUR)  C ( 3000 USD)

1  B (2000 CHF)  D ( 4000 GBP)

I would like to convert it to

   Col1  Col2

0  1000  3000

1  2000  4000

I know how to create a dataframe (with indexes) for 1 column, but not for multiple columns

This code produces this result

   Col1

0  1000

1  2000 

a = z['Col1'].str.split('(').str[-1].str.split().str[0].apply(pd.to_numeric,errors='coerce')

how can I amend the code above to also add col2 (ideally using vectorisation rather than iteration) (so ideally I wouln't want to have to enter the same code for every column)

yatu
  • 86,083
  • 12
  • 84
  • 139
zaydar
  • 161
  • 1
  • 3
  • 7

4 Answers4

5

You could use str.extract on each column and use pd.concat to build a new dataframe from the result:

x = np.concatenate([df[col].str.extract(r'(?<=\()\s*(\d+)') for col in df], axis=1)
pd.DataFrame(x, columns=df.columns)

   Col1  Col2
0  1000  3000
1  2000  4000
yatu
  • 86,083
  • 12
  • 84
  • 139
3

I would use df.applymap. The difference from df.apply is that df.apply takes a whole series (row or column) as an argument and applies a function where df.applymap takes every element of the DataFrame and applies a function. Try this:

df = df.applymap(lambda x: pd.to_numeric(str(x).split('(')[1].split()[0], errors='coerce'))

print(df)

   Col1  Col2
0  1000  3000
1  2000  4000

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
Col1    2 non-null int64
Col2    2 non-null int64
dtypes: int64(2)
memory usage: 48.0 bytes
None
pnovotnyq
  • 547
  • 3
  • 12
1

You can use the apply function to apply your operation to all elements in both rows.

# creates your dataframe
df = pd.DataFrame({'Col1':['A (1000 EUR)','B (2000 CHF)'], 'Col2':['C (3000 USD)', 'D (4000 GBP)']})

# use the apply function to  apply your code to all elements of both columns
df = df.apply(lambda x: x.str.split('(').str[-1].str.split().str[0].apply(pd.to_numeric,errors='coerce'))

Does the trick for me

ABot
  • 197
  • 12
0

You can use the applymap method of a pandas dataframe. It would look something like:

import re

function_to_apply = lambda x: re.search("[0-9]+", x).group()
your_dataframe.applymap(function_to_apply)

Possible something like that would work in your case, and would certainly be a bit simpler!

pnovotnyq
  • 547
  • 3
  • 12
PirateNinjas
  • 1,908
  • 1
  • 16
  • 21