0

Quick Pandas question:

I cleaning up the values in individual columns of a dataframe by using an apply on a series:

# For all values in col 'Rate' over 1, divide by 100

df['rate'][df['rate']>1] = df['rate'][df['rate']>1].apply(lambda x: x/100)

This is fine when the selection criteria is simple, such as df['rate']>1. This however gets very long when you start adding multiple selection criteria:

df['rate'][(df['rate']>1) & (~df['rate'].isnull()) & (df['rate_type']=='fixed) & (df['something']<= 'nothing')] = df['rate'][(df['rate']>1) & (df['rate_type']=='fixed) & (df['something']<= 'nothing')].apply(lambda x: x/100)

What's the most concise way to: 1. Split a column off (as a Series) from a DataFrame 2. Apply a function to the items of the Series 3. Update the DataFrame with the modified series

I've tried using df.update(), but that didn't seem to work. I've also tried using the Series as a selector, e.g. isin(Series), but I wasn't able to get that to work either.

Thank you!

cs95
  • 379,657
  • 97
  • 704
  • 746
ZAR
  • 2,550
  • 4
  • 36
  • 66

2 Answers2

2

When there are multiple conditions, you can keep things simple using eval:

mask = df.eval("rate > 1 & rate_type == 'fixed' & something <= 'nothing'")
df.loc[mask, 'rate'] = df['rate'].apply(function)

Read more about evaluating expressions dynamically here. Of course, this particular function can be vectorized as

df.loc[mask, 'rate'] /= 100
cs95
  • 379,657
  • 97
  • 704
  • 746
  • Thank you! Let me ask, how might we accomplish this when several of the conditions use Series methods, such as .isnull()? – ZAR Jun 27 '19 at 23:05
  • 1
    @ZAR `df.eval('col.isnull() & col2 > 1 & ...', engine='python')` works for me. – cs95 Jun 28 '19 at 00:27
1

It will work with update

con=(df['rate']>1) & (df['rate_type']=='fixed') & (df['something']<= 'nothing')
df.update(df.loc[con,['rate']].apply(lambda x: x/100))
BENY
  • 317,841
  • 20
  • 164
  • 234