Pandas column selection with many conditions becomes unwieldy

Question

Quick Pandas question:

I cleaning up the values in individual columns of a dataframe by using an apply on a series:

# For all values in col 'Rate' over 1, divide by 100

df['rate'][df['rate']>1] = df['rate'][df['rate']>1].apply(lambda x: x/100)

This is fine when the selection criteria is simple, such as df['rate']>1. This however gets very long when you start adding multiple selection criteria:

df['rate'][(df['rate']>1) & (~df['rate'].isnull()) & (df['rate_type']=='fixed) & (df['something']<= 'nothing')] = df['rate'][(df['rate']>1) & (df['rate_type']=='fixed) & (df['something']<= 'nothing')].apply(lambda x: x/100)

What's the most concise way to: 1. Split a column off (as a Series) from a DataFrame 2. Apply a function to the items of the Series 3. Update the DataFrame with the modified series

I've tried using df.update(), but that didn't seem to work. I've also tried using the Series as a selector, e.g. isin(Series), but I wasn't able to get that to work either.

Thank you!

score 2 · Answer 1 · answered Jun 27 '19 at 23:03

2

When there are multiple conditions, you can keep things simple using eval:

mask = df.eval("rate > 1 & rate_type == 'fixed' & something <= 'nothing'")
df.loc[mask, 'rate'] = df['rate'].apply(function)

Read more about evaluating expressions dynamically here. Of course, this particular function can be vectorized as

df.loc[mask, 'rate'] /= 100

answered Jun 27 '19 at 23:03

cs95

379,657
97
704
746

Thank you! Let me ask, how might we accomplish this when several of the conditions use Series methods, such as .isnull()? – ZAR Jun 27 '19 at 23:05
1

@ZAR `df.eval('col.isnull() & col2 > 1 & ...', engine='python')` works for me. – cs95 Jun 28 '19 at 00:27

score 1 · Accepted Answer · answered Jun 27 '19 at 23:05

1

It will work with update

con=(df['rate']>1) & (df['rate_type']=='fixed') & (df['something']<= 'nothing')
df.update(df.loc[con,['rate']].apply(lambda x: x/100))

answered Jun 27 '19 at 23:05

BENY

317,841
20
164
234

Pandas column selection with many conditions becomes unwieldy

2 Answers2