Short circuit numpy logical_and on pandas series

Question

I create a mask to use in a pandas dataframe:

    import operator

    import numpy as np

    mask = np.logical_and(
        csv_df['time'].map(operator.attrgetter('hour')).isin(hours_set),
        # weekday_name was removed in pandas 1.0; day_name() replaces it
        csv_df['time'].map(lambda x: x.day_name()[:3]).isin(days_set))
    csv_df = csv_df.loc[mask, :]

It turns out that computing the two isin Series is rather slow. The code above computes both Series in full and then combines them with a logical AND. Is there an (idiomatic) way to short-circuit per element? The first Series is mostly False, so for most rows we would not need to calculate the other Series' element.
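
NumPy and pandas do not short-circuit element-wise: np.logical_and(a, b) and a & b both receive two fully computed arrays. A manual equivalent is to compute the first condition for every row and evaluate the second only at the positions where the first holds. The sketch below assumes a datetime column named time; the helper name short_circuit_mask and the sample data are hypothetical, not part of the question.

```python
import numpy as np
import pandas as pd


def short_circuit_mask(times, hours_set, days_set):
    """Hypothetical helper: element-wise AND of two conditions that
    evaluates the day condition only where the hour condition holds."""
    # First condition: computed for every element.
    hour_ok = times.dt.hour.isin(hours_set).to_numpy()
    result = np.zeros(len(times), dtype=bool)
    # Integer positions that passed the first condition.
    idx = np.flatnonzero(hour_ok)
    # Second condition: computed only for the surviving positions.
    result[idx] = times.iloc[idx].dt.strftime('%a').isin(days_set).to_numpy()
    return pd.Series(result, index=times.index)


# Hypothetical sample data (2018-05-14 was a Monday).
csv_df = pd.DataFrame({'time': pd.to_datetime([
    '2018-05-14 08:00', '2018-05-14 09:30',
    '2018-05-15 08:15', '2018-05-19 08:45',
])})
mask = short_circuit_mask(csv_df['time'], {8}, {'Mon', 'Sat'})
csv_df = csv_df.loc[mask, :]
```

This only pays off when the second condition is the expensive one and the first rejects most rows; the fancy indexing has its own cost.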

Related topic: https://stackoverflow.com/q/45771554/901925 – hpaulj May 14 '18 at 11:02

score 1 · Accepted Answer · edited May 14 '18 at 11:17

One idea is to use the vectorized .dt accessor instead of map:

    # parentheses are needed to continue the expression on the next line
    mask = (csv_df['time'].dt.hour.isin(hours_set) &
            csv_df['time'].dt.strftime('%a').isin(days_set))
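
The gain here comes from replacing per-element Python calls (map with attrgetter or a lambda) with vectorized .dt operations; both conditions are still computed for every row. A minimal runnable sketch, where the sample timestamps, hours_set and days_set values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data: a 'time' column of timestamps.
csv_df = pd.DataFrame({'time': pd.to_datetime([
    '2018-05-14 08:00',  # Monday
    '2018-05-14 09:30',  # Monday
    '2018-05-15 08:15',  # Tuesday
    '2018-05-19 08:45',  # Saturday
])})
hours_set = {8}
days_set = {'Mon', 'Sat'}

# & combines the two boolean Series element-wise; both sides are
# fully evaluated before they are combined.
mask = (csv_df['time'].dt.hour.isin(hours_set) &
        csv_df['time'].dt.strftime('%a').isin(days_set))
filtered = csv_df.loc[mask, :]
```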

Another idea, if most values do not match, is to filter on the first condition and then apply the second only to the remaining rows:

    # keep only rows whose weekday matches, then test the hour on that subset
    csv_df1 = csv_df.loc[csv_df['time'].dt.strftime('%a').isin(days_set)]
    csv_df2 = csv_df1.loc[csv_df1['time'].dt.hour.isin(hours_set)]
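
A small check that the two-step filter selects the same rows as the combined mask, while the second test only runs on the rows that survived the first. The sample data is hypothetical. Since the question says the hour condition is mostly False, swapping the order (hours first, days second) would likely shrink the intermediate frame more; the rule of thumb is to put the most selective test first.

```python
import pandas as pd

# Hypothetical sample data (2018-05-14 was a Monday).
csv_df = pd.DataFrame({'time': pd.to_datetime([
    '2018-05-14 08:00', '2018-05-14 09:30',
    '2018-05-15 08:15', '2018-05-19 08:45',
])})
hours_set = {8}
days_set = {'Mon', 'Sat'}

# Stage 1: day test on every row.
csv_df1 = csv_df.loc[csv_df['time'].dt.strftime('%a').isin(days_set)]
# Stage 2: hour test only on the rows that passed stage 1.
csv_df2 = csv_df1.loc[csv_df1['time'].dt.hour.isin(hours_set)]

# Reference result computed with a single combined mask.
combined = csv_df.loc[csv_df['time'].dt.hour.isin(hours_set) &
                      csv_df['time'].dt.strftime('%a').isin(days_set)]
```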

edited by Mr_and_Mrs_D

answered May 14 '18 at 08:13 by jezrael

Thanks for the tricks - but IIRC this will still calculate both Series, due to Python's eager evaluation of expressions – Mr_and_Mrs_D May 14 '18 at 08:18
@Mr_and_Mrs_D - hmmm, I think I understand you, but I am not sure numpy/pandas support it, because we need to look for different sets of values: hours and days. – jezrael May 14 '18 at 08:20
Nifty trick, since the first array will be calculated anyway – Mr_and_Mrs_D May 14 '18 at 11:18
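
To make the eager-evaluation point concrete, the sketch below wraps the day test in a counting function (day_check and rows_checked are hypothetical names, and the data is made up). With &, the day test sees every row; with the staged version it only sees rows that already passed the hour test.

```python
import pandas as pd

rows_checked = {'day': 0}


def day_check(times, days_set):
    # Stand-in for the expensive condition; records how many rows it sees.
    rows_checked['day'] += len(times)
    return times.dt.strftime('%a').isin(days_set)


# Hypothetical sample data (2018-05-14 was a Monday).
times = pd.Series(pd.to_datetime([
    '2018-05-14 08:00', '2018-05-14 09:30',
    '2018-05-15 08:15', '2018-05-19 08:45',
]))
hours_set = {8}
days_set = {'Mon', 'Sat'}

# Eager: both operands of & are fully built before they are combined.
eager = times.dt.hour.isin(hours_set) & day_check(times, days_set)
eager_rows = rows_checked['day']

# Staged: the day test only runs on rows that passed the hour test.
rows_checked['day'] = 0
hour_ok = times.dt.hour.isin(hours_set).to_numpy()
staged = hour_ok.copy()
staged[hour_ok] = day_check(times[hour_ok], days_set).to_numpy()
staged_rows = rows_checked['day']
```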
