
I have a huge time-series DataFrame (about 100,000,000 rows) and I need to filter its rows by conditions stored in another DataFrame. That second DataFrame has about 2,000 rows, each one describing a condition. Toy example:

df = pd.DataFrame({'val': [1, 3, 2, 4, 3, 1, 2, 3],
                   'date': ['2015-03-12', '2015-04-12', '2015-05-13', '2016-03-12',
                            '2016-04-07', '2016-05-12', '2017-01-11', '2017-03-20']})
df_condition = pd.DataFrame({'val': [2, 3], 'date': ['2015-07-13', '2016-04-08']})

The condition is: remove every row in df whose val appears earlier than the corresponding date in df_condition:

df = df[~((df['val']==2) & (df['date']<'2015-07-13'))]
df = df[~((df['val']==3) & (df['date']<'2016-04-08'))]

and so on, for about 2,000 conditions.

I use the approach below, but it takes too long (about 5 hours). Is there a faster method?

vals = df_condition.val.values
dates = df_condition.date.values
for i in range(len(df_condition)):
    df = df[~((df.val==vals[i])&(df.date < dates[i]))]
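(At this scale — 2,000 conditions against 100M rows — a single vectorized comparison may beat any per-condition loop. A sketch, assuming each val occurs at most once in df_condition; the datetime parsing is mine, ISO-format strings would compare the same way:)

```python
import pandas as pd

# Toy data from the question
df = pd.DataFrame({'val': [1, 3, 2, 4, 3, 1, 2, 3],
                   'date': pd.to_datetime(['2015-03-12', '2015-04-12', '2015-05-13',
                                           '2016-03-12', '2016-04-07', '2016-05-12',
                                           '2017-01-11', '2017-03-20'])})
df_condition = pd.DataFrame({'val': [2, 3],
                             'date': pd.to_datetime(['2015-07-13', '2016-04-08'])})

# Map each val to its cutoff date; vals with no condition get NaT
cutoff = df['val'].map(df_condition.set_index('val')['date'])

# Keep rows that have no condition, or whose date is on/after the cutoff
out = df[cutoff.isna() | (df['date'] >= cutoff)]
```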

1 Answer


I believe you can create a list of keep-masks (one per condition) and then reduce them with logical AND:

masks = [(df.val!=x.val) | (df.date>=x.date) for x in df_condition.itertuples()]
print (masks)
[0     True
1     True
2    False
3     True
4     True
5     True
6     True
7     True
dtype: bool, 0     True
1    False
2     True
3     True
4    False
5     True
6     True
7     True
dtype: bool]

df1 = df[np.logical_and.reduce(masks)]
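(For completeness, a runnable version on the toy data; the datetime conversion is my addition, string dates in ISO format would compare identically:)

```python
import numpy as np
import pandas as pd

# Toy data from the question
df = pd.DataFrame({'val': [1, 3, 2, 4, 3, 1, 2, 3],
                   'date': pd.to_datetime(['2015-03-12', '2015-04-12', '2015-05-13',
                                           '2016-03-12', '2016-04-07', '2016-05-12',
                                           '2017-01-11', '2017-03-20'])})
df_condition = pd.DataFrame({'val': [2, 3],
                             'date': pd.to_datetime(['2015-07-13', '2016-04-08'])})

# One keep-mask per condition: a row survives a condition if its val
# differs, or its date is on/after that condition's cutoff
masks = [(df.val != x.val) | (df.date >= x.date)
         for x in df_condition.itertuples()]

# A row is kept only if it survives every condition
df1 = df[np.logical_and.reduce(masks)]
```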
jezrael