1

How to efficiently filter df by multiple dictionary sets. The example will be as following:

df = pd.DataFrame({'A':[10,20,20,10,20], 'B':[0,1,0,1,1], 'C':['up','down','up','down','down'],'D':[100,200,200,100,100]})

filter_sets = [{'A':10, 'B':0, 'C':'up'}, {'A':20, 'B':1, 'C':'down'}]

I only know that I can filter df by single dictionary by:

df.loc[(df[list(filter_set)] == pd.Series(filter_set)).all(axis=1)]

But is it possible to filter several dict masks at once?

** The format of filter_sets is not necessary to be something like above. If it can provide filter for multiple columns, then it is fine.

codingsnake99
  • 146
  • 1
  • 10

1 Answers1

1

Use np.logical_or.reduce with list comprehension:

mask = np.logical_or.reduce([(df[list(x)]==pd.Series(x)).all(axis=1) for x in filter_sets])

#alternative solution
mask = (pd.concat([(df[list(x)]==pd.Series(x)).all(axis=1) for x in filter_sets], axis=1)
          .any(axis=1))

df2 = df[mask]
print (df2)
    A  B     C    D
0  10  0    up  100
1  20  1  down  200
4  20  1  down  100

Or if all keys are same is possible create helper DataFrame with merge:

df2 = pd.DataFrame(filter_sets).merge(df)
print (df2)
    A  B     C    D
0  10  0    up  100
1  20  1  down  200
2  20  1  down  100
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252