I am trying to select specific rows in one data frame based on the rows of a second data frame. Each row of the second data frame specifies a separate filter. The filter criteria (which columns to use and which values to match) are known only at run time and vary from row to row.
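```python
import pandas as pd

data = pd.DataFrame({'a': [0, 1, 2, 3], 'b': [4, 5, 6, 7], 'c': [9, 6, 4, 2]})
# Each row of flt is one filter; None (stored as NaN) means "column not used"
flt = pd.DataFrame({'a': [3, None, 0], 'c': [None, 2, 5]})
```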
The goal is to generate the search criteria dynamically while still using vectorized operations, i.e. the equivalent of:
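```python
data[data['a'] == flt['a'].iloc[0]]                                        # filter 0: column a only
data[data['c'] == flt['c'].iloc[1]]                                        # filter 1: column c only
data[(data['a'] == flt['a'].iloc[2]) & (data['c'] == flt['c'].iloc[2])]    # filter 2: columns a and c
```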
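A minimal sketch of building those masks without generating code strings, assuming a NaN in `flt` always means "ignore this column". The helper name `row_mask` is hypothetical. Each filter row is reduced to its non-null entries, one vectorized `==` comparison is made per used column, and the comparisons are combined with `&` via `functools.reduce`:

```python
from functools import reduce
import operator

import pandas as pd

data = pd.DataFrame({'a': [0, 1, 2, 3], 'b': [4, 5, 6, 7], 'c': [9, 6, 4, 2]})
flt = pd.DataFrame({'a': [3, None, 0], 'c': [None, 2, 5]})


def row_mask(df, criteria):
    """Hypothetical helper: boolean mask for one filter row.

    criteria is a Series indexed by column name; NaN entries are treated
    as "column not used" (an assumption about how flt is encoded).
    """
    active = criteria.dropna()
    # One vectorized comparison per column that the filter actually uses.
    masks = (df[col] == val for col, val in active.items())
    # Start from an all-True mask so a filter with no criteria keeps every row.
    return reduce(operator.and_, masks, pd.Series(True, index=df.index))


# The Python-level loop runs once per filter; each comparison is vectorized over data.
results = [data[row_mask(data, row)] for _, row in flt.iterrows()]
```

Only the loop over filters is in Python; the work over the millions of data rows stays inside pandas/NumPy, and no `exec` or string templating is involved.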
I was thinking about some form of metaprogramming or templating that would generate the code on the fly, possibly as a string executed with `exec`. However, isn't that considered a bad way to do things in Python? The problem is that the real application uses very large data frames: the data to be searched is on the order of millions of rows by hundreds of columns, and the combination of columns used in a search varies a lot, from 1 up to about a dozen columns. Both flexibility and search speed are crucial.
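When there are many filters, scanning the full data once per filter may become the bottleneck. One alternative (a different technique from boolean masks: a hash join) is to group the filters by the set of columns they use and run a single inner `merge` per group. The sketch below assumes NaN means "column not used" and that every filter value is exactly representable in the dtype of the matching `data` column (the NaN placeholders force `flt` columns to float, so they are cast back before joining). The function name `match_all_filters` and the output column names `filter_id`/`data_index` are hypothetical:

```python
import pandas as pd

data = pd.DataFrame({'a': [0, 1, 2, 3], 'b': [4, 5, 6, 7], 'c': [9, 6, 4, 2]})
flt = pd.DataFrame({'a': [3, None, 0], 'c': [None, 2, 5]})


def match_all_filters(data, flt):
    """Hypothetical helper: return one (filter_id, data_index) row per match."""
    # Group filter ids by the tuple of columns each filter actually uses.
    groups = {}
    for fid, row in flt.iterrows():
        cols = tuple(row.dropna().index)
        groups.setdefault(cols, []).append(fid)

    # Keep the original row labels of data as an ordinary column.
    left_all = data.rename_axis('data_index').reset_index()

    pieces = []
    for cols, ids in groups.items():
        if not cols:
            continue  # a filter with no criteria would match everything; skipped here
        cols = list(cols)
        # Cast filter values back to the data dtypes (assumes exact representability).
        right = flt.loc[ids, cols].astype(data[cols].dtypes.to_dict())
        right = right.rename_axis('filter_id').reset_index()
        # Join only the key columns so the hundreds of other columns are not copied.
        matched = left_all[['data_index', *cols]].merge(right, on=cols, how='inner')
        pieces.append(matched[['filter_id', 'data_index']])

    if not pieces:
        return pd.DataFrame({'filter_id': [], 'data_index': []})
    return pd.concat(pieces, ignore_index=True)


out = match_all_filters(data, flt)
```

For the sample frames, filters 0 and 1 both select row 3 of `data`, and filter 2 matches nothing. Whether this beats the per-filter masks depends on how many filters share a column combination, so it is worth timing both on realistic data.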