I am trying to identify specific rows in one data frame based on the rows of a second data frame. Each row in the second data frame specifies a unique filter. The filter criteria (which columns to use and which values to match) are known only at run time and vary.

import pandas as pd

data = pd.DataFrame({'a': [0, 1, 2, 3], 'b': [4, 5, 6, 7], 'c': [9, 6, 4, 2]})

flt = pd.DataFrame({'a': [3, None, 0], 'c': [None, 2, 5]})

The goal is to generate the search criteria dynamically, in a way that still allows vectorized operations such as:

data[data['a']==flt['a'].iloc[0]]

data[data['c']==flt['c'].iloc[1]]

data[(data['a']==flt['a'].iloc[2]) & (data['c']==flt['c'].iloc[2])]
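One way to express the three hand-written filters above without generating code is to build the boolean mask in a loop over the non-null entries of each filter row. The following is only a minimal sketch under that assumption; `rows_matching` is a hypothetical helper name, not part of pandas, and it treats a null entry in `flt` as "this column is not part of the filter".

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'a': [0, 1, 2, 3], 'b': [4, 5, 6, 7], 'c': [9, 6, 4, 2]})
flt = pd.DataFrame({'a': [3, None, 0], 'c': [None, 2, 5]})


def rows_matching(data, criteria):
    """Hypothetical helper: rows of `data` equal to every non-null entry of `criteria`."""
    active = criteria.dropna()  # keep only the columns this filter actually uses
    mask = np.ones(len(data), dtype=bool)  # start with every row selected
    for col, value in active.items():
        # Each comparison is vectorized over the whole column.
        mask &= (data[col] == value).to_numpy()
    return data[mask]


# One result frame per filter row, in the same order as `flt`.
matches = [rows_matching(data, row) for _, row in flt.iterrows()]
```

The Python-level loop runs only over the filter columns (at most a dozen here), while each comparison itself stays vectorized over the rows of `data`.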

I was thinking about some form of metaprogramming or templating that would generate the code on the fly, possibly as a string, and run it with exec. However, that seems to be considered bad practice in Python? The problem is that the real application uses very large data frames, in particular for the data to be searched (on the order of millions of rows by hundreds of columns), and the combination of columns used for the search varies a lot, from one up to a dozen columns. Flexibility and search speed are both crucial.
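If generating the criteria as a string is still attractive, `DataFrame.query` evaluates a restricted expression language rather than arbitrary Python, so it may be a safer fit than exec. The sketch below is illustrative only: `build_query` is a hypothetical helper, it assumes every filter value is numeric, and it assumes each filter row has at least one non-null entry (an empty expression would be rejected by `query`).

```python
import pandas as pd

data = pd.DataFrame({'a': [0, 1, 2, 3], 'b': [4, 5, 6, 7], 'c': [9, 6, 4, 2]})
flt = pd.DataFrame({'a': [3, None, 0], 'c': [None, 2, 5]})


def build_query(criteria):
    """Hypothetical helper: turn the non-null entries of a filter row into a query string."""
    active = criteria.dropna()
    # Backticks quote the column name; float() assumes numeric filter values.
    return " and ".join(
        f"`{col}` == {float(value)!r}" for col, value in active.items()
    )


queries = [build_query(row) for _, row in flt.iterrows()]
results = [data.query(q) for q in queries]
```

Whether this is fast enough at the stated data sizes is not something the sketch establishes; it would need to be measured on the real frames.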

  • Perhaps you'd be interested in [Dynamic Expression Evaluation in pandas using pd.eval()](https://stackoverflow.com/questions/53779986/dynamic-expression-evaluation-in-pandas-using-pd-eval) – cs95 Feb 17 '19 at 23:39
  • Thank you very much for your response, i just found that one. This looks promising, thx – bernddude Feb 18 '19 at 01:11
