2

I'm new to python and I wanted to do this particular task which doesn't seem obvious to me how to do it. I don't even know what to search in order to find it. First here is the code snippet and I'll explain what I'm aiming for below it:

import pandas as pd

mycolumns = ['col1', 'col2', 'col3']

df = pd.DataFrame(data=[[**1**,2,3,**1**,5,6],[1,2,3,4,5,6]], 
                  columns=['col1_l', 'col2_l', 'col3_l', 'col1_r', 'col2_r', 'col3_r'])

criteria = list()
for col in mycolumns :
     criterion = (df[col + '_l'] == df[col + '_r'])
     criteria.append(criterion)

df = df[criteria[0] | criteria[1] | ... | criteria[5]]

print df

Output:

    col1_l  col2_l  col3_l  col1_r  col2_r col3_r
0     1,      2,     3,      1,      5,     6

What I want is to be able to select the dataframe rows that meet all the specified criteria, but the problem is that the number of columns is not fixed, each run could have different number of columns and I want to do the same each time I execute this. Question is, how can I write this line:

df = df[criteria[0] | criteria[1] | ... | criteria[5]]

Keep in mind that the dataframe is obtained from a join sql query over a database, I just wrote this example dataframe for clarification. Thank you and pardon me if this was obvious.

Hasan Jawad
  • 197
  • 9

1 Answers1

3

Use np.logical_or.reduce:

print (df[np.logical_or.reduce(criteria)])
   col1_l  col2_l  col3_l  col1_r  col2_r  col3_r
0       1       2       3       1       5       6
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252