I'm new to python and I wanted to do this particular task which doesn't seem obvious to me how to do it. I don't even know what to search in order to find it. First here is the code snippet and I'll explain what I'm aiming for below it:
import pandas as pd
mycolumns = ['col1', 'col2', 'col3']
df = pd.DataFrame(data=[[**1**,2,3,**1**,5,6],[1,2,3,4,5,6]],
columns=['col1_l', 'col2_l', 'col3_l', 'col1_r', 'col2_r', 'col3_r'])
criteria = list()
for col in mycolumns :
criterion = (df[col + '_l'] == df[col + '_r'])
criteria.append(criterion)
df = df[criteria[0] | criteria[1] | ... | criteria[5]]
print df
Output:
col1_l col2_l col3_l col1_r col2_r col3_r
0 1, 2, 3, 1, 5, 6
What I want is to be able to select the dataframe rows that meet all the specified criteria, but the problem is that the number of columns is not fixed, each run could have different number of columns and I want to do the same each time I execute this. Question is, how can I write this line:
df = df[criteria[0] | criteria[1] | ... | criteria[5]]
Keep in mind that the dataframe is obtained from a join sql query over a database, I just wrote this example dataframe for clarification. Thank you and pardon me if this was obvious.