I have the following dataframes:
df1 = pd.DataFrame({'col1': ['A','M','C'],
'col2': ['B','N','O'],
# plus many more
})
df2 = pd.DataFrame({'col3': ['A','A','A','B','B','B'],
'col4': ['M','P','Q','J','P','M'],
# plus many more
})
Which look like these:
df1:
col1 col2
A B
M N
C O
#...plus many more
df2:
col3 col4
A M
A P
A Q
B J
B P
B M
#...plus many more
The objective is to create a dataframe containing all elements in col4
for each col3
that occurs in one row in df1
. For example, let's look at row 1 of df1
. We see that A
is in col1
and B
is in col2
. Then, we go to df2
, and check what col4
is for df2[df2['col3'] == 'A']
and df2[df2['col3'] == 'B']
. We get, for A
: ['M','P','Q']
, and for B
, ['J','P','M']
. The intersection of these is['M', 'P']
, so what I want is something like this
col1 col2 col4
A B M
A B P
....(and so on for the other rows)
The naive way to go about this is to iterate over rows and then get the intersection, but I was wondering if it's possible to solve this via merging techniques or other faster methods. So far, I can't think of any way how.