1

I am running into the following issue. I want to merge two data frames on multiple columns. Example data frames are as follows: df1:

df1:
playerA   playerB  Weight
kim         lee      1
jackson     kim      3
dan         lee      4


df2:
name1   name2      score
dan         lee      11
dan         kim      23
jackson     kim      44
lee         kim      35
Jackson     lee      26
lee         dan      27

I want to merge this two dataframe based on two columns. df1[['playerA','playerB']],df2[['name1','name2']]

But the problem is, I want to merge this two data frame without considering the order of columns.

I tried

result = pd.merge(df1, df2, on =df1[['playerA','playerB']],df2[['name1','name2']])

But it didn't work.

Here is what I Want,

df1:
playerA   playerB  Weight score
kim         lee      1    
jackson     kim      3
dan         lee      4

I want to merge df1 and df2 by using two columns from each data frame without considering the order 
(df1[['playerA','playerB']],df2[['name1','name2']])

df2_merge:
name1   name2      score  weight
dan         lee      11    4
kim         lee      23    1
jackson     kim      44    3
lee         kim      35    1
kim        jacson     26   3
lee         dan      27    4

And then I want to delete the row which has the same name list.

Thus, here is my final wishes

df2_merge_ final:
name1   name2      score  weight
dan         lee      11    4
kim         lee      23    1
jackson     kim      44    3

I am not sure what else to try. Any advice? Thanks :)

1 Answers1

2

You can add new columns filled by sorted values and then use merge with remove unnecessary columns:

df1[['a','b']] = np.sort(df1[['playerA','playerB']], axis=1)
df2[['a','b']] = np.sort(df2[['name1','name2']], axis=1)

result = pd.merge(df1, df2, on=['a','b']).drop(['a','b','name1','name2'], axis=1)
print (result)
   playerA playerB  Weight  score
0      kim     lee       1     35
1  jackson     kim       3     44
2      dan     lee       4     11
3      dan     lee       4     27

If need first unique combination add DataFrame.drop_duplicates:

df1[['a','b']] = np.sort(df1[['playerA','playerB']], axis=1)
df2[['a','b']] = np.sort(df2[['name1','name2']], axis=1)

df2 = df2.drop_duplicates(['a','b'])

result = pd.merge(df1, df2, on=['a','b']).drop(['a','b','name1','name2'], axis=1)
print (result)
   playerA playerB  Weight  score
0      kim     lee       1     35
1  jackson     kim       3     44
2      dan     lee       4     11
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252