Merge pandas data frames based on two columns with the same pair of values but displayed in different orders in the two dataframes

Question

I have two pandas dataframes. They have the same winner and loser name pairs but their are in different orders in the two dataframes.

DF1

     Winner       Loser         RankW   RankL

0    Fleishman Z.  Calleri A.   170.0   26.0
1    Roddick A.   Tsonga J.W.   7.0     212.0
2    Gasquet R.   Volandri F.   17.0    45.0

DF2

     Winner       Loser         WHand   LHand

0    Gasquet R.   Volandri F.   R       R  
1    Fleishman Z.  Calleri A.   L       R
2    Roddick A.   Tsonga J.W.   R       R

I want to merge them in one single Dataframe, however, whenever I try it I get additional rows. What I want to get is:

     Winner       Loser         RankW   RankL    WHand   LHand      

0    Fleishman Z.  Calleri A.   170.0   26.0     L       R
1    Roddick A.   Tsonga J.W.   7.0     212.0    R      R
2    Gasquet R.   Volandri F.   17.0    45.0     R      R

Thus, I want to merge them following the order of the pairs in DF1 but adding the corrisponing values of WHand and LHand in DF2.

I know that all pairs correspond because I tried to determine the rows in DF1 which were not in DF2 but there are none.

names = DF2[['Winner','Loser']]

df = DF1.merge(names, on=['Winner','Loser'],how = 'outer' ,indicator=True).loc[lambda x : x['_merge']=='left_only']

len(df)
Out: 0

could you show a more representative example? `df1.merge(df2 ,on =['Winner','Loser'])` works with these dataframe — ansev, Mar 17 '20 at 18:45
related [question](https://stackoverflow.com/questions/58592606/find-symmetric-pairs-quickly-in-numpy/58592764#58592764) — Quang Hoang, Mar 17 '20 at 18:46
@ansev the real dataframes have thousands of data but both of them have 5588 rows and the merge results in 7120 rows — stud_eco, Mar 17 '20 at 18:53
this happens simply because there are duplicates in df1 and / or df2 maybe you have duplicate whole rows or rows that you don't need, if you need all the rows then it is logical that when using merge more rows will be produced. I think a clear example of the problem is needed — ansev, Mar 17 '20 at 18:56
@ansev the only demonstration that I can give it is in the edit, thus that the pairs are the same in both dataframes — stud_eco, Mar 17 '20 at 19:00
yes but try `df1.duplicated(['Winner','Loser']).any()` and `df2.duplicated(['Winner','Loser']).any()` if true that is the reason why there are more rows when you use merge — ansev, Mar 17 '20 at 19:02
@ansev you were right! It gives me true in both cases, you know then how I can solve this problem please? — stud_eco, Mar 17 '20 at 19:04
try: `df1.assign(index2 = df1.groupby(['Winner','Loser']).cumcount()).merge(df2.assign(index2 = df2.groupby(['Winner','Loser']).cumcount()), on = ['Winner','Loser', 'index2']).drop(columns = 'index2')`, or symply `.drop_duplicates` at the end `df1.merge(df2 ,on =['Winner','Loser']).drop_duplicates()` — ansev, Mar 17 '20 at 19:05
you have to understand how merge works and why there are duplicates and then decide, if I were you I would create two simple datframes with repeated rows and smaller and I would use merge, I would study the result to know how merge works and then once understood I would decide what is best for my working dataframe — ansev, Mar 17 '20 at 19:08
Thank you very much! It works perfectly. How can I give you points for this answer? I'm not very practic with it! — stud_eco, Mar 17 '20 at 19:10

score 0 · Accepted Answer · answered Mar 17 '20 at 19:14

Use:

(df1.assign(index2 = df1.groupby(['Winner','Loser']).cumcount())
    .merge(df2.assign(index2 = df2.groupby(['Winner','Loser']).cumcount()),
           on = ['Winner','Loser', 'index2'])
    .drop(columns = 'index2'))

or drop_duplicates

df1.merge(df2 ,on =['Winner','Loser']).drop_duplicates()

Merge pandas data frames based on two columns with the same pair of values but displayed in different orders in the two dataframes

1 Answers1