0

I have two Dataframes of the same length which have both a 'winner' and 'loser' columns. The order of the rows in the two dataframes differ but the pair 'winner','loser' is the same in both dataframes. The problem that I am facing is that there are certain names in winner and loser that are almost similar but have some differences but i know that they correspond to the same persons. A following:

DF1

      winner     loser    winnerpts  loserpts
0     Rachel     Thomas   5           1
1     Jason      Tom      8           6
2     JenniferS. Tony     10          5 
......

DF2

      winner     loser    Agew   Agel
0     Jennifer   Tony     25     23
1     Rachel     Thomas   27     30
2     Jason      Tom R.   22     24
......

What I want to do is to transform the second Dataframe so as it has the same row order of the first one and the same names changed. As following:

      winner     loser    Agew   Agel
0     Rachel     Thomas   27     30
1     Jason      Tom      22     24
2     JenniferS. Tony     25     23
......

What I was thinking of doing was to crate a dictionary with the names of winner in the first Dataframe and the indexes in which that name is found in the Dataframe and the same for the second Dataframe. Then, change the indexes of the second Dataframe with those of the first and use the modified dictionary to reorder the Dataframe as following:

WG1 = DF1['Winner'].tolist()
WG2 = DF2['Winner'].tolist()

DF1= dict()
for i, item in enumerate(WG1):
    if item not in DF1:
        DF1[item] = [i]
    else:
        DF1[item].append(i)
DF2= dict()
for i, item in enumerate(WG2):
    if item not in DF2:
        DF2[item] = [i]
    else:
        DF2[item].append(i)

modified_D2 = {k:(DF2[k] if k in DF2 else v) for k,v in DF1.items()}

With v the indexes of the rows and k the name of the winner of course according to the first Dataframe. However, I do not know how to apply it to the second Dataframe.Is there a way to do it?

Can you please help me to change my code? Thank you in advance

stud_eco
  • 123
  • 8
  • because they are from two different datasets but corresponding to the same things. – stud_eco Mar 16 '20 at 09:29
  • I got it, need fuzzy merge – jezrael Mar 16 '20 at 09:29
  • Can this be done even if they are not on the same order? suppose I set the index of both dataframes with winner and loser. I use fuzzy merge but the indexes are not the same, does it work ? – stud_eco Mar 16 '20 at 09:34
  • Not idea, maybe you can data with dupes it to question. Also difference of data are only last upper letters with `.` ? If not, change data for [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) – jezrael Mar 16 '20 at 09:35
  • We need a rule for saying that 2 names are equivalent. And you should say whether each dataframe has consistent names. Said differently could we have both Jennifer and JenniferS. in the same dataframe? Of even worse could we have Jennifer and Jenifer for the same person? – Serge Ballesta Mar 16 '20 at 09:39
  • Then go with the approach from the dupe, fuzzy matching makes more sense @stud – yatu Mar 16 '20 at 09:44

0 Answers0