0

I guess this has been covered already, but I can't seem to find this information on the post for merging and joining pandas (Pandas Merging 101)

I basically have 2 PANDAS and would like to merge them based on their matching records only. For instance :

Let's assume I have those 2 dataframes :
df1:

                 AQROUTES_3          ...            Indice de performance
0               Autoroute15          ...                         0.696118
1              AvenuedAnjou          ...                         1.954598
2              AvenuedAnjou          ...                         1.632500
3              AvenuedAnjou          ...                         1.831760
4           RangSaint_Andre          ...                         1.350640
5          AvenuedePicardie          ...                         2408.779

df2:

  FID                  AQROUTES_3    ...         BEARING       E_ID
751    751                AvenuedAnjou    ...      156.554001     Anjou5
723    723                AvenuedAnjou    ...      156.554001    Anjou10
692    692                AvenuedAnjou    ...      156.554001    Anjou15
12      12             RangSaint_Andre    ...      140.352997   SaintA10
1141  1141            AvenuedePicardie    ...      359.289001   Picardi5

I would like to merge them together (merging df1 to df2, thus adding df1's data in df2) while only keeping the matching records such that the output dataframe would be :

 FID         AQROUTES_3       ...   BEARING      E_ID   Indice de performance
 751    751  AvenuedAnjou     ...   156.554001  Anjou5      1.954598
 723    723  AvenuedAnjou     ...   156.554001  Anjou10     1.632500
 692    692  AvenuedAnjou     ...   156.554001  Anjou15     1.831760
 12      12  RangSaint_Andre  ...   140.352997  SaintA10    1.350640
 1141  1141  AvenuedePicardie ...   359.289001  Picardi5    2408.779

Note that the first entry of df1 (Autoroute15) has not been merged since the record did not find a match in df2.AQROUTES_3

Once again thanks for the help!

Maxime Campeau
  • 109
  • 1
  • 8

1 Answers1

0

Merge keys needs to be unique at least one side of dataframes if you want to get the intended result. The idea is to "create" a unique key in some ways. To demonstrate this, I made up E_ID column in df1:

df1

         AQROUTES_3  Indice de performance      E_ID
0       Autoroute15               0.696118       NaN
1      AvenuedAnjou               1.954598    Anjou5
2      AvenuedAnjou               1.632500   Anjou10
3      AvenuedAnjou               1.831760   Anjou15
4   RangSaint_Andre               1.350640  SaintA10
5  AvenuedePicardie            2408.779000  Picardi5

df2

    FID        AQROUTES_3     BEARING      E_ID
0   751      AvenuedAnjou  156.554001    Anjou5
1   723      AvenuedAnjou  156.554001   Anjou10
2   692      AvenuedAnjou  156.554001   Anjou15
3    12   RangSaint_Andre  140.352997  SaintA10
4  1141  AvenuedePicardie  359.289001  Picardi5

If this is the case, below code would work.

In [4]: df1['KEY'] = df1['AQROUTES_3'] + '_' + df1['E_ID'] #create a unique key
   ...: df2['KEY'] = df2['AQROUTES_3'] + '_' + df2['E_ID'] #create a unique key
   ...: df2.drop(['AQROUTES_3', 'E_ID'], axis=1, inplace=True) #no longer needed
   ...: df3 = pd.merge(df1, df2, on='KEY', how='right').drop('KEY', axis=1) #merge
   ...: df3 = df3[['FID', 'AQROUTES_3', 'BEARING', 'E_ID', 'Indice de performance']] #arrange columns
   ...: df3

Out[4]:
    FID        AQROUTES_3     BEARING      E_ID  Indice de performance
0   751      AvenuedAnjou  156.554001    Anjou5               1.954598
1   723      AvenuedAnjou  156.554001   Anjou10               1.632500
2   692      AvenuedAnjou  156.554001   Anjou15               1.831760
3    12   RangSaint_Andre  140.352997  SaintA10               1.350640
4  1141  AvenuedePicardie  359.289001  Picardi5            2408.779000

Of course this is localized solution and may not generalize if the combination of the 2 columns AQROUTES_E and E_ID are not unique, but I hope you got an idea.

Hope this helps.

gyoza
  • 2,112
  • 2
  • 12
  • 18