1

Assume the dataframes df_1 and df_2 below, which I want to merge "left".

df_1= pd.DataFrame({'A': [1,2,3,4,5],
                    'B': [10,20,30,40,50]})
df_2= pd.DataFrame({'AA': [1,5],
                    'BB': [10,50],
                    'CC': [100, 500]})
>>> df_1
   A   B
0  1  10
1  2  20
2  3  30
3  4  40
4  5  50

>>> df_2
   AA  BB   CC
0   1  10  100
1   5  50  500

I want to perform a merging which will result to the following output:

   A   B     CC
0  1  10  100.0
1  2  20    NaN
2  3  30    NaN
3  4  40    NaN
4  5  50  500.0

So, I tried pd.merge(df_1, df_2, left_on=['A', 'B'], right_on=['AA', 'BB'], how='left') which unfortunately duplicates the columns upon which I merge:

   A   B   AA    BB     CC
0  1  10  1.0  10.0  100.0
1  2  20  NaN   NaN    NaN
2  3  30  NaN   NaN    NaN
3  4  40  NaN   NaN    NaN
4  5  50  5.0  50.0  500.0

How do I achieve this without needing to drop the columns 'AA' and 'BB'?

Thank you!

Newbielp
  • 431
  • 3
  • 16
  • Similar question: https://stackoverflow.com/questions/22208218/pandas-merge-columns-but-not-the-key-column – Ch3steR Oct 21 '20 at 09:58
  • I found the second answer (of aichao) of this post very useful. It was something I have been looking for... https://stackoverflow.com/questions/40570143/pandas-join-on-columns-with-different-names I place it here in case someone needs to have a look. – Newbielp Nov 23 '20 at 14:17

2 Answers2

4

You can use rename and join by A, B columns only:

df = pd.merge(df_1, df_2.rename(columns={'AA':'A','BB':'B'}), on=['A', 'B'], how='left') 
print (df)
   A   B     CC
0  1  10  100.0
1  2  20    NaN
2  3  30    NaN
3  4  40    NaN
4  5  50  500.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • This is just an example, my real dataframe has a lot more columns and renaming is very time-consuming. Thanks... – Newbielp Oct 21 '20 at 13:52
3

In pd.merge's right_on parameter accepts array-like argument.

Can also be an array or list of arrays of the length of the right DataFrame. These arrays are treated as if they are columns.

df_1.merge(
    df_2["CC"], left_on=["A", "B"], right_on=[df_2["AA"], df_2["BB"]], how="left"
)

  A   B     CC
0  1  10  100.0
1  2  20    NaN
2  3  30    NaN
3  4  40    NaN
4  5  50  500.0
  • df.merge(sec_df) give sec_df without column names you want to merge on.
  • rigth_on as a list with columns you want to merge, [df_2['AA'], df_2['BB']] is equivalent to [*df_2[['AA', 'BB']].to_numpy()]

IMHO this method is cumbersome. As @jezrael posted renaming columns and merging them is pythonic/pandorable

Ch3steR
  • 20,090
  • 4
  • 28
  • 58