0

So basically I have 2 dataframes with the below structure

PATH     CUID     NAME     Author       Updated     
1        1        Test     Test         01/01/2021
2        2        Test1    Test         01/01/2021    
PATH     CUID     NAME     Author       Updated
1        1        Test     Test 2       01/02/2021
3        3        Test2    Test 2       01/02/2021

Is it possible to merge using 3 key columns? I know the code for merging using 2 columns, but unsure if a 3rd column could be added.

I'm doing an outer join so that all values will be present, only the Author(x or y) and Updated(x or y) will be nulled if there isnt a match on both dataframes.

Sample output would be:

PATH     CUID     NAME     Author_x     Updated_x     Author_y     Updated_y
1        1        Test     Test         01/01/2021    Test 2       01/02/2021
2        2        Test1    Test         01/01/2021    
3        3        Test2                               Test 2       01/02/2021

2 Answers2

0

yes you can merge on more than 2 keys by passing a list if columns in both dataframe have the same name then use on=['columns1','columns2','columns3']

else use left_on=['columns1','columns2','columns3'], right_on=['columns1','columns2','columns3']

see doc in https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html

gal peled
  • 467
  • 5
  • 8
0

Try pandas.merge():

import pandas as pd

pd.merge(df1, df2, on=["PATH", "CUID", "NAME"], how="outer")
     PATH    CUID     NAME Author_x   Updated_x Author_y   Updated_y
  <int64> <int64> <object> <object>    <object> <object>    <object>
0       1       1     Test     Test  01/01/2021   Test 2  01/02/2021
1       2       2    Test1     Test  01/01/2021      NaN         NaN
2       3       3    Test2      NaN         NaN   Test 2  01/02/2021
Panwen Wang
  • 3,573
  • 1
  • 18
  • 39
  • Thanks! Apparently, the reason why my code was erring out is due to an incorrect header name i was using to join the columns. – Wall Elicot Sep 08 '21 at 12:55