Pandas - Join on multiple unique columns

Question

I have the following two dataframes:

   x_1  x_2      x_3    x_4        x_5  ID  x_count
0  159  xyz  883nne2  28h93             10        1
1  159  xyz  883nne2  28h93     lightz  10        2
2  159  xyz  94mmm4k             heavy  15        2
3  354  abc  94mmm4k                     3        1
4  354  abc  94mmm4k   455h             15        1
5  354  abc  94mmm4k             super   3        1
6  354  abc  94mmm4k    53g             10        1

And:

   y_1  y_2      y_3    y_4     y_5  ID  y_count 
0  159  xyz  883nne2  28h93          10       55
1  159  xyz  883nne2  28h93  lightz  10       14
2  159  xyz  94mmm4k          heavy  15       2
3  354  abc  94mmm4k                  3       7

I am looking to join both tables and do math on the count columns, I am not sure what would be the best method to join these two tables. If you notice the x_1, x_2, x_3, x_4, x_5 columns have unique values that are being accounted for in the count column and each of their instances, the table with the y columns matches on some of the same unique instances found in the table with the x columns

Here is what I was hoping the new dataframe would look like:

   x_1  x_2      x_3    x_4        x_5  ID  x_count y_1  y_2     y_3    y_4   y_5   ID  y_count
0  159  xyz  883nne2  28h93             10        1 159  xyz  883nne2 28h93         10      55
1  159  xyz  883nne2  28h93     lightz  10        2 159  xyz  883nne2 28h93  lightz 10      14
2  159  xyz  94mmm4k             heavy  15        2 159  xyz  94mmm4k         heavy 15       2
3  354  abc  94mmm4k                     3        1 354  abc  94mmm4k                3       7
4  354  abc  94mmm4k   455h             15        1
5  354  abc  94mmm4k             super   3        1
6  354  abc  94mmm4k    53g             10        1

I've tried merging on the ID column and left join but it just duplicated these records and did give me the result shown above.

Any guidance is greatly appreciated.

score 2 · Accepted Answer · edited Mar 11 '20 at 06:34

2

Let's suppose dataframes are df1 and df2 respectively. You can do a left merge using all the columns like this

df1_merge_cols = ['x_1','x_2','x_3','x_4','x_5','ID']
df2_merge_cols = ['y_1','y_2','y_3','y_4','y_5','ID']
df1 = df1.merge(df2, left_on=df1_merge_cols, right_on=df2_merge_cols, how='left')

edited Mar 11 '20 at 06:34

Vishnudev Krishnadas

10,679
2
23
55

answered Mar 11 '20 at 06:25

Mahendra Singh

508
2
9

Thank you for the help! this is the correct answer! – ApacheOne Mar 11 '20 at 16:46

Pandas - Join on multiple unique columns

1 Answers1