Read this older post updated recently How to merge dataframes based on an "OR" condition. In that example there were 2 possible columns to merge on, in this case there are three A, B and C, with C being a string.
The two df could match on any combination of A, B OR C. And each produces unique merge records on their own. Using the above example as a guide without luck.
Appreciate any help!
DF2
A | B | C | D |
---|---|---|---|
123 | 321 | abc | company1 |
456 | rss | company2 | |
321 | 432 | xyz | company3 |
klm | company4 |
DF1
A | B | C | F | G |
---|---|---|---|---|
123 | abc | active | 650 | |
456 | active | 754 | ||
321 | 432 | active | 820 | |
klm | active | 840 |
Working Code
suff_A = ['_on_A_match_1', '_on_A_match_2', '_on_A_Match_3']
suff_B = ['_on_B_match_1', '_on_B_match_2', '_on_B_match_3']
suff_C = ['_on_C_match_1', '_on_C_match_2', '_on_C_match_3']
Get an error message on this piece too many values to unpack
DF3 = pd.concat([DF1.merge(DF2, on='A', suffixes=suff_A),
DF1.merge(DF2, on='B', suffixes=suff_B),
DF1.merge(DF2, on='C', suffixes=suff_C)])
duplicates = (DF3.A_on_B_match_1 == DF3.A_on_B_match_2 == DF3.A_on_C_match_3,
DF3.B_on_A_match_1 == DF3.B_on_A_match_2 == DF3.B_on_C_match_3,
kbd>DF3.C_on_A_match_1 == DF3.C_on_B_match_2)
DF3.loc[~duplicates]
Expected Outcome
A | B | C | D | F | G |
---|---|---|---|---|---|
123 | 321 | abc | company1 | active | 650 |
456 | rss | company2 | active | 754 | |
321 | 432 | xyz | company3 | active | 820 |
klm | company4 | active | 840 |