4

How can I merge/join these two dataframes ONLY on "id". Produce 3 new dataframes:

  • 1)R1 = Merged records
  • 2)R2 = (DF1 - Merged records)
  • 3)R3 = (DF2 - Merged records)

Using pandas in Python.

First dataframe (DF1)

|        id | name  |
|-----------|-------|
| 1         | Mark  |
| 2         | Dart  |
| 3         | Julia |
| 4         | Oolia |
| 5         | Talia |

Second dataframe (DF2)

|        id | salary |
|-----------|--------|
| 1         | 20     |
| 2         | 30     |
| 3         | 40     |
| 4         | 50     |
| 6         | 33     |
| 7         | 23     |
| 8         | 24     |
| 9         | 28     |

My solution for

R1 =pd.merge(DF1, DF2, on='id', how='inner')

I am unsure that is the easiest way to get R2 and R3

R2 should look like

|        id | name  |
|-----------|-------|
| 5         | Talia |

R3 should look like:

|        id | salary |
|-----------|--------|
| 6         | 33     |
| 7         | 23     |
| 8         | 24     |
| 9         | 28     |
misguided
  • 3,699
  • 21
  • 54
  • 96

1 Answers1

11

You can turn on indicator in merge and look for the corresponding values:

total_merge = df1.merge(df2, on='id', how='outer', indicator=True)

R1 = total_merge[total_merge['_merge']=='both']
R2 = total_merge[total_merge['_merge']=='left_only']
R3 = total_merge[total_merge['_merge']=='right_only']

Update: Ben's suggestion would be something like this:

dfs = {k:v for k,v in total_merge.groupby('_merge')}

and then you can do, for examples:

dfs['both']
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74