How to get rows which are present in df1 but not in df2 and vice versa while ignoring the column 'value'

Question

I have the following Dataframes

df1=
   col1 col2 col3 value
   A     B    AB   5
   B     C    AB   4
   C     D    AB   3
   D     E    AB   12

df2=
   col1 col2 col3 value
   A     B    AB   3
   C     D    AB   1
   Z     E    BC   2

I need a new df3 which has rows which are present in df1, but not in df2. Similarily I also need a df4, which has rows which are in df2, but not in df1.

df3= //Rows which in Df1 but not in df2

   col1 col2 col3 value
    B     C    AB   5
    D     E    AB   12

df4= //Rows which are in df2 but not in df1

   col1 col2 col3 value
    Z     E    BC   2

Please note that we can ignore the column 'value' , but col1 col2 col3 should match to be considered a match.

If I do a merge with indicator=true

mdf1=pd.merge(df1, df2, how='outer', indicator=True)
mdf1=
    col1 col2 col3  value      _merge
0    A    B   AB      5   left_only
1    B    C   AB      4   left_only
2    C    D   AB      3   left_only
3    D    E   AB     12   left_only

4    A    B   AB      3  right_only
5    C    D   AB      1  right_only
6    Z    E   BC      2  right_only

You see that in mdf1 row number 0 and 4, 2 and 5 are same but got mismatched because the value isnt same. I want to ignore the value column when merging but I want the column in result Data frame.

Here is what i need the mdf to be.

   col1 col2 col3  value      _merge

0    B    C   AB      4   left_only

1    D    E   AB     12   left_only

2    Z    E   BC      2  right_only

The dupe only gets you if you want to merger on all columns, I cannot merger all columns since the column 'value' is dynamic. I can only merge on 'col1' col2' col3' but I want the 'value' column as well in result Dataframe. — Bounty Collector, Apr 21 '20 at 13:20
I think best is add your code, what you try and what not working. — jezrael, Apr 21 '20 at 13:21
Maybe best is add your code, what you try and why not working (in another words explain why this is not dupe) — jezrael, Apr 21 '20 at 13:24
i am sorta confused, do i need a new question or explain here in comments? thx — Bounty Collector, Apr 21 '20 at 13:26
sorry, not explained. I think not question is not necessary, only edit question. — jezrael, Apr 21 '20 at 13:27

score 1 · Accepted Answer · answered Apr 21 '20 at 13:55

1

In your solution is necessary remove value column:

mdf1=pd.merge(df1.drop('value', 1), df2, how='outer', indicator=True)
mdf1 = mdf1[mdf1['_merge'] == 'right_only']
print (mdf1)
  col1 col2 col3  value      _merge
4    Z    E   BC    2.0  right_only

mdf2=pd.merge(df1, df2.drop('value', 1), how='outer', indicator=True)
mdf2 = mdf2[mdf2['_merge'] == 'left_only']
print (mdf2)
  col1 col2 col3  value     _merge
1    B    C   AB    4.0  left_only
3    D    E   AB   12.0  left_only

answered Apr 21 '20 at 13:55

jezrael

822,522
95
1,334
1,252

1

This is what i need, i am curious is to why index or mdf1/mdf2 is 4and 1,3 respectivly? should it not be starting from 0? if I were to dump it in csv, I would have to reset index? – Bounty Collector Apr 21 '20 at 14:05
@BountyCollector - Because there are originaly more rows in `mdf1`, `mdf2`. You can reset_index for default index like `mdf1 = mdf1[mdf1['_merge'] == 'right_only'].reset_index(drop=True)` – jezrael Apr 21 '20 at 14:07

How to get rows which are present in df1 but not in df2 and vice versa while ignoring the column 'value'

1 Answers1