I have the following Dataframes
df1=
col1 col2 col3 value
A B AB 5
B C AB 4
C D AB 3
D E AB 12
df2=
col1 col2 col3 value
A B AB 3
C D AB 1
Z E BC 2
I need a new df3 which has rows which are present in df1, but not in df2. Similarily I also need a df4, which has rows which are in df2, but not in df1.
df3= //Rows which in Df1 but not in df2
col1 col2 col3 value
B C AB 5
D E AB 12
df4= //Rows which are in df2 but not in df1
col1 col2 col3 value
Z E BC 2
Please note that we can ignore the column 'value' , but col1 col2 col3 should match to be considered a match.
If I do a merge with indicator=true
mdf1=pd.merge(df1, df2, how='outer', indicator=True)
mdf1=
col1 col2 col3 value _merge
0 A B AB 5 left_only
1 B C AB 4 left_only
2 C D AB 3 left_only
3 D E AB 12 left_only
4 A B AB 3 right_only
5 C D AB 1 right_only
6 Z E BC 2 right_only
You see that in mdf1 row number 0 and 4, 2 and 5 are same but got mismatched because the value isnt same. I want to ignore the value column when merging but I want the column in result Data frame.
Here is what i need the mdf to be.
col1 col2 col3 value _merge
0 B C AB 4 left_only
1 D E AB 12 left_only
2 Z E BC 2 right_only