Pandas - How to remove duplicates from a subset based on another subset

Question

I have 2 subsets with similar columns; the column I want to match them on is column A.

I have the left df L and the right df R.

I want to make sure that any values of column A in L that also appear in df R are removed from L, dropping the whole row.

How would one do this?

import pandas as pd
L_df = pd.DataFrame({'A': ['bob/is/cool', 'alice/is/cool', 'jim/is/cool'], 
                   'view': ['A', 'B', 'B']})
R_df = pd.DataFrame({'A': ['ralf/is/cool', 'i/am/cool', 'alice/is/cool'], 
                   'view': ['A', 'B', 'C']})

I want to combine the two, removing duplicates on column A and, for a duplicated value, keeping the row from R, not L.

So we take alice/is/cool with a view value of C and not B if that makes sense :)

Output would be

out = pd.DataFrame({'A': ['ralf/is/cool', 'i/am/cool', 'alice/is/cool', 'bob/is/cool', 'jim/is/cool'], 
                   'view': ['A', 'B', 'C', 'A', 'B']})

Will you please provide samples of your dataframes and your expected output? :) — , Dec 21 '21 at 18:18
Please provide a [mcve](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) <- Refer the link — anky, Dec 21 '21 at 18:19
Will you please also add a df that you want to get as output? — , Dec 21 '21 at 18:53
why is `'bob/is/cool'` having a view of `'B'` in your desired output? — Pierre D, Dec 21 '21 at 19:03
cool. If the answer provided below doesn't fit the need, can you also describe corner cases (e.g., multiple duplicates in either or both dataframes) and how you'd like them treated? — Pierre D, Dec 21 '21 at 19:15
I think it should fit what I want, I'll give it a try. Thanks Pierre :) — caasswa, Dec 21 '21 at 19:16

Pierre D · Accepted Answer · 2021-12-21T19:04:40.360

Would this be what you are after?

>>> pd.concat([R_df, L_df]).drop_duplicates(keep='first', subset='A')
               A view
0   ralf/is/cool    A
1      i/am/cool    B
2  alice/is/cool    C
0    bob/is/cool    A
2    jim/is/cool    B

Note: this is a wild guess based on your description.

It will indiscriminately remove any duplicates (within R, within L, or in the concatenation of both) and keep just the first one.
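To illustrate that point, here is a minimal sketch with made-up data (the frames `L_dup` and `R_one` are hypothetical, not from the question). The value `'bob/is/cool'` appears twice within L and never in R, yet the second occurrence is still dropped:

```python
import pandas as pd

# Hypothetical data: 'bob/is/cool' is duplicated within L and absent from R
L_dup = pd.DataFrame({'A': ['bob/is/cool', 'bob/is/cool'],
                      'view': ['A', 'B']})
R_one = pd.DataFrame({'A': ['alice/is/cool'],
                      'view': ['C']})

# Same call as above: duplicates on 'A' are dropped wherever they come from
result = pd.concat([R_one, L_dup]).drop_duplicates(keep='first', subset='A')
print(result)
```

Only the first `'bob/is/cool'` row (view `'A'`) survives, even though R had nothing to say about it.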

You may want a more subtle disposition of cases depending on where and how many duplicates you have, but it's hard to tell without a more robust set of examples.
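One possible, more selective variant is sketched below. It assumes the goal is only to drop the rows of L whose `A` value appears in R, leaving any duplicates within L or within R untouched. The names `L_only` and `out` are just for illustration, and `ignore_index=True` is an optional addition that renumbers the result from 0:

```python
import pandas as pd

L_df = pd.DataFrame({'A': ['bob/is/cool', 'alice/is/cool', 'jim/is/cool'],
                     'view': ['A', 'B', 'B']})
R_df = pd.DataFrame({'A': ['ralf/is/cool', 'i/am/cool', 'alice/is/cool'],
                     'view': ['A', 'B', 'C']})

# Keep only the rows of L whose key is not present in R
L_only = L_df[~L_df['A'].isin(R_df['A'])]

# R first, then what is left of L; renumber the index from 0
out = pd.concat([R_df, L_only], ignore_index=True)
print(out)
```

On the sample data this gives the same rows as the `drop_duplicates` call; the two approaches only differ when a value of `A` is repeated within a single dataframe.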
