My code removes all duplicates using `drop_duplicates` with `keep=False`.

The issue I'm having is that before I remove the duplicates, I want to move all of the removed rows to a separate dataframe. I've come up with the code below; however, I think it's leaving one duplicate remaining and not removing ALL duplicates.

duplicates_df = combined_df.loc[combined_df.duplicated(subset='Unique_ID_Count'), :]

combined_df.drop_duplicates(subset='Unique_ID_Count', inplace=True, keep=False)

Do you have any ideas on how I can move all duplicates dropped in the second line of code to the duplicates_df dataframe?

Any help would be much appreciated, thanks!

mot375
  • You used the `keep=False` parameter in one call and didn't use it in the other. Use it in both. –  Mar 11 '22 at 18:08

1 Answer

Try this:

duplicates_df = combined_df.loc[combined_df.duplicated(subset='Unique_ID_Count', keep=False)]
combined_df   = combined_df.loc[~combined_df.duplicated(subset='Unique_ID_Count', keep=False)]
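To see what the two lines above do, here is a minimal sketch on made-up data. Only the column name `Unique_ID_Count` comes from the question; the sample values, the extra `value` column, and the `is_dup` variable are assumptions added for illustration. Computing the mask once and reusing it is an optional tidy-up, not something the answer requires.

```python
import pandas as pd

# Hypothetical toy data; only the column name comes from the question.
combined_df = pd.DataFrame({
    "Unique_ID_Count": ["a", "b", "a", "c", "b", "a"],
    "value": [1, 2, 3, 4, 5, 6],
})

# keep=False flags every row whose key occurs more than once,
# not just the second and later occurrences.
is_dup = combined_df.duplicated(subset="Unique_ID_Count", keep=False)

duplicates_df = combined_df.loc[is_dup]   # all rows with a repeated key
combined_df = combined_df.loc[~is_dup]    # rows whose key is unique

print(duplicates_df["value"].tolist())  # [1, 2, 3, 5, 6]
print(combined_df["value"].tolist())    # [4]
```

Together the two frames contain every original row exactly once, which is the split the question asks for.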
  • Amazing, thanks a lot - this was a lot easier than I thought. I didn't think I could apply `keep` to `duplicated()`. I have a couple of questions if you don't mind. 1. Are there any perks of using `duplicated()` instead of `drop_duplicates()`? 2. What is the `~` doing before `combined_df`? – mot375 Mar 11 '22 at 18:12
  • 1. No, there aren't any real perks with `duplicated()`. It just returns a boolean mask that is `True` for rows counted as duplicates (with `keep=False`, every occurrence), whereas `drop_duplicates()` removes those rows directly. 2. `~` inverts the boolean mask. So if you have a mask like `[True, False, False]`, putting a `~` before it gives `[False, True, True]`. It's thoroughly described [here](https://stackoverflow.com/a/54358361). –  Mar 11 '22 at 18:24
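As a small illustration of both points in this comment, the sketch below compares the default `keep="first"` behaviour of `duplicated()` with `keep=False`, and then applies `~` to the resulting mask. The sample Series `ids` and the variable names are hypothetical.

```python
import pandas as pd

ids = pd.Series(["a", "b", "a", "c"])  # hypothetical sample values

# Default keep="first": the first occurrence is NOT flagged,
# which is why the question's original line left one row behind.
first_kept = ids.duplicated()

# keep=False: every occurrence of a repeated value is flagged.
all_marked = ids.duplicated(keep=False)

# ~ flips each boolean in the mask.
inverted = ~all_marked

print(first_kept.tolist())  # [False, False, True, False]
print(all_marked.tolist())  # [True, False, True, False]
print(inverted.tolist())    # [False, True, False, True]
```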