How to move ALL duplicated rows into separate dataframe

Question

My code removes all duplicates using `drop_duplicates` with `keep=False`.

The issue I'm having is that before I remove the duplicates, I want to move all of the rows that will be removed to a separate dataframe. I've come up with the line of code below; however, I think it's leaving one duplicate behind rather than moving ALL duplicates.

duplicates_df = combined_df.loc[combined_df.duplicated(subset='Unique_ID_Count'), :]

combined_df.drop_duplicates(subset='Unique_ID_Count', inplace=True, keep=False)

Do you have any ideas on how I can move all duplicates dropped in the second line of code to the duplicates_df dataframe?

Any help would be much appreciated, thanks!

You used the `keep=False` parameter in one call but not in the other. Use it in both. – Mar 11 '22 at 18:08

score 1 · Accepted Answer · answered Mar 11 '22 at 18:04


Try this:

duplicates_df = combined_df.loc[combined_df.duplicated(subset='Unique_ID_Count', keep=False)]
combined_df   = combined_df.loc[~combined_df.duplicated(subset='Unique_ID_Count', keep=False)]
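For anyone who wants to check this on a small example, here is a minimal sketch of the same two lines. The column name `Unique_ID_Count` comes from the question; the `value` column and all the values are made up for illustration. The mask is computed once and reused, which should behave the same as calling `duplicated()` twice.

```python
import pandas as pd

# Toy data: column name from the question, values invented for illustration.
combined_df = pd.DataFrame({
    "Unique_ID_Count": ["a", "b", "a", "c", "b", "a"],
    "value": [1, 2, 3, 4, 5, 6],
})

# keep=False flags every row whose key appears more than once,
# including the first occurrence.
is_dup = combined_df.duplicated(subset="Unique_ID_Count", keep=False)

duplicates_df = combined_df.loc[is_dup]   # all rows with a repeated key
combined_df = combined_df.loc[~is_dup]    # only rows whose key is unique

print(duplicates_df["Unique_ID_Count"].tolist())  # ['a', 'b', 'a', 'b', 'a']
print(combined_df["Unique_ID_Count"].tolist())    # ['c']
```

Every row ends up in exactly one of the two dataframes, so nothing is lost between them.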


Amazing, thanks a lot - this was a lot easier than I thought. I didn't think I could apply `keep` to `duplicated()`. I have a couple of questions if you don't mind. 1. Are there any perks of using `duplicated()` instead of `drop_duplicates()`? 2. What is the `~` doing before `combined_df`? – mot375 Mar 11 '22 at 18:12

1. No, there aren't any perks with `duplicated()`. It just returns a boolean mask that is `True` for rows counted as duplicates: by default every occurrence after the first, and with `keep=False` every occurrence. 2. `~` _inverts_ a boolean mask element by element. So if you have a mask like `[True, False, False]` and put a `~` before it, you'd get `[False, True, True]`. It's _thoroughly_ described [here](https://stackoverflow.com/a/54358361). – Mar 11 '22 at 18:24
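To make the two points in this thread concrete, here is a small sketch with made-up values (not from the question) showing how `keep` changes the mask returned by `duplicated()` and what `~` does to it. It also suggests why the line in the question left one row of each duplicate group out of `duplicates_df`: the default is `keep='first'`.

```python
import pandas as pd

# Made-up keys for illustration.
ids = pd.Series(["a", "b", "a", "c", "a"])

# Default keep='first': the first occurrence is NOT flagged.
first_kept = ids.duplicated()
print(first_kept.tolist())   # [False, False, True, False, True]

# keep=False: every occurrence of a repeated value is flagged.
all_dups = ids.duplicated(keep=False)
print(all_dups.tolist())     # [True, False, True, False, True]

# ~ flips each boolean in the mask.
inverted = ~all_dups
print(inverted.tolist())     # [False, True, False, True, False]

# Selecting with the inverted mask gives the same rows as
# drop_duplicates(keep=False).
print(ids[inverted].equals(ids.drop_duplicates(keep=False)))  # True
```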
