I want to drop records that have duplicates, along with their duplicates, from a pandas DataFrame based on a column
- Please post sample data, expected output, and what you have tried so far. – nitin3685 Nov 14 '19 at 08:24
- Possible duplicate of [How to drop a list of rows from Pandas dataframe?](https://stackoverflow.com/questions/14661701/how-to-drop-a-list-of-rows-from-pandas-dataframe) – abhilb Nov 14 '19 at 08:31
- @abhilb: I believe the user is asking how to remove duplicate columns, not rows, and also how to drop them based on a condition. I assume he is not aware of which columns are duplicates, so he does not know the indexes of the columns to be dropped. – nitin3685 Nov 14 '19 at 08:37
- @nitin3685 I think abhilb is actually quite on track. However, I am not looking only to drop the duplicates; I want to drop both the duplicate and the first instance of the record (drop both the record and its duplicate). Keep that in mind. – herbert ichama Nov 14 '19 at 08:42
- @herbert: use `drop_duplicates` with `keep=False` as mentioned in my answer. If you want to drop duplicate rows, don't use `.T`. – nitin3685 Nov 14 '19 at 09:31
- @herbert: Please check the updated answer. It allows you to drop duplicate records based on a subset of columns and drops both the record and its duplicates. – nitin3685 Nov 14 '19 at 10:08
1 Answer
df.drop_duplicates(subset='column_name', keep=False)

`drop_duplicates` will drop duplicated rows.

`subset` lets you specify which column(s) are used to determine whether a row is a duplicate.

`keep` lets you specify which record to keep or drop; `keep=False` drops every row in a set of duplicates, so both the record and its duplicates are removed.

`drop_duplicates`: please check the documentation for more info.
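As a quick illustration, here is a minimal sketch using a made-up DataFrame with a hypothetical `id` column (not from the question) to show how `keep=False` removes both a record and its duplicates, whereas the default `keep='first'` would retain one copy:

```python
import pandas as pd

# Hypothetical sample data: 'id' is the column used to detect duplicates.
df = pd.DataFrame({
    'id':    [1, 2, 2, 3, 3, 4],
    'value': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# keep=False drops every row whose 'id' appears more than once,
# i.e. both the first occurrence and its duplicates.
deduped = df.drop_duplicates(subset='id', keep=False)
print(deduped)
#    id value
# 0   1     a
# 5   4     f
```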

nitin3685