2

I have a data frame of about 52,000 rows with some duplicates. When I use

df.drop_duplicates()

I lose about 1,000 rows, but I don't want to erase these rows; I want to know which rows are the duplicates.

Luis Ramon Ramirez Rodriguez
  • 9,591
  • 27
  • 102
  • 181
  • Does this answer your question? [How do I get a list of all the duplicate items using pandas in python?](https://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python) – Abu Shoeb Apr 27 '21 at 17:08

2 Answers

10

You could use duplicated for that:

df[df.duplicated()]

You can specify the keep argument for what you want (example after the list); from the docs:

keep : {‘first’, ‘last’, False}, default ‘first’

  • first : Mark duplicates as True except for the first occurrence.
  • last : Mark duplicates as True except for the last occurrence.
  • False : Mark all duplicates as True.
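
For example, here is a minimal sketch (the DataFrame and column names are invented for illustration) showing how keep changes which rows get flagged:

import pandas as pd

# hypothetical frame where the first and third rows are identical
df = pd.DataFrame({'name': ['a', 'b', 'a'], 'value': [1, 2, 1]})

df[df.duplicated()]              # keep='first' (default): only the second 'a' row
df[df.duplicated(keep=False)]    # every row that has a duplicate anywhere

Using keep=False is handy when you want to inspect all members of each duplicate group, not just the later repeats.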
Anton Protopopov
  • 30,354
  • 12
  • 88
  • 93
0

To identify duplicates within a pandas column without dropping them, try the following.

Let 'Column_A' be the column with duplicate entries and 'Column_B' be a True/False column that marks duplicates in Column_A:

df['Column_B'] = df.duplicated(subset='Column_A', keep='first')

Change the parameters to fine-tune it to your needs.
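
A minimal sketch of how this looks end to end (the DataFrame contents and column names are made up for illustration):

import pandas as pd

df = pd.DataFrame({'Column_A': ['x', 'y', 'x', 'z']})

# True for every repeated value after its first occurrence
df['Column_B'] = df.duplicated(subset='Column_A', keep='first')

# rows flagged as duplicates can then be inspected instead of dropped
print(df[df['Column_B']])

Passing keep=False instead would flag every row of a duplicate group, including the first occurrence.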

Arthur D. Howland
  • 4,363
  • 3
  • 21
  • 31