One-liner to identify duplicates using pandas?

Question

While preparing for data analyst interview questions, I came across this one: "Find all duplicate emails (not unique emails) in a one-liner using pandas."

The best I've got is not a single line but rather three:

# initialize dataframe
import pandas as pd
d = {'email': ['a', 'b', 'c', 'a', 'b']}
df = pd.DataFrame(d)

# select emails having duplicate entries
results = pd.DataFrame(df.value_counts())
results.columns = ['count']
results[results['count'] > 1]

>>>
       count
email
b          2
a          2

Can the second block (the lines after the second comment) be condensed into a one-liner that avoids the temporary variable results?

score 1 · Accepted Answer · answered Aug 12 '21 at 19:19

Just use duplicated, which by default flags every occurrence after the first:

>>> df[df.duplicated()]
  email
3     a
4     b

Or if you want a list:

>>> df[df["email"].duplicated()]["email"].tolist()
['a', 'b']
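
If an email can appear three or more times, the list above would repeat it, and the question's own approach returns counts instead. A rough sketch of a few one-liner variants, assuming the same df as in the question (the names dupes_once, all_rows and counts are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"email": ["a", "b", "c", "a", "b"]})

# Each duplicated email listed once, however many times it repeats.
dupes_once = df.loc[df["email"].duplicated(), "email"].unique().tolist()
print(dupes_once)  # ['a', 'b']

# Every row whose email occurs more than once, first occurrences included.
all_rows = df[df["email"].duplicated(keep=False)]
print(all_rows)

# The question's value_counts idea as a single chained expression:
# count each email, then keep only counts greater than 1.
counts = df["email"].value_counts().loc[lambda s: s > 1]
print(counts)
```

The last variant avoids the temporary variable by passing a callable to .loc, so the filter runs on the intermediate result of value_counts.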

not_speshal
