
While preparing for data analyst interview questions, I came across this one: "find all duplicate emails (not unique emails) in a one-liner using pandas."

The best I've got is not a single line but rather three:

# initialize dataframe 
import pandas as pd
d = {'email': ['a', 'b', 'c', 'a', 'b']}
df = pd.DataFrame(d)

# select emails having duplicate entries
results = pd.DataFrame(df.value_counts())
results.columns = ['count']
results[results['count'] > 1]

>>>
       count
email
b          2
a          2

Could the second block (the code under the second comment) be condensed into a one-liner that avoids the temporary variable results?

jbuddy_13

1 Answer


Just use duplicated:

>>> df[df.duplicated()]
  email
3     a
4     b
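Note that by default duplicated only flags the second and later occurrences, which is why rows 0 and 1 are missing above. If you want every row whose email appears more than once, a minimal sketch (assuming the same df as in the question) passes keep=False:

```python
import pandas as pd

df = pd.DataFrame({"email": ["a", "b", "c", "a", "b"]})

# keep=False marks every occurrence of a repeated email,
# not only the ones that come after the first
all_dupes = df[df.duplicated(subset="email", keep=False)]
print(all_dupes)
```

Here the unique email 'c' is dropped and both copies of 'a' and 'b' are kept.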

Or if you want a list:

>>> df[df["email"].duplicated()]["email"].tolist()
['a', 'b']
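If you would rather keep the count table from the question, the three lines can probably be chained into one expression. This is only a sketch on the same sample data; the names dupe_counts and dupe_emails are mine, not from the question:

```python
import pandas as pd

df = pd.DataFrame({"email": ["a", "b", "c", "a", "b"]})

# Count each email, then keep only the counts above 1, with no temporary variable
dupe_counts = df["email"].value_counts().loc[lambda s: s > 1]
print(dupe_counts)

# Distinct emails that occur more than once; unique() guards against
# an email being listed twice when it appears three or more times
dupe_emails = df.loc[df["email"].duplicated(), "email"].unique().tolist()
print(dupe_emails)
```

Passing a callable to .loc lets the filter refer to the intermediate Series directly, which is what removes the need for results.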
not_speshal