While preparing for data analyst interview questions, I came across this one: "find all duplicate emails (not unique emails) in a one-liner using pandas."
The best I've got is not a single line but rather three:
# initialize dataframe
import pandas as pd
d = {'email':['a','b','c','a','b']}
df = pd.DataFrame(d)
# select emails having duplicate entries
results = pd.DataFrame(df.value_counts())
results.columns = ['count']
results[results['count'] > 1]
>>>
       count
email
b          2
a          2
Could the second block (the three lines after the second comment) be condensed into a one-liner that avoids the temporary variable results?
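For reference, here is a sketch of the kind of one-liner I have in mind. It is not necessarily the intended interview answer. The names dupes and dupe_emails are just illustrative, and the first option assumes it is acceptable to count on the email column (a Series) rather than on the whole DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'email': ['a', 'b', 'c', 'a', 'b']})

# Option 1: count each email, then keep only counts above 1.
# Passing a callable to .loc filters the Series without a temporary variable.
dupes = df['email'].value_counts().loc[lambda s: s > 1]

# Option 2: flag every row whose email occurs more than once
# (keep=False marks all occurrences), then reduce to the distinct emails.
dupe_emails = df.loc[df['email'].duplicated(keep=False), 'email'].unique()
```

Option 1 keeps the counts, like the three-line version above; Option 2 returns only the duplicated email values.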