How can I remove the rows where the value in a column is repeated more than 2 times? The repeats may or may not be consecutive. For example:

NAME      EMAIL
Joe       joe@email.com
John      joe@email.com
Eric      eric@mymail.com
Melissa   mel@email.com
Ron       joe@email.com

I would like to remove all rows with joe@email.com because it repeats more than 2 times.

sgobin
Does this [Python: Removing Rows on Count condition](https://stackoverflow.com/questions/49735683/python-removing-rows-on-count-condition) solve your problem? – tidakdiinginkan Apr 15 '20 at 18:15

1 Answer

Create your dataframe

import pandas as pd

data = {
    'Name': ['Michael', 'Larry', 'Shaq', 'barry'],
    'email': ['asf@gmail.com', 'akfd@gmail.com', 'asf@gmail.com', 'asf@gmail.com'],
}

df1 = pd.DataFrame.from_dict(data)

print(df1)

      Name           email
0  Michael   asf@gmail.com
1    Larry  akfd@gmail.com
2     Shaq   asf@gmail.com
3    barry   asf@gmail.com

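Then group by the column and keep only the groups with at most 2 rows. This drops every row whose email is repeated more than 2 times:

# keep a group only if its email occurs 2 times or fewer
fil = df1.groupby('email').filter(lambda x: len(x) <= 2)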

print(fil)

    Name           email
1  Larry  akfd@gmail.com
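
For larger frames, calling a Python lambda once per group can be slow. A possible alternative, sketched here on the sample data from the question, is to count the occurrences with `transform('size')` and filter with a boolean mask. The names `counts` and `result` are only illustrative, and the threshold of 2 is taken from the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "NAME": ["Joe", "John", "Eric", "Melissa", "Ron"],
    "EMAIL": [
        "joe@email.com",
        "joe@email.com",
        "eric@mymail.com",
        "mel@email.com",
        "joe@email.com",
    ],
})

# For every row, the number of rows that share its EMAIL value.
# transform keeps the original index, so the result lines up with df.
counts = df.groupby("EMAIL")["EMAIL"].transform("size")

# Keep rows whose email occurs at most 2 times
result = df[counts <= 2]
print(result)
```

On this data only the Eric and Melissa rows remain, because joe@email.com occurs 3 times. The original index labels are preserved; add `.reset_index(drop=True)` if you want them renumbered.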
sanjayr