I would like to find the rows in the following table that contain repeated email addresses. The code below adds an extra column to the dataframe and sets it to 'ja' whenever an email address is repeated. This works fine for a small number of rows (150), but for a large number of rows (30000) the script hangs. Is there a better way to loop over the rows?
import pandas as pd
data = {'Name': ['Danny', 'Damny', 'Monny', 'Quony', 'Dimny', 'Danny'],
        'Email': ['danny@gmail.com', 'danny@gmail.com', 'monny@gmail.com',
                  'quony@gmail.com', 'danny@gmail.com', 'danny@gmail.com']}
df=pd.DataFrame(data)
df['email_repeated']=None
col_email=df.columns.get_loc("Email")
row_count=len(df.index)
for i in range(0, row_count):
    for k in range(0, row_count):
        emailadres = df.iloc[i, col_email]
        if k != i:
            if emailadres == df.iloc[k, col_email]:
                # .loc avoids chained assignment, which may not write back to df
                df.loc[k, 'email_repeated'] = 'ja'
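
One possible way to avoid the nested loop, as a minimal sketch rather than a definitive answer: pandas has `Series.duplicated`, and with `keep=False` it marks every occurrence of a repeated value, not just the later ones. The variable name `is_repeated` is my own choice, and the sketch assumes that email addresses should be compared exactly as stored (no lowercasing or whitespace stripping).

```python
import pandas as pd

data = {'Name': ['Danny', 'Damny', 'Monny', 'Quony', 'Dimny', 'Danny'],
        'Email': ['danny@gmail.com', 'danny@gmail.com', 'monny@gmail.com',
                  'quony@gmail.com', 'danny@gmail.com', 'danny@gmail.com']}
df = pd.DataFrame(data)

# Boolean mask: True for every row whose Email occurs more than once.
# keep=False flags all occurrences, including the first one.
is_repeated = df['Email'].duplicated(keep=False)

# Same column as in the question: 'ja' for repeated emails, None otherwise.
df['email_repeated'] = None
df.loc[is_repeated, 'email_repeated'] = 'ja'

print(df)
```

The nested loop performs roughly 30000 × 30000 = 900 million comparisons through `iloc` for 30000 rows, which is probably why the script appears to hang; the mask above is computed in a single call without a Python-level loop over the rows.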