I have a large dataframe and want to create a unique identifier for every distinct person. The relevant column is the e-mail column, but the formatting makes it difficult: each person can have multiple e-mails, separated by semicolons in a single field. Example frame below:
Name of person ||| E-mail Address
'John Doe' ||| 'john.c.doe@choo.com'
'Bob Jones' ||| 'bobbyj@aboy.net;bob.jones@omic.com'
'Robert Jones' ||| 'robert@mail.com;bobbyj@aboy.net'
'Clara Bit' ||| 'clara@mail.com'
'John Doe' ||| 'j.diddy@ack.org;jjd@ila.hun'
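In case it helps to reproduce this, here is a minimal sketch of the example frame in pandas. The column names `name` and `email` are placeholders I'm assuming for illustration, not the real ones in my data:

```python
import pandas as pd

# Column names "name" and "email" are illustrative placeholders.
df = pd.DataFrame({
    "name": ["John Doe", "Bob Jones", "Robert Jones", "Clara Bit", "John Doe"],
    "email": [
        "john.c.doe@choo.com",
        "bobbyj@aboy.net;bob.jones@omic.com",
        "robert@mail.com;bobbyj@aboy.net",
        "clara@mail.com",
        "j.diddy@ack.org;jjd@ila.hun",
    ],
})

print(df)
```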
I want a field that tells individuals apart based on their e-mails, so that rows sharing any e-mail address get the same ID, regardless of the name:
Name of person ||| person ID
'John Doe' ||| 1
'Bob Jones' ||| 2
'Robert Jones' ||| 2
'Clara Bit' ||| 3
'John Doe' ||| 4
My brain is kind of blowing up trying to figure out how to do this with for loops, so I'm hoping there's an easier way (plus, I'm iterating over df.index a lot, which I'm told is bad form and is incredibly slow regardless). Is there a function that could do this if I split the e-mails into multiple columns, each holding a single e-mail?
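To make the rule I'm after concrete: two rows should end up with the same ID if they share an e-mail, either directly or through a chain of other rows. Below is a rough sketch of that rule in plain Python using a small union-find over the addresses. The function name `assign_person_ids` and the `sep` parameter are made up for illustration, and it assumes every field is a string. I don't know whether this is the best way to do it, or whether pandas has something built in.

```python
def assign_person_ids(email_fields, sep=";"):
    """Return one ID per row; rows linked by a shared e-mail get the same ID.

    Hypothetical helper for illustration: assumes each field is a string
    of addresses joined by `sep`.
    """
    parent = {}  # union-find forest keyed by e-mail address

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Split each field into its individual addresses.
    rows = [[e.strip() for e in field.split(sep) if e.strip()]
            for field in email_fields]

    # Every address in a row belongs to the same person, so merge them.
    for emails in rows:
        for other in emails[1:]:
            union(emails[0], other)

    # Number the groups in order of first appearance, starting at 1.
    ids = {}
    result = []
    for i, emails in enumerate(rows):
        # A row with no address at all gets its own group.
        key = find(emails[0]) if emails else ("no-email", i)
        result.append(ids.setdefault(key, len(ids) + 1))
    return result


example = [
    "john.c.doe@choo.com",
    "bobbyj@aboy.net;bob.jones@omic.com",
    "robert@mail.com;bobbyj@aboy.net",
    "clara@mail.com",
    "j.diddy@ack.org;jjd@ila.hun",
]
print(assign_person_ids(example))  # [1, 2, 2, 3, 4]
```

On a frame, I imagine this would be called as something like `df["person_id"] = assign_person_ids(df["email"])`, where `email` is a placeholder column name. It still loops in Python, but only once over the rows rather than repeatedly over df.index.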
Thank you!
EDIT: Apologies for the typo on the third line of e-mails, it has been fixed.