I am trying to find email addresses in some source code and match each one with the first and last name of the person it belongs to. The first step in my process is to find each person's first and last name. I have a function that does that very well and returns a list of full names.
The second step is to find the email address which is the closest to that name (whether it is displayed before the name or after). So I am looking for both: email before and email after.
For that particular purpose I wrote these regexes:
for name in full_name_list:
    # full name followed by the email
    print(re.findall(name + r'.*?([A-z0-9_.-]+?@[A-z0-9_.-]+?\.[A-z]+)', source))
    # email followed by full name
    print(re.findall(r'([A-z0-9_.-]+?@[A-z0-9_.-]+?\.\w+?.+?' + name + ')', source))
Now here is the deal. Assume that full_name_list = ['John Doe', 'James Henry', 'Jane Doe'] and that my source looks like this:
" John Doe is part of our team and here is his email: johndoe@something.com. James Henry is also part of our team and here his email: jameshenry@something.com. Jane Doe is the team manager and you can contact her at that address: janedoe@something.com"
The first regex returns the closest email after the name, which is what I want. However, the second regex always starts at the first email it finds and stops when it reaches the name, which is odd since I am asking for the fewest characters between the email and the name (or at least I think I am).
Is my assumption correct? If yes, what's happening? If not, how can I fix that?
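For reference, here is a minimal, self-contained sketch that should reproduce the behaviour described above. The helper names `email_after` and `email_before` are hypothetical and only wrap the two patterns from the loop. The `source` string is the sample text from this question, joined without newlines.

```python
import re

# Sample text from the question (no newlines, so '.' can cross sentences).
source = (
    " John Doe is part of our team and here is his email: johndoe@something.com. "
    "James Henry is also part of our team and here his email: jameshenry@something.com. "
    "Jane Doe is the team manager and you can contact her at that address: janedoe@something.com"
)
full_name_list = ['John Doe', 'James Henry', 'Jane Doe']


def email_after(name, text):
    """Hypothetical wrapper: first pattern, full name followed by the email."""
    return re.findall(name + r'.*?([A-z0-9_.-]+?@[A-z0-9_.-]+?\.[A-z]+)', text)


def email_before(name, text):
    """Hypothetical wrapper: second pattern, email followed by the full name."""
    return re.findall(r'([A-z0-9_.-]+?@[A-z0-9_.-]+?\.\w+?.+?' + name + ')', text)


for name in full_name_list:
    print(name, '| after :', email_after(name, source))
    print(name, '| before:', email_before(name, source))
```

With this input, the "after" pattern gives the email that follows each name. The "before" pattern for 'Jane Doe' gives a single match that begins at johndoe@something.com and runs all the way to 'Jane Doe', rather than beginning at jameshenry@something.com, which is the email closest to that name.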