your regex e-mail is not working at all: emails = re.findall(r'.[@]', String)
matches anychar then @
.
I would try a different approach: match the sentences and extract name,e-mails couples with the following empiric assumptions (if your text changes too much, that would break the logic)
- all names are followed by
's"
and is
somewhere (using non-greedy .*?
to match all that is in between
\w
matches any alphanum char (or underscore), and only one dot for domain (else it matches the final dot of the sentence)
code:
import re
String = "'Jessica's email is jessica@gmail.com, and Daniel's email is daniel123@gmail.com. Edward's is edwardfountain@gmail.com, and his grandfather, Oscar's, is odawg@gmail.com.'"
print(re.findall("(\w+)'s.*? is (\w+@\w+\.\w+)",String))
result:
[('Jessica', 'jessica@gmail.com'), ('Daniel', 'daniel123@gmail.com'), ('Edward', 'edwardfountain@gmail.com'), ('Oscar', 'odawg@gmail.com')]
converting to dict
would even give you a dictionary name => address:
{'Oscar': 'odawg@gmail.com', 'Jessica': 'jessica@gmail.com', 'Daniel': 'daniel123@gmail.com', 'Edward': 'edwardfountain@gmail.com'}
The general case needs more chars (not sure I'm exhaustive):
String = "'Jessica's email is jessica_123@gmail.com, and Daniel's email is daniel-123@gmail.com. Edward's is edward.fountain@gmail.com, and his grandfather, Oscar's, is odawg@gmail.com.'"
print(re.findall("(\w+)'s.*? is ([\w\-.]+@[\w\-.]+\.[\w\-]+)",String))
result:
[('Jessica', 'jessica_123@gmail.com'), ('Daniel', 'daniel-123@gmail.com'), ('Edward', 'edward.fountain@gmail.com'), ('Oscar', 'odawg@gmail.com')]