I have strings in the following format: `name1 <email1@email.com>`. How can I use regex to pull only the `name1` part out?
Also, how might I do this if I had multiple such names and emails, say `name1 <email1@email.com>, name2 <email2@email.com>`?

supersaiyajin87
-
Are the emails actually surrounded by `< >`? – DeepSpace Feb 25 '21 at 18:16
-
@DeepSpace yes, they are. – supersaiyajin87 Feb 25 '21 at 18:21
3 Answers
Try using `split`:
In [164]: s = 'name1 <email1@email.com>, name2 <email2@email.com>'
In [166]: [i.split()[0] for i in s.split(',')]
Out[166]: ['name1', 'name2']
If you have just one name:
In [161]: s = 'name1 <email1@email.com>'
In [163]: s.split()[0]
Out[163]: 'name1'
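If names may themselves contain spaces, a variant of the same idea is to cut each entry at the `<` rather than at the first whitespace. This is only a minimal sketch, assuming every entry has the form `name <email>` and that names contain no commas; `extract_names` is a hypothetical helper name, not something from the question.

```python
def extract_names(s):
    """Return the text before '<' in each comma-separated entry."""
    names = []
    for entry in s.split(','):
        # partition returns (before, separator, after); keep the part before '<'
        name, sep, _ = entry.partition('<')
        if sep:  # skip fragments that have no '<...>' part
            names.append(name.strip())
    return names

print(extract_names('name1 <email1@email.com>, John Smith <email2@email.com>'))
```

A name written as `last, first` would still be split at the comma, so this does not cover that case.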

Mayank Porwal
-
I thought to do this too, but it only works if there are not additional spaces, which I doubt is guaranteed; indeed a regex about `<` would be much cleaner! – ti7 Feb 25 '21 at 18:17
-
I don't think the emails are surrounded by `<>`. It's just a way of representing them. – Mayank Porwal Feb 25 '21 at 18:18
-
@ti7 `str.split` will behave the same if there are multiple spaces. `'name1    <email1@email.com>'.split()` returns the same output as `'name1 <email1@email.com>'.split()` – C.Nivs Feb 25 '21 at 18:20
-
Obviously, but it will not work if the structure is like `first, last` (which "real" emails frequently are) - the OP did not state this, but it'll be the case for any real-world collection – ti7 Feb 25 '21 at 18:21
-
The emails are surrounded by `<>`, which is why I quoted the whole thing. – supersaiyajin87 Feb 25 '21 at 18:22
You can start with `(\w+)\s<.*?>(?:,\s)?` (see on regex101.com), which relies on the fact that emails are surrounded by `< >`, and customize it as you see fit.
Note that this regex does not specifically look for emails, just for text surrounded by `< >`.
Don't fall down the rabbit hole of trying to specifically match emails.
import re
regex = re.compile(r'(\w+)\s<.*?>(?:,\s)?')
string = 'name1 <email1@email.com>, name2 <email2@email.com>'
print([match for match in regex.findall(string)])
outputs
['name1', 'name2']
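As one possible customization, the sketch below also captures the address and allows names that contain spaces. It is a variant under stated assumptions rather than the pattern above: the names `PAIR_RE` and `parse_pairs` are hypothetical, and it assumes names contain neither `<` nor `,`.

```python
import re

# Name: shortest run of characters other than '<' and ','.
# Address: everything between '<' and the next '>'.
PAIR_RE = re.compile(r'\s*([^<,]+?)\s*<([^>]*)>')

def parse_pairs(s):
    """Return (name, address) tuples for each 'name <address>' entry."""
    return PAIR_RE.findall(s)

print(parse_pairs('name1 <email1@email.com>, John Smith <email2@email.com>'))
```

Like the original pattern, this only looks for text inside `< >` and makes no attempt to validate the address.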

DeepSpace
import re
name = re.search(r'(?<! <)\w+', 'name1 <email@email.com>')
print(name.group(0))
>>> name1
Explanation:
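(?<!...) is called a negative lookbehind assertion: a match may only start where the preceding text is not the .... With ' <' as the ..., \w+ cannot start right after the ' <' that opens the email, and re.search returns the first match in the string, which here is the name.
re.search(r'(?<!...)', string_to_search)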
https://docs.python.org/3/library/re.html
Edit: to search strings with multiple names, use a positive lookahead (?=...) so that only words followed by ' <' match:
import re
# \w+ must be followed by optional whitespace and '<' (lookahead, not consumed)
regex = r"\w+(?=\s*<)"
multi_name = "name1 <email@email.com>, name2 <email@email.com>"
for match in re.finditer(regex, multi_name):
    print(f"Match: {match.group()}")
>>> Match: name1
>>> Match: name2
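To see how a negative lookbehind and a positive lookahead differ on this input, the following sketch runs both over the same string with `re.findall`. The variable names are illustrative only.

```python
import re

text = "name1 <email@email.com>, name2 <email@email.com>"

# Negative lookbehind: a match may not START right after ' <'.
# Fragments inside the addresses still match.
lookbehind_hits = re.findall(r'(?<! <)\w+', text)

# Positive lookahead: a match must be FOLLOWED by optional whitespace and '<'.
lookahead_hits = re.findall(r'\w+(?=\s*<)', text)

print(lookbehind_hits)
print(lookahead_hits)
```

Only the lookahead version returns just the names. The lookbehind version works with `re.search` on a single entry because the name happens to be the first match.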

nahar