I'm looking to find words in a string that match a specific pattern. The problem is that if the words are part of an email address, they should be ignored.
To simplify, the pattern of the "proper words" is \w+\.\w+, that is, one or more word characters, a literal period, and another run of word characters.
For example, this sentence causes problems: a.a b.b:c.c d.d@e.e.e
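For comparison, applying the bare pattern with Python's re.findall, with no email handling at all, also returns the pieces of the address. The variable names here are only illustrative:

```python
import re

sentence = "a.a b.b:c.c d.d@e.e.e"

# The bare "proper word" pattern, with no attempt to skip email addresses.
naive = re.findall(r"\w+\.\w+", sentence)
print(naive)  # ['a.a', 'b.b', 'c.c', 'd.d', 'e.e']
```

Both d.d and e.e come from the address d.d@e.e.e, which is what the question wants to exclude.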
The goal is to match only [a.a, b.b, c.c]. With most regexes I build, e.e is returned as well (because I use some kind of word-boundary match).
For example:
>>> re.findall(r"(?:^|\s|\W)(?<!@)(\w+\.\w+)(?!@)\b", "a.a b.b:c.c d.d@e.e.e")
['a.a', 'b.b', 'c.c', 'e.e']
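A quick way to see why e.e slips through is to print, for each match of the pattern above, where the captured word starts and which character sits just before it. This is only a diagnostic sketch; the names pattern and hits are illustrative:

```python
import re

sentence = "a.a b.b:c.c d.d@e.e.e"
pattern = r"(?:^|\s|\W)(?<!@)(\w+\.\w+)(?!@)\b"

# Record each captured word together with the character right before it.
hits = []
for m in re.finditer(pattern, sentence):
    start = m.start(1)
    before = sentence[start - 1] if start > 0 else None
    hits.append((m.group(1), before))
    print(m.group(1), repr(before))
```

The last hit, e.e, starts right after the second '.' of e.e.e. The (?:^|\s|\W) prefix consumes that '.', and (?<!@) only inspects the single character before the captured word, so it never sees the '@' further back in the token. Python's re module also requires lookbehinds to be fixed-width, so a lookbehind alone cannot scan back across the whole token for an '@'.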
How can I match only within words that do not contain "@"?
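One possible approach, sketched under the assumption that "words" means whitespace-separated tokens (so b.b:c.c is one token and d.d@e.e.e is another): either skip every token containing "@" before searching it, or use a single regex whose first alternative consumes a whole "@"-containing token and whose second alternative captures the proper word, then drop the empty strings that re.findall returns for the first alternative. The function names words_skip_tokens and words_single_regex are hypothetical:

```python
import re

WORD = re.compile(r"\w+\.\w+")

# Alternative 1: a whitespace-delimited token that contains "@" (matched but
# not captured). Alternative 2: a "proper word", captured in group 1.
EMAIL_OR_WORD = re.compile(r"\S*@\S*|(\w+\.\w+)")


def words_skip_tokens(text):
    """Split on whitespace, drop tokens containing '@', search the rest."""
    result = []
    for token in text.split():
        if "@" in token:
            continue  # assumed to be (part of) an email address
        result.extend(WORD.findall(token))
    return result


def words_single_regex(text):
    """One findall pass; '@' tokens yield '' (group 1 unused) and are filtered."""
    return [m for m in EMAIL_OR_WORD.findall(text) if m]


sentence = "a.a b.b:c.c d.d@e.e.e"
print(words_skip_tokens(sentence))   # ['a.a', 'b.b', 'c.c']
print(words_single_regex(sentence))  # ['a.a', 'b.b', 'c.c']
```

The single-regex version works because the engine scans left to right and tries alternatives in order: at the first character of a token containing "@", the \S*@\S* branch succeeds and consumes the whole token before the second branch can match inside it. Both versions discard every proper word in a token that contains "@", so something like a.a,user@host would lose a.a as well; whether that is acceptable depends on the input.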