So I'm trying to match an email of this form
a-b-c-@-d-e-.-e-f-g
I've come up with this regex
(\w+(?=-))*-@-(\w+(?=-))*.(\w+(?=-))
Why is that not working?
You are over-complicating things with the look-ahead assertion. Any look-around assertion acts like an anchor (it matches a location in the text, not the text itself), just like ^ and $ match the start and end of the input.
So, (\w+(?=-)) matches just the a in the a- text. Right after the matched text is the next character, the -! So the pattern (\w+(?=-))* won't match a-b- because there are dashes in there that are not part of the \w character class.
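To see this in practice, here is a minimal sketch using Python's re module (the variable names are only for illustration). It shows that the look-ahead leaves the dash unconsumed, so repeating the group gets stuck after the first letter:

```python
import re

text = "a-b-c-@-d-e-.-e-f-g"

# The look-ahead only checks that a dash follows; the dash is not consumed.
first = re.match(r"\w+(?=-)", text)
print(first.group(), first.end())  # a 1

# Repeating the group stalls at the first dash, since \w cannot match "-".
repeated = re.match(r"(?:\w+(?=-))*", text)
print(repeated.group())  # a

# The pattern from the question therefore cannot cover the whole input.
original = r"(\w+(?=-))*-@-(\w+(?=-))*.(\w+(?=-))"
print(re.fullmatch(original, text))  # None
```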
Use a combined character class instead that allows for both \w and - characters; [\w-] combines everything \w matches with one extra character, -:
[\w-]*-@-[\w-]*\.[\w-]*
This matches your input. You can test this yourself with this regex101 (which includes a full explanation of how it works). I've assumed you wanted to match the literal . character here, so I used \. instead of just ., which matches just about any character.
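As a quick check outside regex101, a small Python sketch (the name loose is just for this example) confirms that the character-class pattern covers the whole input. It also shows that the pattern is permissive: it does not require the letters to be separated by dashes.

```python
import re

# Character-class version: letters, digits, underscores and dashes in any order.
loose = re.compile(r"[\w-]*-@-[\w-]*\.[\w-]*")

m = loose.fullmatch("a-b-c-@-d-e-.-e-f-g")
print(m.group())  # a-b-c-@-d-e-.-e-f-g

# Undashed words are accepted too, because [\w-]* does not enforce alternation.
print(loose.fullmatch("abc-@-de.efg") is not None)  # True
```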
If you need to explicitly match only single word characters followed by a dash, repeated, then use:
(?:\w-)*@-(?:\w-)*\.(?:-\w)*
This pattern is different from your attempt in that it removes the literal - before the @, and moves the - before the \w in the last group. See the regex101 for details on the pattern.
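A short Python sketch of the stricter pattern (again, strict is just an illustrative name) shows the difference: only single word characters, each paired with a dash, are accepted.

```python
import re

# Single word characters, each followed by a dash (preceded by one after the dot).
strict = re.compile(r"(?:\w-)*@-(?:\w-)*\.(?:-\w)*")

ok = strict.fullmatch("a-b-c-@-d-e-.-e-f-g")
print(ok.group())  # a-b-c-@-d-e-.-e-f-g

# Multi-character words are rejected by this version.
print(strict.fullmatch("abc-@-de-.-efg"))  # None
```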
If you're aiming to match email addresses in general, give this a shot: https://github.com/madisonmay/CommonRegex
Usage is described like this:
>>> from commonregex import CommonRegex
>>> parsed_text = CommonRegex("There's an some@email.com in this sentence.")
>>> parsed_text.emails
["some@email.com"]
Assuming that what you're asking for adheres to these rules:
* one . and one @
* a - directly either side of the @ and of the .
* the -s must only separate words, never be next to each other
Then I think this will do the trick:
^(\w+-)*\w+-@-(\w+-)*\.(-\w+)*$
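Here is a minimal Python check of this anchored pattern, written with a literal \. for the dot (an assumption on my part that a literal dot is intended). The sample strings other than the one from the question are made up for illustration.

```python
import re

# Anchored version: words of any length separated by single dashes.
anchored = re.compile(r"^(\w+-)*\w+-@-(\w+-)*\.(-\w+)*$")

# The input from the question matches.
print(bool(anchored.match("a-b-c-@-d-e-.-e-f-g")))  # True

# Longer words are allowed as well.
print(bool(anchored.match("foo-bar-@-example-.-com")))  # True

# Two dashes next to each other are rejected.
print(bool(anchored.match("a--b-@-d-.-e")))  # False
```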