-2

So I'm trying to match an email of this form:

a-b-c-@-d-e-.-e-f-g

I've come up with this regex

(\w+(?=-))*-@-(\w+(?=-))*.(\w+(?=-))

Why is that not working?

praks5432
  • Not working for which input? – thefourtheye Jan 17 '14 at 07:52
  • I'm not sure what this `(?=-)` syntax you're using is supposed to do. I would guess you simply want `-?`. – Robin Winslow Jan 17 '14 at 07:53
  • 2
    You may want something like this: `[\w-]*-@-[\w-]*.[\w-]*` – Robin Winslow Jan 17 '14 at 07:55
  • 1
    @RobinWinslow: That's a look-ahead assertion. It means that `\w+` should only match if followed by a `-`, but won't match the dash itself. – Martijn Pieters Jan 17 '14 at 07:56
  • What is it you're trying to achieve? Remove the hyphens? – SamWhan Jan 17 '14 at 07:56
  • I see now. So the initial `(\w+(?=-))*` will only match `c` because you haven't allowed space for the hyphen before `c` anywhere. I now think what you're looking for is even simpler than my initial suggestion: `(\w+-)*@(-\w+)*.\w+` – Robin Winslow Jan 17 '14 at 08:02
  • Seriously!? Guys, this question has been asked 3 times today alone. http://stackoverflow.com/questions/21171549/how-do-i-run-this-regex-in-php-that-parse-full-email-address-with-name/21171732#21171732 – brandonscript Jan 17 '14 at 08:03
  • @remus I think you're jumping to conclusions. This is a more specific request than just a simple email address. – Robin Winslow Jan 17 '14 at 08:05
  • 1
    @RobinWinslow I realize that, but I'm aiming to educate here - why spend the time coming up with complicated expressions when we can do away with that entirely? – brandonscript Jan 17 '14 at 08:06
  • possible duplicate of [Python check for valid email address?](http://stackoverflow.com/questions/8022530/python-check-for-valid-email-address) – Sorter Jan 17 '14 at 08:29

4 Answers

3

You are over-complicating things with the look-ahead assertion. Any look-around assertion acts like an anchor: it matches a location in the text, not the text itself, just as ^ and $ match the start and end of the input.

So (\w+(?=-)) matches just the a in the a- text. Right after the matched text comes the next character, the -, which the look-ahead checked but did not consume. The pattern (\w+(?=-))* therefore won't match a-b-, because the dashes in there are not part of the \w character class.

Instead, use a combined character class that allows both \w and - characters; [\w-] matches everything \w matches plus one extra character, -:
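To see this behaviour concretely, here is a minimal sketch in Python's re module (assuming Python, since the question doesn't say which engine is in use). It runs the question's pattern against the sample address; the variable names are only for illustration.

```python
import re

address = "a-b-c-@-d-e-.-e-f-g"

# The look-ahead checks for a following "-" but never consumes it,
# so the repeated group stops after the first word character.
prefix = re.match(r"(\w+(?=-))*", address)
print(prefix.group(0))  # a

# The pattern from the question cannot cover the whole address.
original = r"(\w+(?=-))*-@-(\w+(?=-))*.(\w+(?=-))"
whole = re.fullmatch(original, address)
print(whole)  # None

# Searching finds only a fragment around the "@".
fragment = re.search(original, address)
print(fragment.group(0))  # c-@-d-e
```

The search result only matches because the unescaped . happens to swallow one of the dashes.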

[\w-]*-@-[\w-]*\.[\w-]*

This matches your input. I've assumed you wanted to match the literal . character here, so I used \. instead of just ., which matches any character except a newline.

You can test this yourself with this regex101 (which includes a full explanation of how it works).
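As a quick check of the difference between \. and a bare ., here is a small sketch using Python's re module (an assumption about the engine). The second input, with no dot in it at all, is made up for illustration.

```python
import re

address = "a-b-c-@-d-e-.-e-f-g"

escaped = re.compile(r"[\w-]*-@-[\w-]*\.[\w-]*")
unescaped = re.compile(r"[\w-]*-@-[\w-]*.[\w-]*")

# The character-class pattern covers the whole sample address.
matches_address = escaped.fullmatch(address) is not None

# Hypothetical input that contains no "." anywhere.
no_dot = "a-@-bXc"
escaped_accepts = escaped.fullmatch(no_dot) is not None
unescaped_accepts = unescaped.fullmatch(no_dot) is not None

print(matches_address, escaped_accepts, unescaped_accepts)
```

With the bare ., the dot-free input is accepted because . is free to match an ordinary letter.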

If you need to explicitly match only single word characters followed by a dash, repeated, then use:

(?:\w-)*@-(?:\w-)*\.(?:-\w)*

This pattern is different from your attempt, in that it removes the literal - before the @, and moves the - before the \w in the last group. See the regex101 for details on the pattern.
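A short sketch comparing the stricter pattern with the character-class one, again assuming Python's re module. The input with a two-character word is a made-up example.

```python
import re

strict = re.compile(r"(?:\w-)*@-(?:\w-)*\.(?:-\w)*")
loose = re.compile(r"[\w-]*-@-[\w-]*\.[\w-]*")

address = "a-b-c-@-d-e-.-e-f-g"
# Hypothetical input where the first "word" has two characters.
multi = "ab-c-@-d-e-.-e-f-g"

strict_address = strict.fullmatch(address) is not None
strict_multi = strict.fullmatch(multi) is not None
loose_multi = loose.fullmatch(multi) is not None

print(strict_address, strict_multi, loose_multi)
```

fullmatch is used so the whole string has to fit the pattern; with search, the strict pattern would still find a partial match inside the second input.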

Martijn Pieters
2

If you're aiming to match email addresses in general, give this a shot: https://github.com/madisonmay/CommonRegex

Usage is described like this:

>>> from commonregex import CommonRegex
>>> parsed_text = CommonRegex("There's an some@email.com in this sentence.")
>>> parsed_text.emails
["some@email.com"]
mamachanko
0

You can use:

(\w|[-])*-@-(\w|[-])*\.(\w|[-])*

The problem with your code is:

(?=-) is a positive lookahead: it asserts that a literal - can be matched at the current position, but it does not consume it, so the - never becomes part of the match.

Refer to THIS.

Sujith PS
0

Assuming that what you're asking for adheres to these rules:

  • There must be only one . and one @
  • There must be a - directly on either side of the @ and of the .
  • The whole string must start and end with a letter
  • -s must only separate words, never be next to each other

Then I think this will do the trick:

^(\w+-)*\w+-@-(\w+-)*\.(-\w+)*$

http://regexr.com?381h6
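A rough way to try these rules out, assuming Python's re module. Note that the dot is escaped here so that only a literal . is accepted. Every input other than the address from the question is made up for illustration.

```python
import re

# Anchored pattern, with the dot escaped (see the note above).
pattern = re.compile(r"^(\w+-)*\w+-@-(\w+-)*\.(-\w+)*$")

samples = [
    "a-b-c-@-d-e-.-e-f-g",  # the address from the question
    "ab-cd-@-ef-.-gh",      # multi-character words
    "a--b-@-d-.-e",         # two dashes next to each other
    "a-b@d.e",              # no dashes around "@" and "."
]

# Map each sample to whether the pattern accepts it.
results = {s: pattern.match(s) is not None for s in samples}
for sample, accepted in results.items():
    print(sample, accepted)
```

Keep in mind that \w also matches digits and underscores, so "letter" in the rules above is slightly looser in practice.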

Robin Winslow