0

I'm trying to build a regular expression to meet these conditions:

[DON'T MATCH]

dont:match@example.com

[MATCH]

mailto:match@example.com
match@example.com
<p>match@example.com</p>

I can match the last two, but the first example (DON'T MATCH) is also matched.

How do I make sure an email is only valid if it's plain or preceded by mailto:, but not by just a :?

http://rubular.com/r/HvldBe4Ew9

Regex:

(?<=mailto:)?([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)

Wiktor Stribiżew
okoboko

2 Answers

1

If the strings are passed as separate values, you can anchor the pattern with ^ and $ to match the string start and end. If the address is wrapped in an HTML tag, use look-arounds for > and < instead:

(?<=>)(?:mailto:)?([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)(?=<)

Or, getting rid of capturing groups (the whole match then includes mailto: when it is present):

(?<=>)(?:mailto:)?[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+(?=<)

See demo
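As a rough Python illustration of the look-around pattern above, the sketch below runs it on a few made-up sample strings. The names `EMAIL_IN_TAG` and `extract` are hypothetical helpers introduced only for this example.

```python
import re

# Pattern from the answer: '>' must come right before the match
# and '<' right after it; "mailto:" is optional and not captured.
EMAIL_IN_TAG = re.compile(
    r"(?<=>)(?:mailto:)?([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)(?=<)"
)

def extract(text):
    """Return the address captured in group 1, or None if nothing matches."""
    m = EMAIL_IN_TAG.search(text)
    return m.group(1) if m else None

tagged = extract("<p>match@example.com</p>")
tagged_mailto = extract("<p>mailto:match@example.com</p>")
tagged_other_prefix = extract("<p>dont:match@example.com</p>")
untagged = extract("match@example.com")
```

Because both look-arounds are mandatory, an address that is not surrounded by `>` and `<` is not matched by this pattern.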

Please note that you have an issue in [a-zA-Z0-9-.]: the hyphen should not appear unescaped in the middle of a character class. Put it at the start or end of the class, as in [a-zA-Z0-9.-], or escape it.
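For input that is not wrapped in tags, one possible variant is to combine a positive look-behind for `mailto:` with a negative look-behind that rejects a preceding `:` or address character. This is only a sketch under that assumption, not the pattern from the demo. `EMAIL` and `find_email` are hypothetical names.

```python
import re

# Hypothetical variant: the address must either directly follow "mailto:"
# or not be preceded by ':' or by a character that could belong to an address.
EMAIL = re.compile(
    r"(?:(?<=mailto:)|(?<![\w.+:-]))"
    r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)"
)

def find_email(text):
    """Return the first address found, or None."""
    m = EMAIL.search(text)
    return m.group(1) if m else None

samples = [
    "dont:match@example.com",
    "mailto:match@example.com",
    "match@example.com",
    "<p>match@example.com</p>",
]
results = {s: find_email(s) for s in samples}
```

Both look-behinds are fixed-width, which Python's `re` module requires.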

Wiktor Stribiżew
  • I also need to match something like `<p>match@example.com</p>` (or any other HTML tag around it), so I can't use `^` in the front. – okoboko May 03 '15 at 22:07
  • Good, here is a solution: add the look-arounds with `>` and `<`. – Wiktor Stribiżew May 03 '15 at 22:08
  • How do I adjust the regex so that it does not include the `mailto:` part in the match, but only matches an email address if it's preceded by `mailto:`? – okoboko May 03 '15 at 22:28
  • Have a look here: https://regex101.com/r/jH6lD9/1. Using `(?<=>)(?:mailto:)?([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)(?=<)`, you can get the email captured in Group 1. In Python, you can refer to that capture group using `.group(1)`. See http://ideone.com/1ZfVIO for a sample Python 2.7 demo. Let me know if it works OK. – Wiktor Stribiżew May 03 '15 at 22:32
0

No need for a-zA-Z; just use A-Z and make the regex case-insensitive with re.IGNORECASE.
Also make sure you use

^ Assert position at the beginning of a line
and
$ Assert position at the end of a line


Python Example:

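import re

email = "mailto:match@example.com"  # sample input; replace with your own string

# Group 1 holds the address without the optional "mailto:" prefix.
match = re.search(r"^(?:mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)$", email, re.IGNORECASE)
if match:
    result = match.group(1)
else:
    result = ""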
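To see how the anchored pattern behaves on the inputs from the question, here is a small sketch. The helper name `address_or_empty` and the sample list are assumptions made for illustration.

```python
import re

# Anchored, case-insensitive pattern: the whole string must be an address,
# optionally prefixed with "mailto:".
ANCHORED = re.compile(
    r"^(?:mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)$",
    re.IGNORECASE,
)

def address_or_empty(text):
    """Return the address from group 1, or "" when the string does not match."""
    m = ANCHORED.search(text)
    return m.group(1) if m else ""

plain = address_or_empty("match@example.com")
with_mailto = address_or_empty("mailto:match@example.com")
other_prefix = address_or_empty("dont:match@example.com")
in_tag = address_or_empty("<p>match@example.com</p>")
```

The anchors reject the `dont:` prefix, but they also reject an address wrapped in HTML tags, so this approach fits only when each address arrives as its own string.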

Demo:

https://regex101.com/r/cI1eD6/1


Regex explanation:

^(mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)$

Options: Case insensitive

Assert position at the beginning of a line «^»
Match the regex below and capture its match into backreference number 1 «(mailto:)?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match the character string “mailto:” literally «mailto:»
Match the regex below and capture its match into backreference number 2 «([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)»
   Match a single character present in the list below «[A-Z0-9_.+-]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      A single character from the list “_.+” «_.+»
      The literal character “-” «-»
   Match the character “@” literally «@»
   Match a single character present in the list below «[A-Z0-9-]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      The literal character “-” «-»
   Match the character “.” literally «\.»
   Match a single character present in the list below «[A-Z0-9-.]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      A single character from the list “-.” «-.»
Assert position at the end of a line «$»
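Note that the explained pattern uses a capturing `(mailto:)?` group, so the address lands in group 2, while the Python example uses the non-capturing `(?:mailto:)?`, which puts it in group 1. A short sketch of the difference, with variable names chosen only for illustration:

```python
import re

text = "mailto:match@example.com"

# Capturing prefix, as in the explanation: group 1 is the prefix, group 2 the address.
capturing = re.search(
    r"^(mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)$", text, re.IGNORECASE
)

# Non-capturing prefix, as in the Python example: group 1 is the address.
non_capturing = re.search(
    r"^(?:mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)$", text, re.IGNORECASE
)

prefix = capturing.group(1)
address_from_capturing = capturing.group(2)
address_from_non_capturing = non_capturing.group(1)
```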
Pedro Lobito