Email Regular Expression Explanation

Question

I'm trying to understand a regular expression that is currently used to validate an email address entered on a website. The value of this email address is used to populate a target system, whose own validation rules can be expressed in plain English.

I would like to be able to highlight, with examples, where the website's email validation imposes rules that the target system does not require. To this end, I have obtained the regular expression from the developer and would like some help translating it into plain English:

^[_A-Za-z0-9_%+-]+(\\.[_A-Za-z0-9_%+-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,4})$

So far, I have gained some understanding from a previous post.

... which would seem to confirm the following:

^ = The matched string must begin here, and only begin here

[ ] = match any character inside the brackets, but only match one.
I'm not sure of the relevance of "only match one". Can anyone advise?

+ = match previous expression at least once, unlimited number of times.
Presumably this means the previous expression refers to the characters contained within the preceding square brackets and it can be repeated unlimited times?

() = make everything inside the parentheses a group (and make them referencable).
I'm not sure what this might mean.

\\. = match a literal full stop (.)

Then we have a repeat of the square bracket content, though I'm unsure what the relevance is here since the initial square brackets character class can be repeated unlimited times?

@ = match a literal @ symbol

The final parenthesis seems to match the top level domain which must be at least 2 characters but no more than 4 characters.
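On the "only match one" point in the list above: a character class on its own stands for exactly one character, and it is the `+` after it that allows that one-character choice to repeat. A minimal sketch in Python (the language is an assumption chosen for illustration, since the site's language is not stated, and the variable names are hypothetical):

```python
import re

# A character class by itself consumes exactly one character.
one = re.fullmatch(r"[A-Za-z0-9]", "a")    # one character: matches
two = re.fullmatch(r"[A-Za-z0-9]", "ab")   # two characters: no match

# Adding + lets the class repeat one or more times.
many = re.fullmatch(r"[A-Za-z0-9]+", "ab12")  # matches
empty = re.fullmatch(r"[A-Za-z0-9]+", "")     # needs at least one: no match

print(bool(one), bool(two), bool(many), bool(empty))
# prints: True False True False
```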

I think my main issue is in understanding the relevance of the round brackets as I can't understand what they add beyond what the square brackets add.
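One way to see what the round brackets add: the square brackets before the `@` do not contain a full stop, so the class alone can never match a dot. The group `(\.[...]+)` bundles "a dot followed by one or more allowed characters" into a single unit, and the `*` after it lets that whole unit appear zero or more times. The sketch below is an illustration only; it assumes the doubled backslash in the posted pattern is string-literal escaping for `\.`, and `LOCAL_PART` and `local_part_ok` are hypothetical names.

```python
import re

# Only the part of the pattern before the @ sign, with \. meaning a literal dot.
LOCAL_PART = re.compile(r"[_A-Za-z0-9_%+-]+(\.[_A-Za-z0-9_%+-]+)*")


def local_part_ok(text: str) -> bool:
    """Return True if the whole text matches the local-part pattern."""
    return LOCAL_PART.fullmatch(text) is not None


for candidate in ["ab", "ab.cd", "ab.cd.ef.gh", "ab..cd", ".ab", "ab."]:
    print(candidate, local_part_ok(candidate))
```

Under that assumption, `ab`, `ab.cd` and `ab.cd.ef.gh` are accepted, while `ab..cd`, `.ab` and `ab.` are rejected, because every dot must be followed by at least one allowed character and the text must start with one.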

Any help would be much appreciated.

There is a web site which explains regexes: [your regex explained](http://rick.measham.id.au/paste/explain.pl?regex=^[A-Za-z0-9%25%2B-]%2B%28\.[A-Za-z0-9%25%2B-]%2B%29%40[A-Za-z0-9]%2B%28\.[A-Za-z0-9]%2B%29%28\.[A-Za-z]{2%2C4}%29%24). Note that someone has decided that TLDs can be more than four characters now. — Andrew Morton, Mar 27 '14 at 21:36
**much** more than four characters. The `4` in `{2,4}` should be omitted here to allow for at least two letters e.g. `{2,}` or define a higher limit such as 10 or 15 — scrowler, Mar 27 '14 at 21:36
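The effect of the `{2,4}` limit mentioned in these comments can be checked with a few sample addresses. This is a rough sketch in Python, assuming the doubled backslashes in the posted pattern stand for `\.`; `ORIGINAL`, `RELAXED` and `accepts` are hypothetical names, and the sample addresses are made up for illustration.

```python
import re

LOCAL = r"[_A-Za-z0-9_%+-]+(\.[_A-Za-z0-9_%+-]+)*"
DOMAIN = r"[A-Za-z0-9]+(\.[A-Za-z0-9]+)*"

# The posted pattern: final label limited to 2-4 letters.
ORIGINAL = re.compile("^" + LOCAL + "@" + DOMAIN + r"(\.[A-Za-z]{2,4})$")
# The variant suggested in the comment: 2 or more letters, no upper limit.
RELAXED = re.compile("^" + LOCAL + "@" + DOMAIN + r"(\.[A-Za-z]{2,})$")


def accepts(pattern: re.Pattern, address: str) -> bool:
    """Return True if the pattern matches the address."""
    return pattern.match(address) is not None


for address in ["user@example.com", "user@example.info",
                "user@example.museum", "user@example.c"]:
    print(address, accepts(ORIGINAL, address), accepts(RELAXED, address))
```

With these inputs, `user@example.museum` is rejected by the original pattern and accepted by the relaxed one, while `user@example.c` is rejected by both because the final label needs at least two letters.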

score 0 · Answer 1 · answered Mar 27 '14 at 21:36

 ^[_A-Za-z0-9_%+-]+(\\.[_A-Za-z0-9_%+-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,4})$

 Assert position at the beginning of the string «^»
 Match a single character present in the list below «[_A-Za-z0-9_%+-]+»
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    The character “_” «_»
    A character in the range between “A” and “Z” «A-Z»
    A character in the range between “a” and “z” «a-z»
    A character in the range between “0” and “9” «0-9»
    One of the characters “_%” «_%»
    The character “+” «+»
    The character “-” «-»
 Match the regular expression below and capture its match into backreference number 1 «(\\.[_A-Za-z0-9_%+-]+)*»
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    Note: You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «*»
    Match the character “\” literally «\\»
    Match any single character that is not a line break character «.»
    Match a single character present in the list below «[_A-Za-z0-9_%+-]+»
       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
       The character “_” «_»
       A character in the range between “A” and “Z” «A-Z»
       A character in the range between “a” and “z” «a-z»
       A character in the range between “0” and “9” «0-9»
       One of the characters “_%” «_%»
       The character “+” «+»
       The character “-” «-»
 Match the character “@” literally «@»
 Match a single character present in the list below «[A-Za-z0-9]+»
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    A character in the range between “A” and “Z” «A-Z»
    A character in the range between “a” and “z” «a-z»
    A character in the range between “0” and “9” «0-9»
 Match the regular expression below and capture its match into backreference number 2 «(\\.[A-Za-z0-9]+)*»
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    Note: You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «*»
    Match the character “\” literally «\\»
    Match any single character that is not a line break character «.»
    Match a single character present in the list below «[A-Za-z0-9]+»
       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
       A character in the range between “A” and “Z” «A-Z»
       A character in the range between “a” and “z” «a-z»
       A character in the range between “0” and “9” «0-9»
 Match the regular expression below and capture its match into backreference number 3 «(\\.[A-Za-z]{2,4})»
    Match the character “\” literally «\\»
    Match any single character that is not a line break character «.»
    Match a single character present in the list below «[A-Za-z]{2,4}»
       Between 2 and 4 times, as many times as possible, giving back as needed (greedy) «{2,4}»
       A character in the range between “A” and “Z” «A-Z»
       A character in the range between “a” and “z” «a-z»
 Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
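A note on the `\\.` lines in the breakdown above: read verbatim, `\\.` means "a literal backslash, then any character", which is how the tool has interpreted it. If the pattern was copied out of source code where backslashes are doubled inside a string literal (an assumption; the question does not say where the text came from), the regex engine would actually receive `\.`, meaning a literal full stop. The sketch below uses a shortened domain-only fragment and hypothetical names to show the difference:

```python
import re

# Pattern text exactly as posted: the engine sees two backslashes and a dot.
verbatim = re.compile(r"^[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*$")

# Pattern text after one level of string-literal unescaping (assumed origin).
unescaped = re.compile(r"^[A-Za-z0-9]+(\.[A-Za-z0-9]+)*$")

plain = "ij.kl"          # an ordinary dotted name
odd = "ij\\Xkl"          # the six characters: i j \ X k l

print(bool(unescaped.match(plain)), bool(verbatim.match(plain)))
# prints: True False
print(bool(unescaped.match(odd)), bool(verbatim.match(odd)))
# prints: False True
```

Since the website reportedly accepts ordinary dotted addresses, the `\.` reading is the more plausible one, but only the developer can confirm which form the engine receives.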

Many thanks Andrew, scrowler and Unamata for your explanations, which have helped me understand it almost completely. However, I'm still confused about the meaning of the round brackets, i.e. `(\\.[_A-Za-z0-9_%+-]+)`. Do the brackets indicate that matching this pattern is optional? I notice, for example, that I can enter an email address of ab.cd.ef.gh@ij.kl.mn.op.com in this field, but I do not know which characters in the regex allow the repetition here, i.e. ef.gh. Also, does the * indicate that the following characters or pattern are mandatory? — user2751378, Mar 31 '14 at 08:53
The round brackets put the result into a backreference. You can access that reference by typing `$2`, or going `$results[2]` in `PHP` — Unamata Sanatarai, Mar 31 '14 at 11:04
Hi, can you explain in plain language what "put the result into a backreference" means? I should explain that I am not a developer and am instead approaching the problem from a business analyst's point of view. Ideally I'd like to be able to explain the regex to business users. — user2751378, Mar 31 '14 at 12:52
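In plain terms, a backreference is a numbered slot that remembers which piece of the input a pair of round brackets matched, so that code can read it back afterwards. The brackets do not make anything optional by themselves; it is the `*` written after a group that allows the group to occur zero or more times, and `*` always applies to what precedes it. A minimal sketch in Python rather than the PHP mentioned above, assuming the doubled backslashes stand for `\.` and using the hypothetical name `EMAIL`:

```python
import re

EMAIL = re.compile(
    r"^[_A-Za-z0-9_%+-]+(\.[_A-Za-z0-9_%+-]+)*"
    r"@[A-Za-z0-9]+(\.[A-Za-z0-9]+)*(\.[A-Za-z]{2,4})$"
)

match = EMAIL.match("ab.cd.ef.gh@ij.kl.mn.op.com")

print(match.group(0))  # prints: ab.cd.ef.gh@ij.kl.mn.op.com
print(match.group(1))  # prints: .gh
print(match.group(2))  # prints: .op
print(match.group(3))  # prints: .com
```

Group 1 matched `.cd`, `.ef` and `.gh` in turn, which is the repetition asked about, but a repeated group keeps only its last match, so slot 1 holds `.gh`. For validation alone the slots are never read; the brackets are there only so that `*` can repeat the whole "dot plus characters" unit.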
