I'm trying to understand a regular expression that is currently used to validate email addresses entered on a website. The email address is then used to populate a target system, whose own validation rules can be expressed in plain English.
I would like to show, with examples, where the website's validation imposes rules that the target system does not require. To this end, I have obtained the regular expression from the developer, and I need some help translating it into plain English:
^[_A-Za-z0-9_%+-]+(\\.[_A-Za-z0-9_%+-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,4})$
So far, I have gained some understanding from a previous post.
... which would seem to confirm the following:
^
= The matched string must begin here, and only begin here
[ ]
= match any character inside the brackets, but only match one.
I'm not sure of the relevance of "only match one". Can anyone advise?
+
= match previous expression at least once, unlimited number of times.
Presumably this means the previous expression refers to the characters contained within the preceding square brackets and it can be repeated unlimited times?
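A small experiment seems to bear this out. This is only a sketch using Python's `re` module; the pattern `[abc]` and the helper `matches_whole` are my own illustration, not part of the website's expression.

```python
import re


def matches_whole(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the entire text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# A character class on its own stands for exactly one character from the set.
one_char = matches_whole(r"[abc]", "a")
two_chars = matches_whole(r"[abc]", "ab")

# Adding + lets that single-character choice repeat one or more times.
repeated = matches_whole(r"[abc]+", "abca")
empty = matches_whole(r"[abc]+", "")

print(one_char, two_chars, repeated, empty)
```

So "only match one" appears to describe the brackets alone, and the `+` that follows is what allows the repetition; the empty string is rejected because `+` needs at least one character.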
()
= make everything inside the parentheses a group (and make them referencable).
I'm not sure what this might mean.
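From some experimenting, the parentheses seem to do two things: they let a quantifier such as `*` apply to a whole sequence rather than just the last character, and they let the matched piece be read back afterwards by number. The sketch below uses Python's `re` module and assumes the `\\.` in the developer's version is a string-literal escape for `\.`; the short `a(\.b)*` pattern and the variable names are mine.

```python
import re

# With parentheses, * repeats the whole sequence "dot followed by b".
grouped = re.fullmatch(r"a(\.b)*", "a.b.b") is not None

# Without parentheses, * repeats only the final "b".
ungrouped = re.fullmatch(r"a\.b*", "a.b.b") is not None

# Assumption: "\\." in the original is an escaped "\." from a string literal.
WEBSITE_PATTERN = re.compile(
    r"^[_A-Za-z0-9_%+-]+(\.[_A-Za-z0-9_%+-]+)*"
    r"@[A-Za-z0-9]+(\.[A-Za-z0-9]+)*(\.[A-Za-z]{2,4})$"
)

match = WEBSITE_PATTERN.match("john.smith@example.com")

# Groups are numbered by their opening parenthesis, starting at 1.
local_dot_part = match.group(1)  # last ".something" piece before the @
top_level_domain = match.group(3)  # the final ".letters" piece

print(grouped, ungrouped, local_dot_part, top_level_domain)
```

Here `grouped` is true and `ungrouped` is false, while groups 1 and 3 come back as `.smith` and `.com`.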
\\.
= match a literal full stop (.)
Then the square-bracket character class is repeated, though I'm unsure why, since the initial character class can already be repeated an unlimited number of times?
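Testing a few addresses suggests the repeat matters because the full stop is not in the first character class: the only way to get a dot into the part before the @ is through the group, and each dot must then be followed by at least one allowed character. The Python sketch below is my own, assumes `\\.` is a string-literal escape for `\.`, and uses invented addresses; `is_accepted` is a hypothetical helper.

```python
import re

# Assumption: "\\." in the original is an escaped "\." from a string literal.
WEBSITE_PATTERN = re.compile(
    r"^[_A-Za-z0-9_%+-]+(\.[_A-Za-z0-9_%+-]+)*"
    r"@[A-Za-z0-9]+(\.[A-Za-z0-9]+)*(\.[A-Za-z]{2,4})$"
)


def is_accepted(address: str) -> bool:
    """Return True if the website pattern matches the address (hypothetical helper)."""
    return WEBSITE_PATTERN.match(address) is not None


# Addresses invented to probe where a dot may appear before the @.
dot_cases = {
    "john.smith@example.com": is_accepted("john.smith@example.com"),
    "john..smith@example.com": is_accepted("john..smith@example.com"),
    ".john@example.com": is_accepted(".john@example.com"),
    "john.@example.com": is_accepted("john.@example.com"),
}

for address, accepted in dot_cases.items():
    print(f"{address}: {'accepted' if accepted else 'rejected'}")
```

Only the first address is accepted; a leading dot, a trailing dot and two dots in a row are all rejected.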
@
= match a literal @ symbol
The final parenthesised group seems to match the top-level domain, which must be at least 2 but no more than 4 letters (A-Z or a-z only, no digits).
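A quick check of the length limits, again as a sketch only: it assumes `\\.` is a string-literal escape for `\.`, and the addresses and the `is_accepted` helper are invented for illustration.

```python
import re

# Assumption: "\\." in the original is an escaped "\." from a string literal.
WEBSITE_PATTERN = re.compile(
    r"^[_A-Za-z0-9_%+-]+(\.[_A-Za-z0-9_%+-]+)*"
    r"@[A-Za-z0-9]+(\.[A-Za-z0-9]+)*(\.[A-Za-z]{2,4})$"
)


def is_accepted(address: str) -> bool:
    """Return True if the website pattern matches the address (hypothetical helper)."""
    return WEBSITE_PATTERN.match(address) is not None


# Addresses invented to probe the final ".letters" part of the pattern.
tld_cases = {
    "john@example.uk": is_accepted("john@example.uk"),          # 2 letters
    "john@example.info": is_accepted("john@example.info"),      # 4 letters
    "john@example.c": is_accepted("john@example.c"),            # 1 letter
    "john@example.museum": is_accepted("john@example.museum"),  # 6 letters
    "john@example.c0m": is_accepted("john@example.c0m"),        # contains a digit
}

for address, accepted in tld_cases.items():
    print(f"{address}: {'accepted' if accepted else 'rejected'}")
```

The first two are accepted and the last three rejected, so an address whose final label is longer than four letters would fail the website check whatever the target system allows.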
I think my main issue is in understanding the relevance of the round brackets as I can't understand what they add beyond what the square brackets add.
Any help would be much appreciated.