
I know there are already thousands of regex questions, such as "Using a regular expression to validate an email address" and "Regular expression to match single dot but not two dots".

I created the regex below, which accepts ' (apostrophe) and . (dot).

/^[\w-\.\']{1,}\@([\da-zA-Z-]{1,}\.){1,}[\da-zA-Z-]{2,3}$/

But it also accepts two consecutive dots and two consecutive apostrophes. How do I prevent that?

E.g:

john's.presonal@somedomain.com is correct.

john's..presonal@somedomain.com is incorrect.

john's.presonal.email@somedomain.com is correct.


From the questions linked above, I understand that I need to use something like '/^([^\.]|([^\.])\.[^\.])*$/', but I am not sure how to work it into my regex.

General Grievance
  • You might be better to parse the email address twice and require it to match both regexes. Otherwise your regex is going to quickly descend into madness. – ydaetskcoR May 29 '14 at 10:03
  • @ydaetskcoR: the validation runs on the server side, in an automated framework that validates `Object's Properties` against the `Validation Reg-ex` associated with each one. :( –  May 29 '14 at 10:06

2 Answers


Add this negative lookahead right after your ^:

(?!.*(?:''|\.\.))

How does this work?

(?!.*(?:''|\.\.)) is a negative lookahead that asserts: at the present position (which is the beginning of the string), we cannot match any number of characters followed by either two apostrophes or two dots. In other words, the string may not contain '' or .. anywhere.
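
As a quick check, here is a minimal JavaScript sketch that compares the regex from the question with and without the lookahead. The variable names (`original`, `withLookahead`, `samples`) are made up for illustration, and the sample with a doubled apostrophe is an extra case that is not in the question.

```javascript
// The regex from the question, unchanged.
const original = /^[\w-\.\']{1,}\@([\da-zA-Z-]{1,}\.){1,}[\da-zA-Z-]{2,3}$/;

// The same regex with the negative lookahead inserted right after ^.
const withLookahead = /^(?!.*(?:''|\.\.))[\w-\.\']{1,}\@([\da-zA-Z-]{1,}\.){1,}[\da-zA-Z-]{2,3}$/;

const samples = [
  "john's.presonal@somedomain.com",       // should be accepted
  "john's..presonal@somedomain.com",      // double dot, should be rejected
  "john''s.presonal@somedomain.com",      // double apostrophe (extra sample)
  "john's.presonal.email@somedomain.com", // should be accepted
];

// Print each sample with the verdict of both patterns.
for (const s of samples) {
  console.log(s, "| original:", original.test(s), "| with lookahead:", withLookahead.test(s));
}
```

The original pattern accepts all four samples, while the version with the lookahead rejects the two that contain `..` or `''`.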

Other tweaks

Since that is not the question, I haven't analyzed the rest of your regex. However, at a glance:

  1. {1,} can just be written as +
  2. Your initial [\w-\.\'] means that an email can start with a dot (among other characters). Are you sure that is valid? If not, start your match with exactly one character from the allowable set, then only add the quantified set.
  3. The {2,3} at the end is okay for TLDs such as com and us. But are you sure you want to exclude TLDs such as mobi?
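
To make the three tweaks concrete, here is one possible JavaScript sketch with all of them applied. It rests on assumptions that are not in the question: the first character must be a letter or digit, the final label may be 2 to 6 characters long, and matching is case-insensitive via the `i` flag. The name `tweaked` is only illustrative.

```javascript
// Sketch only: the first-character rule and the {2,6} length are assumptions.
const tweaked = new RegExp(
  "^(?!.*(?:''|\\.\\.))" + // no '' and no .. anywhere in the string
  "[a-z0-9]" +             // tweak 2: exactly one leading letter or digit
  "[\\w.'-]*" +            // then the quantified set from the question
  "@" +
  "(?:[\\da-z-]+\\.)+" +   // tweak 1: + instead of {1,}
  "[\\da-z-]{2,6}$",       // tweak 3: longer final label, so "mobi" fits
  "i"                      // case-insensitive, so A-Z need not be listed
);

console.log(tweaked.test("john's.presonal@somedomain.com")); // true
console.log(tweaked.test(".john@somedomain.com"));           // false (leading dot)
console.log(tweaked.test("john@site.mobi"));                 // true
```

The local part here is at least one character long; use `+` instead of `*` on the second class if you want to require at least two.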

The Wheel

For reference, here are examples of "the wheel" that has already been invented. These are two email expressions from the RegexBuddy library.

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b

RFC2822:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
zx81
  • This was my first reg-ex of my life :) Thanks for pointing it out. Will update my regex :). It works as of now that's magical for me. Thanks :) –  May 29 '14 at 10:15
  • @Idothisallday Wow, if this is your first regex ever, you're doing great! You'll be a pro in no time. Thanks for your feedback. FYI, added two regexes from the RB library for reference, but it sounds like you want to do your own thing. :) – zx81 May 29 '14 at 10:17
  • I was actually writing it in [regex101](http://regex101.com/), which highlighted that `\w match any word character [a-zA-Z0-9_]`. The rest I picked up after a little googling :P –  May 29 '14 at 10:19
  • My final regex, as you suggested, is `/^(?!.*(?:''|\.\.))[\w-\.\']+\@([\da-zA-Z-]{1,}\.)+[\da-zA-Z-]{2,5}$/`. I have applied your suggestions 1 and 3, but I am not sure how to do 2; I can't get it to work. Can you please help? Is this wrong? `/^(?!.*(?:''|\.\.))[A-Z0-9._%+-]+\@([\da-zA-Z-]{1,}\.)+[\da-zA-Z-]{2,5}$/` –  May 29 '14 at 11:31
  • Is this correct? `/^(?!.*(?:''|\.\.))[A-Za-z0-9-\.\'\-\_]+\@([\da-zA-Z-]{1,}\.)+[\da-zA-Z-]{2,5}$/` –  May 29 '14 at 11:36
  • @Idothisallday Briefly, some small corrections on the regex in your comments: `(?i)^(?!.*(?:''|\.\.))[a-z0-9][-a-z0-9.'_]+@([-a-z0-9]+\.)+[a-z]{2,5}$`. The `(?i)` saves the `A-Z` space by going case-insensitive. The `{1,}` is replaced by a `+`. The initial `[a-z0-9]` ensures that your first character is not a dot etc. (if you want that). The `\d` in .NET can match Arabic or Thai digits, so I replaced it with `0-9` and also removed it from the TLD. I moved the `-` to the front of the character classes for clarity; also, the `.'-_` don't need to be escaped in a charclass, so I removed the `\`. You can tweak forever! – zx81 May 29 '14 at 19:54

A regex to match every valid email is very complicated.

You should use something like filter_var($email, FILTER_VALIDATE_EMAIL) in PHP or [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:&error] in Objective-C.

src