
I'd like to do progressive validation of an email address. By that I mean I would like to validate, as a string is being provided one character at a time, whether or not the current string represents a valid beginning of an email address.

Note that I'm aware of this and other similar answers that provide excellent patterns for matching a complete email address. What I'm looking for is slightly different.

I'd like to know, given a regex pattern, say the below pattern as described in the link above, if there's a general way to say if a given string represents a valid beginning of the pattern.

/^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/

I understand that I can manually decompose the above pattern into its component sections and "OR" them together, each alternative capturing a longer stretch from the beginning of the main pattern forward. I'm hoping there's something more elegant and less verbose that could reference the established pattern as a capture group and look inside it for partial matches of the beginning only. Is this possible to achieve with regular expressions?

Strings the regex would match:

  • ""
  • "test"
  • "test.user"
  • "test.user."
  • "test.user.1@"
  • "test.user.1@test"
  • "test.user.1@test.best"
  • "test.user.1@test.best.com"

Strings the regex would not match:

  • "@#$@#"
  • "test.."
  • "test.user.@"
  • "test.user.1@@"
  • "test.user.1@test..best"
  • "test.user.1@test.best@"
jdmcnair
  • This is tough with a regex, but easy with a *parser*. Start parsing the string, and throw an exception or such once you encounter a syntax error. If the parser simply falls off the end without error, it's okay. Obviously you'd need to have separate return values for "not invalid but incomplete" and "complete and valid" too. – deceze Nov 13 '18 at 00:20
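The parser idea from the comment above could look roughly like the sketch below. It is only an illustration, not the commenter's code: it assumes a deliberately tiny grammar (lowercase letters separated by single dots, one "@", and at least one dot in the domain), and the function name `classifyEmailPrefix` and its three return values are made up for this example.

```javascript
// Hypothetical helper for an assumed, simplified grammar:
//   letters ("." letters)* "@" letters ("." letters)+
// Returns "invalid", "incomplete", or "complete".
function classifyEmailPrefix(input) {
  let seenAt = false;   // have we passed the "@"?
  let domainDots = 0;   // dots seen after the "@"
  let prev = "start";   // kind of the previous character: start | letter | dot | at

  for (const ch of input) {
    if (ch >= "a" && ch <= "z") {
      prev = "letter";
    } else if (ch === ".") {
      if (prev !== "letter") return "invalid"; // a dot must follow a letter
      if (seenAt) domainDots += 1;
      prev = "dot";
    } else if (ch === "@") {
      if (seenAt || prev !== "letter") return "invalid"; // only one "@", and only after a letter
      seenAt = true;
      prev = "at";
    } else {
      return "invalid"; // character outside the assumed grammar
    }
  }
  // Falling off the end without an error: either finished or still in progress.
  return seenAt && domainDots > 0 && prev === "letter" ? "complete" : "incomplete";
}

console.log(classifyEmailPrefix("tom.ed@"));
console.log(classifyEmailPrefix("tom.ed@inter.net"));
console.log(classifyEmailPrefix("tom..ed"));
```

The point of the three-way result is the one made in the comment: "not invalid but incomplete" and "complete and valid" are different answers, and a single boolean cannot carry both.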

1 Answer


I'm simplifying the email pattern to /^[a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)+$/ so that the steps of the method are easier to follow.

  1. Walk through a scenario of typing a valid value one character at a time, and find a pattern that matches each intermediate value:
Value             regex                                     matches
----------------- ----------------------------------------- ---------
t                 /^[a-z]+/                                 Yes
to                /^[a-z]+/                                 Yes
tom               /^[a-z]+/                                 Yes
tom.              /^[a-z]+\./                               Yes
tom.e             /^[a-z]+(\.[a-z]+)*/                      Yes
tom.ed            /^[a-z]+(\.[a-z]+)*/                      Yes
tom.ed@           /^[a-z]+(\.[a-z]+)*@/                     Yes
tom.ed@i          /^[a-z]+(\.[a-z]+)*@[a-z]+/               Yes
tom.ed@in         /^[a-z]+(\.[a-z]+)*@[a-z]+/               Yes
tom.ed@int        /^[a-z]+(\.[a-z]+)*@[a-z]+/               Yes
tom.ed@inte       /^[a-z]+(\.[a-z]+)*@[a-z]+/               Yes
tom.ed@inter      /^[a-z]+(\.[a-z]+)*@[a-z]+/               Yes
tom.ed@inter.     /^[a-z]+(\.[a-z]+)*@[a-z]+\./             Yes 
tom.ed@inter.n    /^[a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)+/    Yes
tom.ed@inter.ne   /^[a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)+/    Yes
tom.ed@inter.net  /^[a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)+$/   Yes
  2. Then look for ways to merge each pattern that matches only one line of the table with the pattern that precedes it.

    • For /^[a-z]+\./, we can combine it with /^[a-z]+/ and obtain /^[a-z]+\.?/.
    • For /^[a-z]+(\.[a-z]+)*@/, we can combine it with /^[a-z]+(\.[a-z]+)*/ and obtain /^[a-z]+(\.[a-z]+)*@?/.
    • For /^[a-z]+(\.[a-z]+)*@[a-z]+\./, we can combine it with /^[a-z]+(\.[a-z]+)*@[a-z]+/ and obtain /^[a-z]+(\.[a-z]+)*@[a-z]+\.?/.
    • ...

Continue until you are satisfied that you have covered all possible cases.

  3. Put everything together as alternatives in a single anchored group, /^( ... | ... | ... | ... )$/. Here is one possible regex you would obtain:

/^([a-z]+\.?|[a-z]+(\.[a-z]+)*\.?|[a-z]+(\.[a-z]+)*@|[a-z]+(\.[a-z]+)*@[a-z]+\.?|[a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)*\.?|[a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)+)$/

Here is the same alternation laid out more readably:

( [a-z]+\.?
| [a-z]+(\.[a-z]+)*\.?
| [a-z]+(\.[a-z]+)*@
| [a-z]+(\.[a-z]+)*@[a-z]+\.?
| [a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)*\.?
| [a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)+ )
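If the alternation feels too verbose, one way to shorten it (my own suggestion, not part of the method above) is to nest each later stage as an optional group after the stage that must come before it. The sketch below assumes the same simplified pattern; the names `compactPrefix`, `alternation` and `samples` are only illustrative. It compares the two regexes on a handful of inputs, which is a sanity check and not a proof of equivalence.

```javascript
// Assumed compact form: after the local part, optionally either a trailing
// dot, or an "@" followed by an optional, possibly unfinished domain.
const compactPrefix = /^[a-z]+(\.[a-z]+)*(\.|@([a-z]+(\.[a-z]+)*\.?)?)?$/;

// The six-way alternation from step 3, for comparison.
const alternation = /^([a-z]+\.?|[a-z]+(\.[a-z]+)*\.?|[a-z]+(\.[a-z]+)*@|[a-z]+(\.[a-z]+)*@[a-z]+\.?|[a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)*\.?|[a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)+)$/;

const samples = [
  "t", "tom.", "tom.ed", "tom.ed@", "tom.ed@inter.", "tom.ed@inter.net",
  "", "tom..", "tom.@", "tom@@", "tom@inter..net", "tom@inter.net@",
];

// Collect every sample on which the two patterns disagree.
const disagreements = samples.filter(
  (s) => compactPrefix.test(s) !== alternation.test(s)
);
console.log(disagreements); // an empty array means the two agree on these samples
```

The nested form has to be written by hand for each grammar, so it does not answer the "reference the original pattern and look inside" part of the question; it only trims the repetition.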
  4. Finally, test the result to make sure it does not allow unwanted values.

You can do that at regex101.com or in a test page.
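A quick script can do the same check. This is a minimal sketch: the sample lists are adapted from the question to fit the simplified pattern (letters only, so the digit is dropped, and the empty string is left out because the simplified pattern requires at least one letter), and the names `keypressPattern`, `shouldMatch`, `shouldNotMatch` and `failures` are just for this example.

```javascript
// The combined alternation from step 3.
const keypressPattern = /^([a-z]+\.?|[a-z]+(\.[a-z]+)*\.?|[a-z]+(\.[a-z]+)*@|[a-z]+(\.[a-z]+)*@[a-z]+\.?|[a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)*\.?|[a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)+)$/;

// Valid beginnings of an address (adapted from the question's examples).
const shouldMatch = [
  "test", "test.user", "test.user.", "test.user@",
  "test.user@test", "test.user@test.best", "test.user@test.best.com",
];

// Strings that can never become a valid address.
const shouldNotMatch = [
  "@#$@#", "test..", "test.user.@", "test.user@@",
  "test.user@test..best", "test.user@test.best@",
];

// Report every sample that lands on the wrong side.
const failures = [
  ...shouldMatch.filter((s) => !keypressPattern.test(s)).map((s) => "rejected: " + s),
  ...shouldNotMatch.filter((s) => keypressPattern.test(s)).map((s) => "accepted: " + s),
];
console.log(failures.length === 0 ? "all samples behave as expected" : failures);
```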

Note: This pattern is only good while the user enters a value. Before submitting, the original pattern (/^[a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)+$/) must be used to validate the value so that no incomplete values are sent to the server.
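To illustrate the note, here is a rough sketch of choosing the pattern by context. The function name `checkEmail` and the `isSubmitting` flag are hypothetical; only the two regexes come from the text above.

```javascript
// Strict pattern: only complete addresses (used before submitting).
const completePattern = /^[a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)+$/;

// Lenient pattern: any valid beginning of an address (used while typing).
const keypressPattern = /^([a-z]+\.?|[a-z]+(\.[a-z]+)*\.?|[a-z]+(\.[a-z]+)*@|[a-z]+(\.[a-z]+)*@[a-z]+\.?|[a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)*\.?|[a-z]+(\.[a-z]+)*@[a-z]+(\.[a-z]+)+)$/;

// Hypothetical helper: pick the pattern according to the situation.
function checkEmail(value, isSubmitting) {
  const pattern = isSubmitting ? completePattern : keypressPattern;
  return pattern.test(value);
}

console.log(checkEmail("tom.ed@", false));         // acceptable while typing
console.log(checkEmail("tom.ed@", true));          // not acceptable on submit
console.log(checkEmail("tom.ed@inter.net", true)); // complete address
```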

EDIT: To make it more readable and maintainable, you can build the pattern from named parts:

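// Backslashes are doubled because these are string literals, not regex literals.
// patt1 also covers dotted local parts such as "tom.ed".
var patt1 = "[a-z]+(\\.[a-z]+)*\\.?";
var patt2 = "[a-z]+(\\.[a-z]+)*@";
var patt3 = "[a-z]+(\\.[a-z]+)*@[a-z]+\\.?";
var patt4 = "[a-z]+(\\.[a-z]+)*@[a-z]+(\\.[a-z]+)*\\.?";
var patt5 = "[a-z]+(\\.[a-z]+)*@[a-z]+(\\.[a-z]+)+";

// Join the parts as alternatives inside one anchored group.
var keypressedValidator = new RegExp("^(" + [patt1, patt2, patt3, patt4, patt5].join("|") + ")$");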

...

var inputValue = document.getElementById(...some Id...).value;

if ( ! inputValue.match(keypressedValidator)) {
  ... show error status or error message ...
...
Dominique Fortin
  • This is pretty much that "manually decompose the above pattern into composite sections and "OR" them together" option I was talking about above. I think it will work, but it'll also be extremely verbose and difficult to maintain for anything aside from the most simplified notion of an email address pattern. I'm asking for something a little more specific; is there a way to take an original pattern, make it a capture group, and use some kind of self-referential operation to determine if a given string is a valid `beginning` of the original pattern. – jdmcnair Nov 13 '18 at 15:26
  • Note that it's OK and not altogether unexpected if the answer is "no, that doesn't exist", but I'm going for something fairly specific. – jdmcnair Nov 13 '18 at 15:54
  • @jdmcnair I've added a code example to make it more readable and maintainable. – Dominique Fortin Nov 13 '18 at 21:24