0

I appreciate there are several email regexes on SO, but I couldn't find anything that suits my case.

We have an email system that is failing with this regex:

    if (!ereg("^\[?[0-9\.]+\]?$", $email_array[1])) {
        $domain_array = explode(".", $email_array[1]);
        if (sizeof($domain_array) < 2) {
            $this->result = 0;
        }
        for ($i = 0; $i < sizeof($domain_array); $i++) {
            if (!ereg("^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|?([A-Za-z0-9]+))$", $domain_array[$i])) {
                $this->result = 0;
            }
        }
    }

It fails when trying to send to an email address in the format:

my.name@some-text.value.subdomain.domain.co.uk

I assume it's the extra .value. that is causing the problem, and I'm not experienced enough with regex to fix this. Can anyone help?

The regex:

^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|?([A-Za-z0-9]+))$

Thanks in advance.

  • Please see http://stackoverflow.com/questions/201323/how-to-use-a-regular-expression-to-validate-an-email-addresses. Don't use a regex to validate an email address. – Madara's Ghost Apr 13 '12 at 08:25
  • It is required as a fix for a live system; we don't have the opportunity to change too much at this stage. –  Apr 13 '12 at 08:27
  • ereg is deprecated: http://php.net/manual/en/function.ereg.php – Toto Apr 14 '12 at 09:38
  • what about valid emails like: `jean+françois@anydomain.tld` – Toto Apr 14 '12 at 09:41
  • The regex needs to cater for any email address, but the current code in place handles the address and the domain separately. –  Apr 15 '12 at 12:34

4 Answers

1

If you are looking at email validation, you should use the filter_var() function, which works better than a regex imho.

filter_var('my.name@some-text.value.domain.co.uk', FILTER_VALIDATE_EMAIL);
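A minimal sketch of how that call might be wrapped, since filter_var() returns the filtered string on success and false on failure rather than a plain boolean. The helper name is_valid_email() and the sample addresses are assumptions for illustration, not part of the asker's code base.

```php
<?php
// Hypothetical helper: turns filter_var()'s "string or false" result into a strict boolean.
function is_valid_email(string $email): bool
{
    // FILTER_VALIDATE_EMAIL returns the address itself when valid, false otherwise.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

// Sample inputs (assumed for illustration): the asker's address and an obviously bad one.
$candidates = [
    'my.name@some-text.value.subdomain.domain.co.uk',
    'not-an-email',
];

foreach ($candidates as $candidate) {
    echo $candidate, ' => ', is_valid_email($candidate) ? 'valid' : 'invalid', PHP_EOL;
}
```

Comparing against false with !== matters here, because a valid address comes back as a non-empty string, not as true.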
ChrisR
1

Or you could use this regex:

^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$

It matches the address you gave and should work for most addresses (99.99% or so), though it rejects some valid ones, such as addresses with a + in the local part.
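To see what this pattern accepts and rejects, here is a small sketch using preg_match(). The PCRE delimiters, the helper name matches_answer_pattern() and the sample addresses are my assumptions; only the pattern itself comes from the answer.

```php
<?php
// Pattern copied from the answer, wrapped in "/" delimiters for preg_match().
const ANSWER_PATTERN = '/^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$/';

// Hypothetical helper: true only when the whole address matches the pattern.
function matches_answer_pattern(string $email): bool
{
    return preg_match(ANSWER_PATTERN, $email) === 1;
}

// Sample inputs (assumed for illustration).
$samples = [
    'my.name@some-text.value.subdomain.domain.co.uk', // the asker's address
    'user+tag@example.com',                           // "+" is not allowed in the local part
    'a@b.com',                                        // each domain label needs at least 2 characters
];

foreach ($samples as $sample) {
    echo $sample, ' => ', matches_answer_pattern($sample) ? 'match' : 'no match', PHP_EOL;
}
```

The second and third samples are valid addresses that this pattern turns away, which is worth knowing before dropping it into a live system.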

Dovydas Navickas
1

The regex:

^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|?([A-Za-z0-9]+))$

should match individual parts of a domain name. Right?

So if you have some-text.value.subdomain.domain.co.uk as the domain name.

Your code splits by the dot and tries to match each sub part.

So for instance some-text or subdomain.

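This would work fine with the above regex, except that the ? directly after | has nothing to quantify. That is a compile error in PCRE and undefined behaviour in POSIX extended regexes, so the match can fail for every part.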

I would try

^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|([A-Za-z0-9]+))$

instead.

It is still not a great regex for matching the individual parts of a domain name, but I guess you want to change as little as possible in the code base.
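As a rough sketch, the per-label check with the corrected pattern could look like this. It uses preg_match() instead of ereg(), because ereg() was deprecated in PHP 5.3 and removed in PHP 7. The function name is_valid_domain() and the constant LABEL_PATTERN are assumptions, and the IP-literal branch of the original code is left out.

```php
<?php
// Corrected label pattern (stray "?" removed), as a PCRE pattern.
// The D modifier makes "$" match only at the very end of the string.
const LABEL_PATTERN = '/^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|([A-Za-z0-9]+))$/D';

// Hypothetical helper mirroring the original loop: split on dots, check every label.
function is_valid_domain(string $domain): bool
{
    $labels = explode('.', $domain);

    // The original code requires at least two labels (e.g. "example.com").
    if (count($labels) < 2) {
        return false;
    }

    foreach ($labels as $label) {
        if (preg_match(LABEL_PATTERN, $label) !== 1) {
            return false;
        }
    }

    return true;
}

echo is_valid_domain('some-text.value.subdomain.domain.co.uk') ? 'valid' : 'invalid', PHP_EOL;
```

Unlike the original loop, this returns as soon as one label fails instead of setting a result flag and carrying on.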

A simpler one would be just the last part of the alternation, extended to allow hyphens:

^[A-Za-z0-9-]+$
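One trade-off worth checking before switching: the simpler pattern also accepts labels that start or end with a hyphen, which the longer one rejects. A quick comparison sketch, where the variable names and sample labels are assumptions:

```php
<?php
// Longer pattern from above (with the stray "?" removed) and the simpler alternative.
$strictPattern = '/^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|([A-Za-z0-9]+))$/D';
$simplePattern = '/^[A-Za-z0-9-]+$/D';

// Sample labels (assumed for illustration).
$labels = ['some-text', 'co', '-leading', 'trailing-'];

foreach ($labels as $label) {
    // preg_match() returns 1 on a match and 0 on no match.
    printf(
        "%-10s strict=%d simple=%d\n",
        $label,
        preg_match($strictPattern, $label),
        preg_match($simplePattern, $label)
    );
}
```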
omat
0

I wonder if you could use a list of TLDs, or of allowed TLDs, near the end of your regex, to match something like xn--hgbk6aj7f53bba and some of the newer TLDs?

In this case, a regex like this might work:

^([a-zA-Z0-9_.\'+-])+\@([a-zA-Z0-9-])+(\.(ac|ad|aero|ae|af|ag|ai|al|am|an|ao|aq|arpa|ar|asia|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|biz|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|cat|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|coop|com|co|cr|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|ec|edu|ee|eg|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gov|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|info|int|in|io|iq|ir|is|it|je|jm|jobs|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mil|mk|ml|mm|mn|mobi|mo|mp|mq|mr|ms|mt|museum|mu|mv|mw|mx|my|mz|name|na|nc|net|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|org|pa|pe|pf|pg|ph|pk|pl|pm|pn|pro|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|tc|td|tel|tf|tg|th|tj|tk|tl|tm|tn|to|tp|travel|tr|tt|tv|tw|tz|ua|ug|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|xn--0zwm56d|xn--11b5bs3a9aj6g|xn--80akhbyknj4f|xn--9t4b11yi5a|xn--deba0ad|xn--g6w251d|xn--hgbk6aj7f53bba|xn--hlcj6aya9esc7a|xn--jxalpdlp|xn--kgbechtv|xn--zckzah|ye|yt|yu|za|zm|zw))+$
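A pattern of this shape is easier to maintain if the TLD group is built from an array. The sketch below uses a deliberately shortened, hypothetical TLD list and keeps the same structure: a local part, one domain label, then one or more labels from the list. Because every label after the first must be in the list, an address with extra subdomains such as the asker's would not match; the second pattern is an assumed variant that allows arbitrary labels before the TLD part.

```php
<?php
// Hypothetical, shortened TLD list for illustration; the full list above would go here.
$tlds = ['com', 'org', 'net', 'co', 'uk', 'xn--hgbk6aj7f53bba'];

// Escape each entry and join into a regex alternation.
$tldGroup = implode('|', array_map(function ($tld) {
    return preg_quote($tld, '/');
}, $tlds));

// Same shape as the pattern above: local part, one label, then one or more listed TLD labels.
$listPattern = '/^[a-zA-Z0-9_.\'+-]+@[a-zA-Z0-9-]+(\.(' . $tldGroup . '))+$/D';

// Assumed variant: any number of extra labels before the listed TLD labels.
$relaxedPattern = '/^[a-zA-Z0-9_.\'+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*(\.(' . $tldGroup . '))+$/D';

$address = 'my.name@some-text.value.subdomain.domain.co.uk';
echo 'list pattern:    ', preg_match($listPattern, $address) === 1 ? 'match' : 'no match', PHP_EOL;
echo 'relaxed pattern: ', preg_match($relaxedPattern, $address) === 1 ? 'match' : 'no match', PHP_EOL;
```

Any hard-coded TLD list goes stale as new TLDs are delegated, so it would need regular updates.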

agamemnon