Regex split email address

Question

I need some help with PHP regex. I want to "split" the email address "johndoe@example.com" into "johndoe" and "@example.com".

So far I have this: preg_match('/<?([^<]+?)@/', 'johndoe@example.com', $matches); and I get Array ( [0] => johndoe@ [1] => johndoe )

How do I need to change the regex?

domain one-liner: `$domain = substr($email, strrpos($email, '@')+1);` — Kamil Kiełczewski, Jul 26 '17 at 18:58
Any chance you would consider changing your accepted answer? — Brogan, Oct 12 '19 at 23:31

score 29 · Accepted Answer · answered Jul 27 '11 at 20:37

$parts = explode('@', "johndoe@example.com");

$user = $parts[0];
// Stick the @ back onto the domain since it was chopped off.
$domain = "@" . $parts[1];

Michael Berkowski

Better yet, `list($name,$_) = explode('@',$email); $domain = '@'.$_;` - http://www.ideone.com/yHlz6 – Brad Christie Jul 27 '11 at 20:39
Exactly what I was thinking. And it is less expensive processing-wise to do an explode. – Patrick Jul 27 '11 at 20:40
@Brad Christie after reading your Perl-looking comment, for a second I thought I interpreted a Perl question as a PHP question :) – Michael Berkowski Jul 27 '11 at 20:40
@Michael: Just keeping people guessing. I use `$_` in temporary solutions. For good, bad or indifferent (and at the risk of readability), it makes writing the variable out faster. And, until my notepad gets intellisense for PHP coding, I'm probably going to continue doing so. ;p – Brad Christie Jul 27 '11 at 20:53
An email address can have multiple "@" symbols, as stated in http://stackoverflow.com/questions/12355858/how-many-symbol-can-be-in-an-email-address – xDaizu Jun 15 '16 at 11:17

Brogan · Answer 2 · 2019-10-30T02:19:39.667

Some of the previous answers are wrong: a valid email address can, in fact, contain more than one @ symbol when the extra ones sit inside dot-delimited, quoted text. See the following example:

$email = 'a."b@c".d@e.f';
echo (filter_var($email, FILTER_VALIDATE_EMAIL) ? 'V' : 'Inv'), 'alid email format.';

Valid email format.

Multiple delimited blocks of text and a multitude of @ symbols can exist. Both of these examples are valid email addresses:

$email = 'a."b@c".d."@".e.f@g.h';
$email = '/."@@@@@@"./@a.b';

With Michael Berkowski's explode answer, this email address would be split like this:

$email = 'a."b@c".d@e.f';
$parts = explode('@', $email);
$user = $parts[0];
$domain = '@' . $parts[1];

User: a."b
Domain: @c".d

Anyone using this solution should beware of potential abuse. Accepting an email address based on these outputs, followed by inserting $email into a database could have negative implications.

$email = 'a."b@c".d@INSERT BAD STUFF HERE';

The functions below are only accurate as long as filter_var is used for validation first.

From the left:

Here is a simple non-regex, non-exploding solution that finds the first @ not contained within dot-delimited, quoted text. filter_var treats nested delimited text as invalid, so finding the proper @ is a very simple search.

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $quote = '"';
    $dot = '.';
    $at = '@';
    $length = strlen($email);
    $contained = false; // true while inside dot-delimited, quoted text
    for($i = 0; $i < $length; ++$i) {
        if($contained) {
            // A quote followed by a dot closes the quoted text.
            if($email[$i] === $quote && $email[$i + 1] === $dot) {
                $contained = false;
                ++$i;
            }
        }
        elseif($email[$i] === $at)
            break; // first @ outside quoted text
        elseif($email[$i] === $dot && $email[$i + 1] === $quote) {
            // A dot followed by a quote opens quoted text.
            $contained = true;
            ++$i;
        }
    }
    $local = substr($email, 0, $i);
    $domain = substr($email, $i);
}

Here is the same code tucked inside a function.

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $quote = '"';
    $dot = '.';
    $at = '@';
    $length = strlen($email);
    $contained = false; // true while inside dot-delimited, quoted text
    for($i = 0; $i < $length; ++$i) {
        if($contained) {
            // A quote followed by a dot closes the quoted text.
            if($email[$i] === $quote && $email[$i + 1] === $dot) {
                $contained = false;
                ++$i;
            }
        }
        elseif($email[$i] === $at)
            break; // first @ outside quoted text
        elseif($email[$i] === $dot && $email[$i + 1] === $quote) {
            // A dot followed by a quote opens quoted text.
            $contained = true;
            ++$i;
        }
    }
    return array('local' => substr($email, 0, $i), 'domain' => substr($email, $i));
}

In use:

$email = 'a."b@c".x."@".d.e@f.g';
$email = parse_email($email);
if($email !== false)
    print_r($email);
else
    echo 'Bad email address.';

Array ( [local] => a."b@c".x."@".d.e [domain] => @f.g )

$email = 'a."b@c".x."@".d.e@f.g@';
$email = parse_email($email);
if($email !== false)
    print_r($email);
else
    echo 'Bad email address.';

Bad email address.

From the right:

After testing filter_var and researching what is acceptable as a valid domain name (hostnames separated by dots), I created this function for better performance. In a valid email address, the last @ should be the true @, as the @ symbol should never appear in the domain of a valid email address.

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $at = strrpos($email, '@'); // position of the last @
    $local = substr($email, 0, $at);
    $domain = substr($email, $at);
}

As a function:

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $a = strrpos($email, '@');
    return array('local' => substr($email, 0, $a), 'domain' => substr($email, $a));
}

Or using explode and implode:

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $local = explode('@', $email);
    $domain = '@' . array_pop($local);
    $local = implode('@', $local);
}

As a function:

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $email = explode('@', $email);
    $domain = '@' . array_pop($email);
    return array('local' => implode('@', $email), 'domain' => $domain);
}

If you would still like to use regex, splitting the string starting from the end of a valid email address is the safest option.

/(.*)(@.*)$/

(.*) Greedily matches anything, so it runs up to the last @ symbol.
(@.*) Matches the last @ symbol and everything after it.
$ End of the string.

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $local = preg_split('/(.*)(@.*)$/', $email, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    $domain = $local[1];
    $local = $local[0];
}

As a function:

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $email = preg_split('/(.*)(@.*)$/', $email, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    return array('local' => $email[0], 'domain' => $email[1]);
}

Or

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    preg_match('/(.*)(@.*)$/', $email, $matches);
    $local = $matches[1];
    $domain = $matches[2];
}

As a function:

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    preg_match('/(.*)(@.*)$/', $email, $matches);
    return array('local' => $matches[1], 'domain' => $matches[2]);
}

I have never seen an email address that is actually in use that contains multiple @ symbols. My original email address from 1988 had a # in it to signify routing information. I was surprised when I read the RFC on email to find out what was valid and what wasn't. Still, there are things called valid that I've never seen in production. — frumbert, Aug 24 '17 at 01:16

score 3 · Answer 3 · answered Jul 27 '11 at 20:46

Using explode is probably the best approach here, but to do it with regex you would do something like this:

/^([^@]*)(@.*)/

^ start of string

([^@]*) anything that is not an @ symbol ($matches[1])

(@.*) @ symbol followed by anything ($matches[2])
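
As a rough illustration, here is how that pattern might be used with preg_match on the address from the question. Keep in mind that preg_match stores the whole match in $matches[0], so the two capture groups arrive in $matches[1] and $matches[2]. The variable names are only illustrative.

```php
<?php
// Split at the first @: group 1 is the local part, group 2 keeps the @.
$email = 'johndoe@example.com';

if (preg_match('/^([^@]*)(@.*)/', $email, $matches)) {
    $user   = $matches[1]; // 'johndoe'
    $domain = $matches[2]; // '@example.com'
    echo $user, "\n", $domain, "\n";
}
```

Because [^@]* stops at the first @, this sketch shares the multiple-@ limitation discussed in the other answers.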

middric

score 2 · Answer 4 · edited May 23 '17 at 11:46

Answer

$parts = explode("@", $email);
$domain = array_pop($parts);
$name = implode("@",$parts);

This solves both of Brogan's edge cases (a."b@c".d."@".e.f@g.h and /."@@@@@@"./@a.b), as you can see in this Ideone
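
For readers who cannot follow the link, a minimal sketch of the same idea applied to both edge cases. The helper name split_on_last_at is hypothetical, the short array syntax assumes PHP 7.1 or later, and validation with filter_var is assumed to happen separately.

```php
<?php
// Hypothetical helper: split on the LAST @ so quoted @ symbols stay in the local part.
function split_on_last_at(string $email): array
{
    $parts  = explode('@', $email);
    $domain = array_pop($parts);    // text after the last @
    $name   = implode('@', $parts); // everything before it, inner @ symbols restored
    return [$name, $domain];
}

foreach (['a."b@c".d."@".e.f@g.h', '/."@@@@@@"./@a.b'] as $email) {
    [$name, $domain] = split_on_last_at($email);
    echo $name, ' | ', $domain, "\n";
}
// Prints:
// a."b@c".d."@".e.f | g.h
// /."@@@@@@"./ | a.b
```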

The currently accepted answer is not valid because of the multiple "@" case.

I loved @Brogan's answer until I read his last sentence:

In a valid email address, the last @ should be the true @, as the @ symbol should never appear in the domain of a valid email address.

That is supported by this other answer. And if that's true, his answer seems unnecessarily complex.

answered Jun 15 '16 at 12:10

xDaizu

..downvotes? Why? I don't know what's wrong with my solution... it works as far as I can tell. If it doesn't, please, tell me why! D: – xDaizu Jun 17 '16 at 08:31

score 0 · Answer 5 · answered Jan 25 '17 at 13:57

Use a regular expression. For example:

$mailaddress = "email@company.com";
// preg_match_all returns the number of matches; the captures land in $newarr.
$match_count = preg_match_all("/(.*)@(.*)\.(.*)/", $mailaddress, $newarr, PREG_SET_ORDER);

/*
Array output:
Array
(
    [0] => Array
        (
            [0] => email@company.com
            [1] => email
            [2] => company
            [3] => com
        )

)
*/

umutkeskin

This matches `@@@@@@@@@.` that is invalid but not `me@localhost` that is valid. – Toto Jan 25 '17 at 16:34
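
The comment's point can be checked directly. A small sketch, using the pattern from this answer and the two strings from the comment (the variable names are illustrative):

```php
<?php
// The pattern from the answer, tried against the two strings from the comment.
$pattern = '/(.*)@(.*)\.(.*)/';

$matchesGarbage   = preg_match($pattern, '@@@@@@@@@.');   // 1: matches, although it is not an address
$matchesLocalhost = preg_match($pattern, 'me@localhost'); // 0: no dot after the @, so no match

var_dump($matchesGarbage, $matchesLocalhost);
```

So the pattern splits strings; it does not validate them.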

score 0 · Answer 6 · answered Jul 27 '11 at 20:58

If you want a preg_match solution, you could also do something like this

preg_match('/([^<]+)(@[^<]+)/','johndoe@example.com',$matches);

m4rinos

score -1 · Answer 7 · edited Oct 07 '21 at 13:58

I've created a general regex for this that validates and creates named captures of the full email, the user, and the domain.

Regex:

(?<email>(?<mailbox>(?:\w|[!#$%&'*+/=?^`{|}~-])+(?:\.(?:\w|[!#$%&'*+/=?^`{|}~-])+)*)@(?<full_domain>(?<subdomains>(?:(?:[^\W\d_](?:(?:[^\W_]|-)*[^\W_])?)\.)*)(?<root_domain>[^\W\d_](?:(?:[^\W_]|-)*[^\W_])?)\.(?<tld>[^\W\d_](?:(?:[^\W_]|-)*[^\W_])?)))

Explanation:

(?<email>                          #  start Full Email capture
  (?<mailbox>                      #    Mailbox
    (?:\w|[!#$%&'*+/=?^`{|}~-])+   #      letter, number, underscore, or any of these special characters
    (?:                            #      Group: allow . in the middle of mailbox; can have multiple but can't be consecutive (no john..smith)
      \.                           #        match "."
      (?:\w|[!#$%&'*+/=?^`{|}~-])+ #        letter, number, underscore, or any of these special characters
    )*                             #      zero or more dot-separated parts
  )                                #    close Mailbox capture
  @                                #    match "@"
  (?<full_domain>                  #    Full Domain (including subdomains and tld)
    (?<subdomains>                 #      All Subdomains
      (?:                          #        label + '.' (so we can allow 0 or more)
        (?:                        #          label text
          [^\W\d_]                 #            start with a letter (\W is the inverse of \w so we end up with \w minus numbers and _)
          (?:                      #            paired with a ? to allow single letter domains
            (?:[^\W_]|-)*          #              allow letters, numbers, hyphens, but not underscore (* so two-character labels also match)
            [^\W_]                 #              if domain is more than one character, it has to end with a letter or digit (not a hyphen or underscore)
          )?                       #            allow one letter sub domains
        )                          #          end label text
      \.)*                         #        allow 0 or more subdomains separated by '.'
    )                              #      close All Subdomains capture
    (?<root_domain>                #      Root Domain
      [^\W\d_]                     #        start with a letter
      (?:                          #        paired with ? to make characters after the first optional
        (?:[^\W_]|-)*              #          allow letters, numbers, hyphens
        [^\W_]                     #          if domain is more than one character, it has to end with a letter or digit (not a hyphen or underscore)
      )?                           #        allow one letter domains
    )                              #      close Root Domain capture
    \.                             #      separator
    (?<tld>                        #      TLD
      [^\W\d_]                     #        start with a letter
      (?:                          #        paired with ? to make characters after the first optional
        (?:[^\W_]|-)*              #          allow letters, numbers, hyphens (* so two-letter TLDs such as "de" match in full)
        [^\W_]                     #          if domain is more than one character, it has to end with a letter or digit (not a hyphen)
      )?                           #        allow single letter tld
    )                              #      close TLD capture
  )                                #    close Full Domain capture
)                                  #  close Full Email capture

Notes

Generalized Regex: I've posted just the regex itself, not the PHP-specific code. This makes it easier to use for people who find this page by its title, "Regex split email address".

Feature Compatibility: Not all regex engines support named captures. If you have trouble, test the regex with your text on Regexr (check the Details to see the captures). If it works there, double-check whether the regex engine you're using supports named captures.

Domain RFC: The domain part is also based on the domain RFC, not just RFC 2822.

Dangerous Characters: I have explicitly included '$! etc., both to make it clear that the mail RFC allows them and to make them easy to remove if your system has to disallow a particular set of characters due to special processing requirements (such as blocking possible SQL injection attacks).

No Escape: For the mailbox name I've only included the dot-atom format; I've intentionally excluded dot- or slash-escaped forms.

Subtle Letters: For some parts I've used [^\W\d_] instead of [a-zA-Z] to improve support for languages other than English.

Out of Bounds: Due to idiosyncrasies in capture group processing in some systems, I've used an unbounded quantifier in place of {,61}. If you're using it somewhere that might be vulnerable to buffer overflow attacks, remember to bound your inputs.

Credits: Modified from community post by Tripleaxis, which was in turn taken from the .net helpfiles
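
For PHP specifically, named captures show up as string keys in the $matches array that preg_match fills. A minimal sketch, assuming a deliberately simplified stand-in pattern (not the full regex above) that reuses two of its group names:

```php
<?php
// Simplified stand-in pattern (NOT the full regex above); it only illustrates
// how PHP exposes named captures.
$pattern = '/^(?<mailbox>[^@\s]+)@(?<full_domain>[^@\s]+)$/';

if (preg_match($pattern, 'johndoe@example.com', $m)) {
    $mailbox = $m['mailbox'];     // 'johndoe'
    $domain  = $m['full_domain']; // 'example.com'
    echo $mailbox, ' at ', $domain, "\n";
}
```

The same groups are also available under their numeric indexes ($m[1], $m[2]), which may help with engines or code paths that do not use names.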

Note: the -1 was from another answerer who was confused why I included `'$!`. I have updated the answer to be more clear why they're there. Please feel free to review and rate yourself if you find it useful. — Chris Rudd, Apr 20 '21 at 21:32
