I need some help with a PHP regex. I want to split the email address "johndoe@example.com" into "johndoe" and "@example.com".

So far I have this: preg_match('/<?([^<]+?)@/', 'johndoe@example.com', $matches); and I get Array ( [0] => johndoe@ [1] => johndoe ).

How do I need to change the regex?

morandi3

7 Answers

$parts = explode('@', "johndoe@example.com");

$user = $parts[0];
// Stick the @ back onto the domain since it was chopped off.
$domain = "@" . $parts[1];
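
A minimal sketch of the same idea wrapped in a function, assuming the address contains a single @. The name split_email_simple is hypothetical and not part of this answer. Passing 2 as the limit argument to explode keeps everything after the first @ in one piece.

```php
<?php
// Hypothetical helper (not from the answer): split at the first "@".
function split_email_simple(string $email): ?array
{
    // A limit of 2 returns at most two pieces, so anything after the
    // first "@" stays together in the second piece.
    $parts = explode('@', $email, 2);
    if (count($parts) !== 2) {
        return null; // no "@" in the input
    }
    return ['user' => $parts[0], 'domain' => '@' . $parts[1]];
}

$result = split_email_simple('johndoe@example.com');
echo $result['user'], "\n";   // johndoe
echo $result['domain'], "\n"; // @example.com
```

Returning null for input without an @ avoids the undefined-index notice that the bare $parts[1] access would raise.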
Michael Berkowski
  • Better yet, `list($name,$_) = explode('@',$email); $domain = '@'.$_;` - http://www.ideone.com/yHlz6 – Brad Christie Jul 27 '11 at 20:39
  • Exactly what I was thinking. And it is less expensive processing-wise to do an explode. – Patrick Jul 27 '11 at 20:40
  • @Brad Christie after reading your Perl-looking comment, for a second I thought I interpreted a Perl question as a PHP question :) – Michael Berkowski Jul 27 '11 at 20:40
  • @Michael: Just keeping people guessing. I use `$_` in temporary solutions. For good, bad or indifferent (and at the risk of readability), it makes writing the variable out faster. And, until my notepad gets intellisense for PHP coding, I'm probably going to continue doing so. ;p – Brad Christie Jul 27 '11 at 20:53
  • An email address can have multiple "@" symbols, as stated in http://stackoverflow.com/questions/12355858/how-many-symbol-can-be-in-an-email-address – xDaizu Jun 15 '16 at 11:17

Some of the previous answers are wrong, as a valid email address can, in fact, include more than a single @ symbol by containing it within dot-delimited, quoted text. See the following example:

$email = 'a."b@c".d@e.f';
echo (filter_var($email, FILTER_VALIDATE_EMAIL) ? 'V' : 'Inv'), 'alid email format.';

Valid email format.


Multiple delimited blocks of text and a multitude of @ symbols can exist. Both of these examples are valid email addresses:

$email = 'a."b@c".d."@".e.f@g.h';
$email = '/."@@@@@@"./@a.b';

With Michael Berkowski's explode answer, this email address would be split like this:

$email = 'a."b@c".d@e.f';
$parts = explode('@', $email);
$user = $parts[0];
$domain = '@' . $parts[1];

User: a."b
Domain: @c".d


Anyone using this solution should beware of potential abuse. Accepting an email address based on these outputs and then inserting $email into a database could have negative implications.

$email = 'a."b@c".d@INSERT BAD STUFF HERE';

The functions below are only accurate as long as filter_var is used for validation first.
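
As a rough illustration of that point, the sketch below runs the naive first-@ split on the address from the warning and then validates the whole string. The variable names are made up for this example.

```php
<?php
$email = 'a."b@c".d@INSERT BAD STUFF HERE';

// Naive split at the first "@": the trailing text never shows up in
// the two pieces that get inspected.
$parts = explode('@', $email);
echo $parts[0], "\n"; // a."b
echo $parts[1], "\n"; // c".d

// Validating the whole string first rejects it (the spaces alone make
// the domain invalid), so the split should never be reached.
$isValid = filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
echo $isValid ? 'valid' : 'invalid', "\n"; // invalid
```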

From the left:

Here is a simple non-regex, non-exploding solution for finding the first @ that is not contained within delimited and quoted text. Nested delimited text is considered invalid based on filter_var, so finding the proper @ is a very simple search.

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $a = '"';
    $b = '.';
    $c = '@';
    $d = strlen($email);
    $contained = false;
    for($i = 0; $i < $d; ++$i) {
        if($contained) {
            if($email[$i] === $a && $email[$i + 1] === $b) {
                $contained = false;
                ++$i;
            }
        }
        elseif($email[$i] === $c)
            break;
        elseif($email[$i] === $b && $email[$i + 1] === $a) {
            $contained = true;
            ++$i;
        }
    }
    $local = substr($email, 0, $i);
    $domain = substr($email, $i);
}

Here is the same code tucked inside a function.

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $a = '"';
    $b = '.';
    $c = '@';
    $d = strlen($email);
    $contained = false;
    for($i = 0; $i < $d; ++$i) {
        if($contained) {
            if($email[$i] === $a && $email[$i + 1] === $b) {
                $contained = false;
                ++$i;
            }
        }
        elseif($email[$i] === $c)
            break;
        elseif($email[$i] === $b && $email[$i + 1] === $a) {
            $contained = true;
            ++$i;
        }
    }
    return array('local' => substr($email, 0, $i), 'domain' => substr($email, $i));
}

In use:

$email = 'a."b@c".x."@".d.e@f.g';
$email = parse_email($email);
if($email !== false)
    print_r($email);
else
    echo 'Bad email address.';

Array ( [local] => a."b@c".x."@".d.e [domain] => @f.g )

$email = 'a."b@c".x."@".d.e@f.g@';
$email = parse_email($email);
if($email !== false)
    print_r($email);
else
    echo 'Bad email address.';

Bad email address.


From the right:

After testing filter_var and researching what is acceptable as a valid domain name (hostnames separated by dots), I created this function to get better performance. In a valid email address, the last @ should be the true @, as the @ symbol should never appear in the domain of a valid email address.

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $domain = strrpos($email, '@');
    $local = substr($email, 0, $domain);
    $domain = substr($email, $domain);
}

As a function:

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $a = strrpos($email, '@');
    return array('local' => substr($email, 0, $a), 'domain' => substr($email, $a));
}

Or using explode and implode:

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $local = explode('@', $email);
    $domain = '@' . array_pop($local);
    $local = implode('@', $local);
}

As a function:

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $email = explode('@', $email);
    $domain = '@' . array_pop($email);
    return array('local' => implode('@', $email), 'domain' => $domain);
}

If you would still like to use regex, splitting the string starting from the end of a valid email address is the safest option.

/(.*)(@.*)$/

(.*) Greedily matches anything, so the next group can only start at the last @.
(@.*) Matches anything that begins with an @ symbol.
$ End of the string.
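
To see why the greedy first group matters, here is a small sketch that runs the pattern through preg_match on one of the sample addresses from above. No validation is performed in this snippet; it only shows where the split lands.

```php
<?php
// Sample address with a quoted "@" in the local part.
$email = 'a."b@c".d@e.f';

// (.*) is greedy, so (@.*) can only begin at the last "@" in the string.
if (preg_match('/(.*)(@.*)$/', $email, $matches) === 1) {
    $local  = $matches[1];
    $domain = $matches[2];
    echo $local, "\n";  // a."b@c".d
    echo $domain, "\n"; // @e.f
}
```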

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    $local = preg_split('/(.*)(@.*)$/', $email, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    $domain = $local[1];
    $local = $local[0];
}

As a function:

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    $email = preg_split('/(.*)(@.*)$/', $email, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    return array('local' => $email[0], 'domain' => $email[1]);
}

Or

if(filter_var($email, FILTER_VALIDATE_EMAIL)) {
    preg_match('/(.*)(@.*)$/', $email, $matches);
    $local = $matches[1];
    $domain = $matches[2];
}

As a function:

function parse_email($email) {
    if(!filter_var($email, FILTER_VALIDATE_EMAIL)) return false;
    preg_match('/(.*)(@.*)$/', $email, $matches);
    return array('local' => $matches[1], 'domain' => $matches[2]);
}
Brogan
  • I have never seen an email address that is actually in use that contains multiple @ symbols. My original email address from 1988 had a # in it to signify routing information. I was surprised when I read the RFC on email to find out what was valid and what wasn't. Still, there are things called valid that I've never seen in production. – frumbert Aug 24 '17 at 01:16

Using explode is probably the best approach here, but to do it with regex you would do something like this:

/^([^@]*)(@.*)/

^ start of string

([^@]*) anything that is not an @ symbol ($matches[1])

(@.*) @ symbol followed by anything ($matches[2])
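
A short sketch of this pattern with preg_match, using the address from the question. It splits at the first @, so it assumes the local part contains no quoted @.

```php
<?php
$email = 'johndoe@example.com';

if (preg_match('/^([^@]*)(@.*)/', $email, $matches) === 1) {
    // $matches[0] holds the whole match; the groups start at index 1.
    $user   = $matches[1];
    $domain = $matches[2];
    echo $user, "\n";   // johndoe
    echo $domain, "\n"; // @example.com
}
```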

middric

Answer

$parts = explode("@", $email);
$domain = array_pop($parts);
$name = implode("@",$parts);

This solves both of Brogan's edge cases (a."b@c".d."@".e.f@g.h and /."@@@@@@"./@a.b), as you can see in this Ideone.
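
A small sketch that runs the three lines above against both edge cases. The function name split_at_last_at is hypothetical, and no validation is done here.

```php
<?php
// Hypothetical wrapper around the three lines from this answer.
function split_at_last_at(string $email): array
{
    $parts  = explode('@', $email);
    $domain = array_pop($parts);    // text after the last "@"
    $name   = implode('@', $parts); // everything before it, "@"s restored
    return [$name, $domain];
}

foreach (['a."b@c".d."@".e.f@g.h', '/."@@@@@@"./@a.b'] as $email) {
    [$name, $domain] = split_at_last_at($email);
    echo $name, ' | ', $domain, "\n";
}
// a."b@c".d."@".e.f | g.h
// /."@@@@@@"./ | a.b
```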


The currently accepted answer is not valid because of the multiple "@" case.

I loved @Brogan's answer until I read his last sentence:

In a valid email address, the last @ should be the true @, as the @ symbol should never appear in the domain of a valid email address.

That is supported by this other answer. And if that's true, his answer seems unnecessarily complex.

xDaizu
  • ..downvotes? Why? I don't know what's wrong with my solution... it works as far as I can tell. If it doesn't, please, tell me why! D: – xDaizu Jun 17 '16 at 08:31

Use a regular expression. For example:

$mailadress = "email@company.com";     
$exp_arr= preg_match_all("/(.*)@(.*)\.(.*)/",$mailadress,$newarr, PREG_SET_ORDER); 

/*
Array output:
Array
(
    [0] => Array
        (
            [0] => email@company.com
            [1] => email
            [2] => company
            [3] => com
        )

)
*/
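
One thing to be aware of with this pattern: both (.*) groups around the dot are greedy, so when the domain contains more than one dot, the last group only captures the text after the final dot. A small sketch, using a made-up address with a subdomain:

```php
<?php
// Made-up address with a subdomain, for illustration only.
$mailadress = 'email@mail.company.com';

preg_match('/(.*)@(.*)\.(.*)/', $mailadress, $m);

echo $m[1], "\n"; // email
echo $m[2], "\n"; // mail.company
echo $m[3], "\n"; // com
```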
umutkeskin

If you want a preg_match solution, you could also do something like this:

preg_match('/([^<]+)(@[^<]+)/','johndoe@example.com',$matches);
m4rinos

I've created a general regex for this that validates and creates named captures of the full email, the user, and the domain.

Regex:

(?<email>(?<mailbox>(?:\w|[!#$%&'*+/=?^`{|}~-])+(?:\.(?:\w|[!#$%&'*+/=?^`{|}~-])+)*)@(?<full_domain>(?<subdomains>(?:(?:[^\W\d_](?:(?:[^\W_]|-)*[^\W_])?)\.)*)(?<root_domain>[^\W\d_](?:(?:[^\W_]|-)*[^\W_])?)\.(?<tld>[^\W\d_](?:(?:[^\W_]|-)*[^\W_])?)))

Explanation:

(?<email>                          #  start Full Email capture
  (?<mailbox>                      #    Mailbox
    (?:\w|[!#$%&'*+/=?^`{|}~-])+   #      letter, number, underscore, or any of these special characters
    (?:                            #      Group: allow . in the middle of mailbox; can have multiple but can't be consecutive (no john..smith)
      \.                           #        match "." 
      (?:\w|[!#$%&'*+/=?^`{|}~-])+ #        letter, number, underscore, or any of these special characters
    )*                             #      zero or more dot-separated segments (so one letter mailboxes are allowed)
  )                                #    close Mailbox capture
  @                                #    match "@"
  (?<full_domain>                  #    Full Domain (including subdomains and tld)
    (?<subdomains>                 #      All Subdomains
      (?:                          #        label + '.' (so we can allow 0 or more)
        (?:                        #          label text
          [^\W\d_]                 #            start with a letter (\W is the inverse of \w so we end up with \w minus numbers and _)
          (?:                      #            paired with a ? to allow single letter domains
            (?:[^\W_]|-)*          #              allow letters, numbers, hyphens, but not underscore
            [^\W_]                 #              if domain is more than one character, it has to end with a letter or digit (not a hyphen or underscore)
          )?                       #            allow one letter sub domains
        )                          #          end label text
      \.)*                         #        allow 0 or more subdomains separated by '.'
    )                              #      close All Subdomains capture
    (?<root_domain>                #      Root Domain
      [^\W\d_]                     #        start with a letter
      (?:                          #        paired with ? to make characters after the first optional
        (?:[^\W_]|-)*              #          allow letters, numbers, hyphens
        [^\W_]                     #          if domain is more than one character, it has to end with a letter or digit (not a hyphen or underscore)
      )?                           #        allow one letter domains
    )                              #      close Root Domain capture
    \.                             #      separator
    (?<tld>                        #      TLD
      [^\W\d_]                     #        start with a letter
      (?:                          #        paired with ? to make characters after the first optional
        (?:[^\W_]|-)*              #          allow letters, numbers, hyphens
        [^\W_]                     #          if domain is more than one character, it has to end with a letter or digit (not a hyphen)
      )?                           #        allow single letter tld
    )                              #      close TLD capture
  )                                #    close Full Domain capture
)                                  #  close Full Email capture

Notes

Generalized Regex: I've posted JUST the regex itself, not the PHP-specific parts. This is to make it easier to use for other people who find it based on the name "Regex Split Email Address".

Feature Compatibility: Not all regex processors support named captures. If you have trouble with it, test it with your text on Regexr (checking the Details to see the captures). If it works there, double-check whether the regex engine you're using supports named captures.

Domain RFC: The domain part is also based on the domain RFC, not just RFC 2822.

Dangerous Characters: I have explicitly included '$! etc. both to make it clear these are allowed by the mail RFC and to make them easy to remove if a particular set of characters should be disallowed in your system due to special processing requirements (like blocking possible SQL injection attacks).

No Escape: For the mailbox name I've only included the dot-atom format; I've intentionally excluded support for dot- or slash-escaped forms.

Subtle Letters: For some parts I've used [^\W\d_] instead of [a-zA-Z] to improve support for languages other than English.

Out of Bounds: Due to idiosyncrasies in capture group processing in some systems, I've used an unbounded quantifier in place of {,61}. If you're using it someplace that might be vulnerable to buffer overflow attacks, remember to bound your inputs.

Credits: Modified from a community post by Tripleaxis, which was in turn taken from the .NET help files.
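
For PHP specifically, named groups show up as string keys in the $matches array filled by preg_match. The sketch below uses a deliberately simplified pattern (not the full regex above, and not RFC-complete) just to show how the captures are read. The group names mirror the ones used in this answer, and the sample address is made up.

```php
<?php
// Simplified illustration pattern: NOT the full regex from this answer.
$pattern = '/^(?<mailbox>[^@\s]+)@(?<full_domain>(?:[^@\s.]+\.)+(?<tld>[^@\s.]+))$/';

if (preg_match($pattern, 'johndoe@mail.example.com', $m) === 1) {
    echo $m['mailbox'], "\n";     // johndoe
    echo $m['full_domain'], "\n"; // mail.example.com
    echo $m['tld'], "\n";         // com
}
```

Each named group is also available under its numeric index, so existing code that reads $m[1] keeps working.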

Chris Rudd
  • Note: the -1 was from another answerer who was confused why I included `'$!`. I have updated the answer to be more clear why they're there. Please feel free to review and rate yourself if you find it useful. – Chris Rudd Apr 20 '21 at 21:32