
I have a database containing email addresses stored as text that need to be parsed. The "ToAddress:" field can be in one of two formats:

1) jeff <jeff@store.com>; john <john@bbaft.com>; joe@company.com; jj@abc.info; jamie <jam@sub.abc.com>
or 
2) james@company.com

I need the addresses parsed into a PHP array in which every entry has both parts, name and email, even if there is no name.

I have been partially successful with the following, but it breaks when there is no name or when the email is not wrapped in "<" and ">". I would love some advice on how to fix it.

  $emails = array();
  $e = array();
  if(preg_match_all('/\s*"?([^><,"]+)"?\s*((?:<[^><,]+>)?)\s*/', $vToAddr, $matches, PREG_SET_ORDER) > 0)
  {
      foreach($matches as $m)
      {
          if(! empty($m[2]))
          {
              $emails= array("email" => trim($m[2], '<>'), "name" => trim($m[1],';'));
          }
          else
          {
              $emails= array("email" => trim($m[2], '<>'), "name" => "");
          }
          array_push($e,$emails);
      }
  } 

I looked into "How to parse formatted email address into display name and email address?", which is where I got the original regex, but it fails with the example above.

JustJeffy

2 Answers


The code above was generating the following value for $e:

array(3) {
  [0]=> array(2) {
    ["email"]=> string(14) "jeff@store.com"
    ["name"]=> string(5) "jeff "
  }
  [1]=> array(2) {
    ["email"]=> string(14) "john@bbaft.com"
    ["name"]=> string(6) " john "
  }
  [2]=> array(2) {
    ["email"]=> string(15) "jam@sub.abc.com"
    ["name"]=> string(37) " joe@company.com; jj@abc.info; jamie "
  }
}

It looks like the issue is the regex taking joe@company.com; jj@abc.info; jamie as a name (instead of taking just jamie and treating the two emails as separate entries).

I'd suggest splitting the list of emails on the ';' separator using explode() before parsing each individual email with the regex.

$parts = explode(";", $vToAddr);
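
As a rough sketch of what that split produces: the array_map()/array_filter() cleanup here is an assumption added for illustration, not part of the original code, and the sample string with a trailing ';' is made up. Each piece is trimmed and empty pieces are dropped before any regex runs.

```php
<?php
// Hypothetical input with a trailing ';' to show why empty pieces can appear.
$vToAddr = "jeff <jeff@store.com>; joe@company.com; ";

// Split on ';', trim surrounding whitespace, and drop empty pieces.
$parts = array_values(
    array_filter(array_map('trim', explode(';', $vToAddr)), 'strlen')
);

print_r($parts);
```

With this cleanup in place, the per-address regex only ever sees one non-empty address at a time.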

The else clause (when $m[2] is empty) was also incorrect: in that case $m[1] already contains the email without <> brackets, so it should be used directly instead of trimming the empty $m[2]. For example:

$m[0] was " jj@abc.info"
$m[1] was "jj@abc.info"
$m[2] was ""

Here's the resulting code:

<?php
$vToAddr = "jeff <jeff@store.com>; john <john@bbaft.com>; joe@company.com; jj@abc.info; jamie <jam@sub.abc.com>";

$e = array();
$parts = explode(";", $vToAddr);

foreach ($parts as $p) {
    $emails = array();
    if (preg_match_all('/\s*"?([^><,"]+)"?\s*((?:<[^><,]+>)?)\s*/', $p, $matches, PREG_SET_ORDER) > 0) {
        foreach ($matches as $m) {
            if (!empty($m[2])) {
                $emails = array(
                    "email" => trim($m[2], '<>'),
                    "name" => trim($m[1]) // remove spaces around names
                );
            } else {
                $emails = array(
                    "email" => trim($m[1]), // no <...> part, so group 1 is the address itself
                    "name" => ""
                );
            }
            array_push($e, $emails);
        }
    }
}

var_dump($e);
Vitalii

Another option is to use a pattern with named capturing groups and the J modifier to allow duplicate subpattern names for name and email.

(?:(?:;|^)\h*(?<name>[^;<>]+)\h+<(?<email>[^<>\s@;]+@[^\s@<>;]+)>|(?<email>[^<>\s@;]+@[^\s@<>;]+))

Explanation

  • (?: Non capture group
    • (?:;|^)\h* Match either ; or the start of the string, followed by 0+ horizontal whitespace chars
    • (?<name> Named capture group name
      • [^;<>]+ Match 1+ times any char except the listed
    • )\h+ Close group and match 1+ horizontal whitespace chars
    • <(?<email> Match < and named capture group email
      • [^<>\s@;]+@[^\s@<>;]+ Match an email like format
    • )> Close group and match >
    • | Or
    • (?<email>[^<>\s@;]+@[^\s@<>;]+) Named capture group email
  • ) Close non capture group
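
To make the two alternatives concrete, here is a minimal sketch that runs the same pattern against one address of each kind with preg_match(). The variable names $withName and $bare are made up for this illustration.

```php
<?php
// Same pattern as above; the J modifier permits the duplicate group name "email".
$re = '/(?:(?:;|^)\h*(?<name>[^;<>]+)\h+<(?<email>[^<>\s@;]+@[^\s@<>;]+)>|(?<email>[^<>\s@;]+@[^\s@<>;]+))/J';

// First alternative: a display name followed by a bracketed address.
preg_match($re, 'jamie <jam@sub.abc.com>', $withName);

// Second alternative: a bare address, so the "name" group is an empty string.
preg_match($re, 'joe@company.com', $bare);

var_dump($withName['name'], $withName['email']); // "jamie", "jam@sub.abc.com"
var_dump($bare['name'], $bare['email']);         // "", "joe@company.com"
```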


Example code

$re = '/(?:(?:;|^)\h*(?<name>[^;<>]+)\h+<(?<email>[^<>\s@;]+@[^\s@<>;]+)>|(?<email>[^<>\s@;]+@[^\s@<>;]+))/J';
$str = 'jeff <jeff@store.com>; john <john@bbaft.com>; joe@company.com; jj@abc.info; jamie <jam@sub.abc.com>';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

foreach ($matches as $match) {
    echo sprintf("Name: %s\nEmail: %s\n\n", $match["name"], $match["email"]);
}

Output

Name: jeff 
Email: jeff@store.com

Name: john 
Email: john@bbaft.com

Name: 
Email: joe@company.com

Name: 
Email: jj@abc.info

Name: jamie 
Email: jam@sub.abc.com
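
Since the question asks for a PHP array rather than printed output, the same matches could be collected along these lines. This is only a sketch: the array shape ("email" and "name" keys) is taken from the question, the input is shortened, and the trim() is a defensive assumption.

```php
<?php
$re = '/(?:(?:;|^)\h*(?<name>[^;<>]+)\h+<(?<email>[^<>\s@;]+@[^\s@<>;]+)>|(?<email>[^<>\s@;]+@[^\s@<>;]+))/J';
$str = 'jeff <jeff@store.com>; joe@company.com; jamie <jam@sub.abc.com>';

$e = array();
if (preg_match_all($re, $str, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // "name" is an empty string when the bare-address alternative matched.
        $e[] = array(
            'email' => $match['email'],
            'name'  => trim($match['name']),
        );
    }
}

print_r($e);
```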
The fourth bird