
I have email links on a page in the format `<a href='mailto:xyz@xyz.com'>` that I am trying to match with a regex.

I am currently using this:

$pattern = '#a[^>]+href="mailto:([^"]+)"[^>]*?>#is';
preg_match_all($pattern, $data, $matches);
foreach ($matches[1] as $key => $email) {
    $emails[] = $email;
}

but it results in no match. $emails is NULL.

I am just learning regular expressions so please forgive the question! Can someone explain why it doesn't work and suggest a change? Thanks

algorithmicCoder
  • Well, when you **learn** something, it is good to write something **yourself**, from scratch, not just modify code you've found somewhere. Let's start solving the issue together, step by step, from scratch (in case you really want to *learn* something) – zerkms Nov 11 '11 at 14:34
  • Please change the title of your question to be more meaningful to what you want to achieve before I change my mind and downvote just because of it. – Romain Nov 11 '11 at 14:35
  • a single quote is not the same as a double quote. – thetaiko Nov 11 '11 at 14:35
  • The HTML you posted has an `href` with single quotes whereas the regular expression uses double quotes. *(Insert obligatory rant about [parsing HTML with regex here](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags))* – Linus Kleen Nov 11 '11 at 14:35
  • Use DOM and this XPath: `//a/@href[starts-with(., 'mailto:')]`. – Gordon Nov 11 '11 at 14:43
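
The DOM/XPath suggestion in the last comment might look roughly like this. It is only a sketch: the markup in `$data` is made up for illustration, and it assumes PHP's DOM extension is enabled (it usually is). Because the parser handles attribute quoting, single and double quotes are treated the same.

```php
<?php
// Hypothetical markup standing in for the page; both quote styles are included on purpose.
$data = <<<'HTML'
<p>
  <a href='mailto:xyz@xyz.com'>Mail us</a>
  <a href="mailto:abc@example.com">Or here</a>
  <a href="/contact">Contact page</a>
</p>
HTML;

$dom = new DOMDocument();
// Real-world HTML is often malformed; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($data);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$emails = [];
// Select every href attribute on an <a> element whose value starts with "mailto:".
foreach ($xpath->query("//a/@href[starts-with(., 'mailto:')]") as $href) {
    // Strip the "mailto:" scheme to leave the bare address.
    $emails[] = substr($href->nodeValue, strlen('mailto:'));
}

print_r($emails);
```

The third link is skipped because its `href` does not start with `mailto:`.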

3 Answers

The problem is that your example uses single quotes ' whereas the regex looks for double quotes ".

Changing your pattern to:

$pattern = '#a[^>]+href=\'mailto:([^\']+)\'[^>]*?>#is';

would do the trick.
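
To see the difference, here is a small sketch that runs both patterns against the sample tag from the question. The variable names other than `$data` and `$emails` are made up for the comparison. It also suggests why `$emails` came out as NULL: with zero matches the `foreach` body never runs, so the variable is never assigned.

```php
<?php
// Sample input from the question: the href value is wrapped in single quotes.
$data = "<a href='mailto:xyz@xyz.com'>";

// Original pattern: requires double quotes around the href value.
$original = '#a[^>]+href="mailto:([^"]+)"[^>]*?>#is';
// Corrected pattern: requires single quotes instead.
$corrected = '#a[^>]+href=\'mailto:([^\']+)\'[^>]*?>#is';

$originalCount  = preg_match_all($original, $data, $originalMatches);
$correctedCount = preg_match_all($corrected, $data, $correctedMatches);

// preg_match_all() always fills its third argument, so index 1 is already
// the list of captured addresses; no extra loop is needed to build $emails.
$emails = $correctedMatches[1];

var_dump($originalCount);  // int(0)
var_dump($emails);
```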

Marcus

Just add support for both double and single quotes, as @Linus-Kleen and @thetaiko said:

$pattern = '#a[^>]+href=[\'"]mailto:([^\'"]+)[\'"][^>]*?>#is';
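
A slightly stricter variant, offered as a sketch rather than a replacement: capture the opening quote and require the same character to close the value with a backreference (`\1`). A character class like `[^\'"]+` stops at the first quote of either kind, so an address containing an apostrophe inside a double-quoted `href` would be cut short. The leading `<a\s` and the sample addresses below are assumptions added for illustration.

```php
<?php
// Hypothetical sample: one single-quoted link and one double-quoted link whose
// address contains an apostrophe.
$data = <<<'HTML'
<a href='mailto:xyz@xyz.com'>one</a>
<a href="mailto:o'brien@example.com">two</a>
HTML;

// Group 1 captures the opening quote; \1 requires the same character to close.
// Group 2 is the address. The leading "<a\s" is an added assumption.
$pattern = '#<a\s[^>]*href=([\'"])mailto:(.+?)\1[^>]*>#is';

preg_match_all($pattern, $data, $matches);
// The addresses are now in group 2, because group 1 holds the quote character.
$emails = $matches[2];

print_r($emails);
```
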
SERPRO

The regex works, but it only checks for " (double quote) and not ' (single quote), so

<a href='mailto:xyz@xyz.com'>

will not be matched, but

<a href="mailto:xyz@xyz.com">

will.

Simply changing the regex to

'#a[^>]+href=[\'"]+mailto:([^\'"]+)[\'"]+[^>]*?>#is'

will do the trick!
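
One detail worth checking with a pattern like this is that the capture group excludes both quote characters. If it only excludes `"`, it can run past a closing `'` when the page has more than one link. A rough comparison, using a made-up two-link fragment and hypothetical variable names:

```php
<?php
// Hypothetical page fragment with two single-quoted mailto links.
$data = "<a href='mailto:a@example.com'>A</a> <a href='mailto:b@example.com'>B</a>";

// Capture group excludes only double quotes, so it can run across a closing '.
$loose  = '#a[^>]+href=[\'"]+mailto:([^"]+)[\'"]+[^>]*?>#is';
// Capture group excludes both quote characters and stops at the first one.
$strict = '#a[^>]+href=[\'"]+mailto:([^\'"]+)[\'"]+[^>]*?>#is';

preg_match_all($loose, $data, $looseMatches);
preg_match_all($strict, $data, $strictMatches);

print_r($looseMatches[1]);  // one over-long capture spanning both links
print_r($strictMatches[1]); // the two addresses
```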

Willem Mulder