
Take a look at this HTML:

<div class="foo"><a href="link1">link1</a><a href="link2">link2</a></div>
<div class="bar"><a href="barlink">barlink</a></div>

I would like to know whether I can loop over all the links inside foo with a regular expression in PHP. I tried this, but it isn't working:

preg_match_all(
  '#<div.*?class="foo".*?<a.*?>(?P<text>.*?)</a>#xi', 
  $text, 
  $matches, 
  PREG_SET_ORDER
);

Sadly, in this case it must be a regex, not an XML parser or any other parser.

afuzzyllama
Jonathan

1 Answer

-1

DON'T USE REGEX TO PARSE HTML.

<?php
$content = 
'<div class="foo">
<a href="link1">link1</a>
<a href="link2">link2</a>
</div>
<div class="bar">
<a href="barlink">barlink</a>
</div>';

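$dom = new DOMDocument();
$dom->loadHTML($content);

// Walk every <div> and keep only those whose class list contains "foo".
foreach($dom->getElementsByTagName('div') as $div)
{
    // Split on any run of whitespace so tab- or newline-separated classes still match.
    $classes = preg_split('/\s+/', trim($div->getAttribute('class')));
    if(in_array('foo', $classes, true))
    {
        // Print the markup of each <a> nested anywhere inside this div.
        foreach($div->getElementsByTagName('a') as $link)
        {
            echo $dom->saveXML($link);
        }
    }
}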
?>

This will output all matching links under any div with class 'foo'.
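
The same selection can probably be written more compactly with DOMXPath. This is only a sketch under a few assumptions: the helper name linksInsideClass is made up for illustration, $class is assumed to contain no quote characters, and the type hints assume PHP 7 or later. The concat/normalize-space expression is the usual XPath 1.0 idiom for matching one whole token in a class list.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the text of
// every <a> nested inside a <div> whose class list contains $class.
function linksInsideClass(string $html, string $class): array
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally in case the HTML is malformed,
    // then restore the previous error-handling setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Pad the class attribute with spaces so "foo" matches as a whole token
    // and not as a substring of something like "foobar".
    // Assumption: $class contains no quote characters.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]//a',
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $link) {
        $texts[] = $link->textContent;
    }
    return $texts;
}

$content = '<div class="foo"><a href="link1">link1</a><a href="link2">link2</a></div>'
         . '<div class="bar"><a href="barlink">barlink</a></div>';

print_r(linksInsideClass($content, 'foo'));
```

For the markup in the question this should yield link1 and link2 and skip barlink. Whether XPath reads better than the nested loops is a matter of taste; it mainly saves the manual class check.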

Regular expressions should NOT be used to parse HTML, because HTML is not a regular language: elements can nest to arbitrary depth, which a regex cannot track. The pattern gets sloppy quickly, and you can end up with more problems than you started with, especially when you might be dealing with malformed HTML.
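
To make that concrete, here is a small check of the pattern from the question against the question's own markup. It is only an illustration of one failure mode, not a catalogue of them; the variable name $found is made up for the example.

```php
<?php
// Markup from the question, one div per line.
$text = '<div class="foo"><a href="link1">link1</a><a href="link2">link2</a></div>' . "\n"
      . '<div class="bar"><a href="barlink">barlink</a></div>';

// The pattern exactly as posted in the question.
preg_match_all(
    '#<div.*?class="foo".*?<a.*?>(?P<text>.*?)</a>#xi',
    $text,
    $matches,
    PREG_SET_ORDER
);

// Every full match has to begin at a "<div", so the pattern can capture
// at most one link per div: the first one.
$found = array_column($matches, 'text');
print_r($found);
```

This should print only link1. The second link is never reached because the next match attempt has to start at another `<div`, and the only one left is the bar div. Putting each tag on its own line breaks the pattern entirely, since `.` does not match newlines without the s modifier.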

maiorano84
  • 1
    @downvoters Sorry for providing an answer that stresses the importance of using [the correct tools for the correct jobs](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Maybe next time if you don't want to parse HTML with an HTML parser, you can drop everything and switch to [jQuery](http://meta.stackexchange.com/questions/19478/the-many-memes-of-meta#19492) – maiorano84 May 10 '12 at 18:06