get several links inside specific div with one regex

Question

Take a look at this html:

<div class="foo"><a href="link1">link1</a><a href="link2">link2</a></div>
<div class="bar"><a href="barlink">barlink</a></div>

I would like to know if I can loop through all links inside the foo div with a regular expression in PHP. I tried this, but it isn't working:

preg_match_all(
  '#<div.*?class="foo".*?<a.*?>(?P<text>.*?)</a>#xi', 
  $text, 
  $matches, 
  PREG_SET_ORDER
);

Sadly, in this case it must be a regex, not an XML or other parser.

What are you trying to do with `(?P<text>.*?)`? Are you trying to get all link text within divs of class "foo"? — Andrew Cheong, May 09 '12 at 15:30
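Why the attempt above returns only one link per div: every match of that pattern has to start at `<div ... class="foo"`, and once the first link is consumed, that opening tag is already used up, so `preg_match_all` cannot begin a second match inside the same div. A common PCRE workaround is the `\G` anchor, which only succeeds where the previous match ended, so one pattern can either start at the foo div or continue from the last link. This is a minimal sketch, not taken from the answer below, and it assumes the class attribute is exactly `foo`, attributes use double quotes, and the foo div contains no nested div.

```php
<?php
// Sample markup from the question.
$html = '<div class="foo"><a href="link1">link1</a><a href="link2">link2</a></div>'
      . '<div class="bar"><a href="barlink">barlink</a></div>';

// Two entry points into one pattern:
//   <div ... class="foo" ...>  starts a run of matches;
//   \G(?!^)                    continues exactly where the previous match ended.
// (?:(?!</div>).)*? skips text but never crosses a closing </div>,
// so links in later divs are not picked up.
// Flags: i = case-insensitive, s = "." also matches newlines.
$pattern = '~(?:<div\b[^>]*\bclass="foo"[^>]*>|\G(?!^))'
         . '(?:(?!</div>).)*?'
         . '<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)</a>~is';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Collect href and link text from each match.
$links = [];
foreach ($matches as $m) {
    $links[] = ['href' => $m[1], 'text' => $m[2]];
}

foreach ($links as $link) {
    echo $link['href'], ' => ', $link['text'], PHP_EOL;
}
```

With the sample markup this prints the two links from the foo div and skips `barlink`; the assumptions above (exact class value, no nested divs) are exactly where such a pattern breaks.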

maiorano84 · Answer 1 · 2012-05-09T19:48:10.700

DON'T USE REGEX TO PARSE HTML.

<?php
$content = 
'<div class="foo">
<a href="link1">link1</a>
<a href="link2">link2</a>
</div>
<div class="bar">
<a href="barlink">barlink</a>
</div>';

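// Parse the fragment into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($content);

// Check every <div> and keep those whose class list contains "foo".
$divs = $dom->getElementsByTagName('div');
foreach($divs as $div)
{
    // A class attribute can hold several whitespace-separated tokens.
    $classes = preg_split('/\s+/', trim($div->getAttribute('class')));
    if(in_array('foo', $classes, true))
    {
        // Print the markup of every <a> element inside this div.
        foreach($div->getElementsByTagName('a') as $link)
        {
            echo $dom->saveXML($link);
        }
    }
}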
?>

This will output all matching links under any div with class 'foo'.
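The same lookup can also be written as a single XPath query through PHP's `DOMXPath`. This is a sketch rather than the answerer's code; the `concat(' ', normalize-space(@class), ' ')` test is a common XPath 1.0 idiom for "class list contains the token foo", so it also matches `class="foo other"`.

```php
<?php
$content = '<div class="foo"><a href="link1">link1</a><a href="link2">link2</a></div>'
         . '<div class="bar"><a href="barlink">barlink</a></div>';

// Collect libxml warnings instead of printing them (real-world HTML is often imperfect).
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($content);
$xpath = new DOMXPath($dom);

// Every <a> that is a descendant of a div whose class list contains "foo".
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]//a";

$links = [];
foreach ($xpath->query($query) as $a) {
    $links[] = ['href' => $a->getAttribute('href'), 'text' => $a->textContent];
}

foreach ($links as $link) {
    echo $link['href'], ' => ', $link['text'], PHP_EOL;
}
```

Because XPath returns a node set, a link inside nested foo divs is reported once rather than once per matching ancestor.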

Regular expressions should NOT be used to parse HTML, because HTML is not a regular language: elements can nest arbitrarily deep, and a regular expression cannot keep track of that nesting. It gets sloppy quickly, and you can end up with more problems than you started with, especially if you might be dealing with malformed HTML.
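To make the nesting problem concrete, here is a small comparison on hypothetical markup where the foo div contains another div. A lazy regex that captures "everything up to `</div>`" stops at the first closing tag, which belongs to the inner div, so it misses the second link; `DOMDocument` builds a tree and finds both.

```php
<?php
// Hypothetical markup: the "foo" div contains a nested div.
$html = '<div class="foo"><div><a href="a">a</a></div><a href="b">b</a></div>';

// Regex approach: capture the body of the foo div, then pull hrefs out of it.
// The lazy (.*?) ends at the FIRST </div>, which closes the inner div.
preg_match('~<div class="foo">(.*?)</div>~s', $html, $m);
preg_match_all('~<a\b[^>]*href="([^"]*)"~', $m[1], $viaRegex);
$regexHrefs = $viaRegex[1];

// Parser approach: the DOM tree knows which </div> closes which <div>.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$domHrefs = [];
foreach ($dom->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') === 'foo') {
        foreach ($div->getElementsByTagName('a') as $a) {
            $domHrefs[] = $a->getAttribute('href');
        }
    }
}

echo 'regex:  ', implode(', ', $regexHrefs), PHP_EOL; // regex:  a
echo 'parser: ', implode(', ', $domHrefs), PHP_EOL;   // parser: a, b
```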

@downvoters Sorry for providing an answer that stresses the importance of using [the correct tools for the correct jobs](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Maybe next time if you don't want to parse HTML with an HTML parser, you can drop everything and switch to [jQuery](http://meta.stackexchange.com/questions/19478/the-many-memes-of-meta#19492) — maiorano84, May 10 '12 at 18:06
