I am trying to parse HTML source code. I have a recursive regular expression
(\˂[0-9a-z\s#-_="]*\˃((?>[^\˂\˃]+)|(?R))*(\˂/[a-z]*\˃)?\s*)* inspired from here:
My problem is that the match only captures two levels of nesting (the div and table tags). Is there something wrong with my regex?
<pre>
<?php
$pattern = '/(\˂[0-9a-z\s#-_="]*\˃((?>[^\˂\˃]+)|(?R))*(\˂\/[a-z]*\˃)?\s*)*/mx';
$subject = <<<EOT
˂div class="post"˃
˂table˃
˂tbody˃
˂tr height="12"˃
˂td˃˂/td˃
˂td width="20" class="strip" rowspan="5"˃
˂div class="follow unpublish"˃☆˂/div˃
˂div class="follow report"˃⚐˂/div˃
˂/td˃
˂/tr˃
˂/tbody˃
˂/table˃
˂/div˃
EOT;
preg_match($pattern, $subject, $matches);
print_r($matches);
?>
</pre>