
I can match the content between tr tags with this regex:

<tr\s+class='test'>((?!</tr>).)*</tr>

But if I move the star quantifier inside the parentheses, right next to the dot metacharacter, the regex matches only the whole pattern and the capturing group is empty.

$string = "<tr class='test'>
<td>test1</td>
</tr>
 <div class='ignored' >text text</div>
 <tr class='test'>
 <td>test2</td>
 </tr>";


preg_match_all("|<tr\s+class='test'>((?!</tr>).*)</tr>|si",$string,$matches);

print_r($matches);

I know what lookaround is, but I'm not sure what exactly causes the difference. I hope someone can shed some light on this. Thank you!

JIA
  • Oh, parsing HTML with regex? It's been some time since I saw a question in that aspect :-) Hopefully you are aware of that: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 Good luck by the way with finding the *right expression*. – Darin Dimitrov Dec 28 '13 at 16:44
  • Thanks for the link. I tried simple_html_dom but was unable to use it to solve my problem. What alternatives do you recommend if I want to extract contents from tr tags with a certain class name? Anyway, I'm learning regex and want to understand how it works. – JIA Dec 28 '13 at 16:53
  • Sorry I am not a PHP developer and cannot offer good frameworks for HTML parsing. What I can say for sure is that if you want to learn regex, parsing HTML is the WORST possible example you might pick to learn from. – Darin Dimitrov Dec 28 '13 at 16:57

1 Answer

((?!</tr>).)*

The repetition is applied to ((?!</tr>).), which contains a single lookahead and a single dot. The lookahead therefore runs again at every repetition, making sure that each character the dot is about to consume is not the start of </tr>.

((?!</tr>).*)

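This is really (?!</tr>).* wrapped in a capturing group: a single lookahead followed by a single .*. The lookahead is checked only once, at the position where .* starts, and not at any later character. So unless </tr> comes immediately after the opening tag, the greedy .* matches everything it can, including any </tr> in the middle; with the s flag it runs to the end of the string and then backtracks to the last </tr>.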
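
To see the difference on the sample string from the question, here is a small sketch that runs both patterns side by side. The third pattern is not from the question: it is an assumed variant that makes the inner group non-capturing so that group 1 holds the whole row body instead of only the last character matched by the repeated group. The variable names ($perChar, $once, $rows) are made up for illustration.

```php
<?php
// Sample input from the question.
$string = "<tr class='test'>
<td>test1</td>
</tr>
 <div class='ignored' >text text</div>
 <tr class='test'>
 <td>test2</td>
 </tr>";

// Star outside the group: the lookahead is re-checked before every character,
// so each match stops at the first </tr> it reaches.
preg_match_all("|<tr\s+class='test'>((?!</tr>).)*</tr>|si", $string, $perChar);

// Star inside the group: the lookahead is checked once, then .* runs greedily
// to the end of the string and backtracks to the last </tr>.
preg_match_all("|<tr\s+class='test'>((?!</tr>).*)</tr>|si", $string, $once);

// Assumed variant: non-capturing inner group, outer group captures the row body.
preg_match_all("|<tr\s+class='test'>((?:(?!</tr>).)*)</tr>|si", $string, $rows);

echo count($perChar[0]), "\n"; // 2 (one match per row)
echo count($once[0]), "\n";    // 1 (a single match spanning both rows and the div)
print_r($rows[1]);             // the body of each row, surrounding whitespace included
```

Note that in the first pattern a repeated capturing group keeps only its last iteration, so group 1 there is just the final whitespace character before </tr>, which can look empty when printed.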

Jerry