Given this test data:
$text = '
{% a %}
{% b %}
{% a %}
{% end %}
{% end %}
{% b %}
{% end %}
{% end %}
{% c %}
{% end %}
';
This tested script does the trick:
<?php
$re = '/
    # Match nested {% a %}{% b %}...{% end %}{% end %} structures.
    \{%[ ]\w[ ]%\}               # Opening delimiter.
    (?:                          # Group for contents alternatives.
      (?R)                       # Either a nested recursive component,
    |                            # or non-recursive component stuff.
      [^{]*+                     # {normal*} Zero or more non-{
      (?:                        # Begin: "unrolling-the-loop"
        \{                       # {special} Allow a { as long
        (?!                      # as it is not the start of
          %[ ]\w[ ]%\}           # a new nested component, or
        | %[ ]end[ ]%\}          # the end of this component.
        )                        # Ok to match { followed by
        [^{]*+                   # more {normal*}. (See: MRE3!)
      )*+                        # End {(special normal*)*} construct.
    )*+                          # Zero or more contents alternatives
    \{%[ ]end[ ]%\}              # Closing delimiter.
    /ix';
$count = preg_match_all($re, $text, $m);
if ($count) {
    printf("%d Matches:\n", $count);
    for ($i = 0; $i < $count; ++$i) {
        printf("\nMatch %d:\n%s\n", $i + 1, $m[0][$i]);
    }
}
?>
Here is the output:
2 Matches:

Match 1:
{% a %}
{% b %}
{% a %}
{% end %}
{% end %}
{% b %}
{% end %}
{% end %}

Match 2:
{% c %}
{% end %}
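Note that the output lists only the outermost blocks. If you also need the inner ones, one option is to strip the delimiters from each match and run the same pattern on what is left. The following is only a rough sketch of that idea: `collect_blocks` is a hypothetical helper name (not part of the script above), and the pattern is the same one with the comments shortened.

```php
<?php
// Test data from the top of this answer.
$text = '
{% a %}
{% b %}
{% a %}
{% end %}
{% end %}
{% b %}
{% end %}
{% end %}
{% c %}
{% end %}
';

// Same recursive pattern as in the script above (comments shortened).
$re = '/
    \{%[ ]\w[ ]%\}              # Opening delimiter.
    (?:
      (?R)                      # Nested component,
    | [^{]*+                    # or anything that is not a delimiter.
      (?: \{ (?! %[ ]\w[ ]%\} | %[ ]end[ ]%\} ) [^{]*+ )*+
    )*+
    \{%[ ]end[ ]%\}             # Closing delimiter.
    /ix';

// Hypothetical helper (not part of the original script): returns a flat list
// of [depth, block] pairs, each outer block followed by its inner blocks.
function collect_blocks(string $re, string $text, int $depth = 0): array
{
    $blocks = [];
    if (preg_match_all($re, $text, $m)) {
        foreach ($m[0] as $block) {
            $blocks[] = [$depth, $block];
            // Strip the opening tag (up to the first "%}") and the 9-character
            // closing "{% end %}", then search what is left for inner blocks.
            $start = strpos($block, '%}') + 2;
            $inner = (string) substr($block, $start, -9);
            $blocks = array_merge($blocks, collect_blocks($re, $inner, $depth + 1));
        }
    }
    return $blocks;
}

$blocks = collect_blocks($re, $text);
foreach ($blocks as [$depth, $block]) {
    // Print only the opening tag of each block, indented by nesting depth.
    printf("%s%s\n", str_repeat('  ', $depth), substr($block, 0, strpos($block, '%}') + 2));
}
?>
```

For the test data this should list five blocks in total: the two outermost ones from the output above plus the three nested inside the first. Re-running the regex on each inner string is simple but rescans the same text at every level, so for deeply nested input a single-pass stack-based scan may be a better fit.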
Edit: If you need to match an opening tag having more than one word char, replace the two occurrences of the \w token with (?!end)\w++ (as is correctly implemented in tchrist's excellent answer).
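For reference, here is roughly what the pattern looks like with that substitution applied. Treat it as a sketch: the sample string and its tag names (`if`, `for`, `block`) are made up for illustration, and I have only trimmed the comments, not changed the structure.

```php
<?php
// Hypothetical sample with multi-character tag names.
$text = '{% if %}{% for %}x{% end %}{% end %} {% block %}{% end %}';

$re = '/
    \{%[ ](?!end)\w++[ ]%\}     # Opening delimiter: any word not starting with "end".
    (?:
      (?R)
    | [^{]*+
      (?:
        \{
        (?!
          %[ ](?!end)\w++[ ]%\} # Not the start of a nested component,
        | %[ ]end[ ]%\}         # and not the end of this component.
        )
        [^{]*+
      )*+
    )*+
    \{%[ ]end[ ]%\}             # Closing delimiter.
    /ix';

$count = preg_match_all($re, $text, $m);
printf("%d Matches:\n", $count);
foreach ($m[0] as $i => $match) {
    printf("Match %d: %s\n", $i + 1, $match);
}
?>
```

One side effect to be aware of: (?!end) rejects any tag name that merely begins with "end" (for example "endpoint"), so such tags are treated as plain content and not as openers.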