3

How can I use PHP to get the contents of nested braces?

Example:

{{ text1 {{text2 text3 {{text4}} text5}} }}

should output:

1- text1 {{text2 text3 {{text4}} text5}}
2- text2 text3 {{text4}} text5
3- text4
HabibS

3 Answers

5

This requires keeping track of how deeply the brackets are nested, which a regular expression in the strict, formal sense cannot do. You will have to write your own parser logic for this. Regex is not a parser, sorry.

Here is another similar question with the same response as mine.

And here is a Stack Overflow question about building parsers (in Java, but it should translate well enough).
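
A minimal sketch of the kind of hand-written parser logic this answer has in mind, assuming the delimiters are always the two-character pairs `{{` and `}}`. The function name `nested_brace_contents` is hypothetical, and trimming the surrounding whitespace is an assumption made so that the result matches the output requested in the question. Unclosed `{{` blocks are simply ignored.

```php
<?php
// Hypothetical helper (not from the original answer): collects the contents
// of every {{...}} block by keeping the open positions on a stack.
function nested_brace_contents(string $text): array
{
    $open = [];   // Stack of [content start offset, discovery order].
    $found = [];  // Discovery order => block contents.
    $order = 0;
    $len = strlen($text);

    for ($i = 0; $i < $len - 1; ) {
        $pair = substr($text, $i, 2);
        if ($pair === '{{') {
            // Remember where this block's contents begin.
            $open[] = [$i + 2, $order++];
            $i += 2;
        } elseif ($pair === '}}' && $open) {
            // Close the most recently opened block.
            [$start, $slot] = array_pop($open);
            $found[$slot] = trim(substr($text, $start, $i - $start));
            $i += 2;
        } else {
            $i++;
        }
    }

    ksort($found); // Order by opening position: outer blocks before inner ones.
    return array_values($found);
}

print_r(nested_brace_contents('{{ text1 {{text2 text3 {{text4}} text5}} }}'));
```

For the example in the question this yields the three strings asked for, outermost first. The stack depth equals the current nesting depth, which is the "number of brackets" bookkeeping mentioned above.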

Justin Pihony
  • Yes it can be done using regex - you just aren't trying hard enough! – ridgerunner Jul 02 '12 at 14:40
  • @ridgerunner True regular expressions cannot do this. Maybe some engines can implement logic, but the base regex cannot. Until all engines can handle this, then I stand by my answer. Also, you do have to begin to weigh the simplicity and readability... – Justin Pihony Jul 02 '12 at 14:47
  • When you say "TRUE" regular expressions, which specific engine are you talking about? Ruby, Java, JavaScript, Perl, PHP, C#, Python or even grep? Regular expressions have not been REGULAR for a long time. See tchrist's excellent response to a similar claim: [Do NOT try parsing with regular expressions](http://kore-nordmann.de/blog/do_NOT_parse_using_regexp.html#comment_40). – ridgerunner Jul 02 '12 at 15:27
  • Sorry, I don't understand why people vote up this question. Can anybody explain this to me? – gaussblurinc Jul 07 '12 at 09:18
2

PCRE, like Perl, can match nested structures to arbitrary depth (limited only by memory; see below). Here is a tested script:

Regex to match nested brackets

<?php // test.php Rev:20120702_1100

$re_nested_double_bracket ='% # Rev:20120702_1100
    # Match {{...{{...}}...}} structure with arbitrary nesting.
    \{\{                      # Opening literal double bracket.
    (                         # $1: Contents of double brackets.
      (?:                     # Group for contents alternatives.
        [^{}]++               # Either one or more non-brackets,
      | (?R)                  # or a nested bracket pair,
      | \{                    # or the start of opening bracket
        (?!\{)                # (if not a complete open bracket),
      | \}                    # or the start of closing bracket.
        (?!\})                # (if not a complete close bracket).
      )*                      # Zero or more contents alternatives.
    )                         # End $1: Contents of double brackets.
    \}\}                      # Closing literal double bracket.
    %x';

$results = array(); // Global array to receive results.

// Recursively called callback routine adds to $results array.
function _bracket_contents_callback($matches) {
    global $results, $re_nested_double_bracket;
    $results[] = $matches[1];
    preg_replace_callback($re_nested_double_bracket,
        '_bracket_contents_callback', $matches[1]);
    return $matches[0]; // Don't modify string.
}

$input = file_get_contents('testdata.txt');
preg_replace_callback($re_nested_double_bracket,
    '_bracket_contents_callback', $input);

$count = count($results);
printf("There were %d matches found.\n", $count);
for ($i = 0; $i < $count; ++$i) {
    printf("  Match[%d]: %s\n", $i + 1, $results[$i]);
}
?>

When run against the test data in the original post, here is what the regex matches:

Example Output:

There were 3 matches found.
Match[1]: text1 {{text2 text3 {{text4}} text5}}
Match[2]: text2 text3 {{text4}} text5
Match[3]: text4

The regex matches the outermost pair of (possibly nested) brackets and captures the contents between them in group $1. The script uses preg_replace_callback() to recurse into each match and append the nested bracket contents to the results array.

"Arbitrary depth": This solution matches nested brackets to any "arbitrary depth", but it is always limited by system memory, the executable's stack size, and the PHP pcre.backtrack_limit, pcre.recursion_limit and memory_limit configuration variables. The regex can therefore fail if the subject string is too large and/or the nesting is too deep for a given host system. It is even possible for the PHP/PCRE library to overflow the stack, causing a segmentation fault and a program crash! See my answers to two related questions for an in-depth discussion of how and why this can occur (and how to avoid it and handle errors of this sort gracefully): "RegExp in preg_match function returning browser error" and "PHP regex: is there anything wrong with this code?".

Note: This question (and my answer) is almost the same as "Parsing proprietary tag syntax with regex - how to detect nested tags?", but this answer presents a fuller solution that recursively matches and stores all nested bracket contents.
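
As a rough illustration of handling the limits described above, the sketch below checks the return value and `preg_last_error()` after matching, so that a backtrack or recursion limit failure is reported instead of being mistaken for "no matches". The wrapper name `match_outer_brackets` is hypothetical, and the pattern is a one-line condensation of the commented regex in the script. This only surfaces failures that PCRE itself reports; it does not prevent the stack overflow crash mentioned above.

```php
<?php
// Hypothetical wrapper (not part of the original script): runs the recursive
// pattern and throws when PCRE reports an error.
function match_outer_brackets(string $subject): array
{
    // Same structure as the commented pattern in the script, on one line.
    $re = '/\{\{((?:[^{}]++|(?R)|\{(?!\{)|\}(?!\}))*)\}\}/';

    $count = preg_match_all($re, $subject, $matches);
    if ($count === false || preg_last_error() !== PREG_NO_ERROR) {
        // Possible codes include PREG_BACKTRACK_LIMIT_ERROR,
        // PREG_RECURSION_LIMIT_ERROR and PREG_JIT_STACKLIMIT_ERROR.
        throw new RuntimeException('PCRE failed, error code ' . preg_last_error());
    }

    return $matches[1]; // Contents of each outermost {{...}} block.
}

try {
    print_r(match_outer_brackets('{{ a {{b}} }} x {{c}}'));
} catch (RuntimeException $e) {
    fwrite(STDERR, $e->getMessage() . "\n");
}
```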

ridgerunner
0

I've found the answer I was looking for and am posting it here so everyone can use it. It's very simple indeed, just one line:

  $text1 = preg_replace('/\{\{(([^{}]++|(?R))*)\}\}/', '', $text1);

It will find every outermost {{text}} block, together with anything nested inside it, and replace it with whatever you want (here, an empty string). You can also use preg_match_all() to get all of the outermost blocks in an array.
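
A short sketch of both uses mentioned above. The sample string is made up for illustration, and the pattern uses a possessive `[^{}]++` with a non-capturing group, which is my own choice to limit backtracking on unbalanced input rather than something this answer prescribes.

```php
<?php
// Made-up sample input for illustration.
$text1 = 'a {{ text1 {{text2}} }} b {{text3}} c';

// Recursive pattern: (?R) re-applies the whole pattern to nested blocks.
$re = '/\{\{((?:[^{}]++|(?R))*)\}\}/';

// $m[0] holds each whole outermost block, $m[1] the text between its braces.
preg_match_all($re, $text1, $m);
print_r($m[1]);

// Replacing with '' removes every outermost block and everything nested in it.
$stripped = preg_replace($re, '', $text1);
var_dump($stripped);
```

Because matches cannot overlap, `preg_match_all()` returns only the outermost blocks; to reach the inner ones, the pattern has to be applied again to each captured string.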

Wiktor Stribiżew
HabibS