In PHP I have a string with nested brackets:

bar[foo[test[abc][def]]bar]foo

I need a regex that matches the inner bracket-pairs first, so the order in which preg_match_all finds the matching bracket-pairs should be:

[abc]
[def]
[test[abc][def]]
[foo[test[abc][def]]bar]

All texts may vary.

Is this even possible with preg_match_all?

Dylan
  • possible duplicate of [Nested Parentheses to Array using regex in PHP](http://stackoverflow.com/questions/10361562/nested-parentheses-to-array-using-regex-in-php) – Marc B Feb 14 '13 at 15:48
  • Can there ever be more than one pair inside another pair? e.g. `[foo[bar][baz][xyzzy]lol]`? – leftclickben Feb 14 '13 at 15:49
  • yeah, ideally this could be the case. I changed the question to include this functionality... – Dylan Feb 14 '13 at 15:50
  • @MarcB The suggested duplicate wants to ignore the inner groups, which is the opposite of this question. – Barmar Feb 14 '13 at 15:54

2 Answers

This is not possible with a single regular expression. No matter how complex your regex, preg_match_all scans the string from left to right, so it will always return the left-most match first.

At best, you'd have to use multiple regexes, but even then you're going to have trouble because plain regular expressions can't count matching brackets (PCRE's recursive patterns can, but their matches are still reported left to right). Your best bet is to parse this string some other way.
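One way to "parse this string some other way" is a single pass with a stack of open-bracket offsets. The following is only a minimal sketch, not something from the original answer: the function name `bracketPairsInnerFirst` is hypothetical, and it assumes that unmatched brackets can simply be skipped.

```php
<?php
// Hypothetical helper (an assumption, not part of the original answer):
// returns every balanced [...] pair in $s, ordered by where it closes.
function bracketPairsInnerFirst(string $s): array
{
    $pairs = [];   // completed pairs, in the order their "]" was seen
    $stack = [];   // offsets of "[" characters that are still open

    $length = strlen($s);
    for ($i = 0; $i < $length; $i++) {
        if ($s[$i] === '[') {
            $stack[] = $i;                       // remember where the pair opens
        } elseif ($s[$i] === ']' && $stack) {
            $start = array_pop($stack);          // most recent unmatched "["
            $pairs[] = substr($s, $start, $i - $start + 1);
        }
        // A "]" with nothing open is ignored (assumption of this sketch).
    }
    return $pairs;
}

print_r(bracketPairsInnerFirst('bar[foo[test[abc][def]]bar]foo'));
```

Because a pair is recorded when its closing bracket is reached, every inner pair is recorded before the pair that encloses it, which is the order asked for in the question. No restriction on the characters between the brackets is needed.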

Jeremy Stein

It is not evident from your question what kind of "structure of matches" you want, but you can get by with simple arrays. Try

  preg_match_all('#\[([a-z\)\(]+?)\]#',$original,$m); 

which, for $original = 'bar[foo[test[abc][def]]bar]foo', fills $m[1] with "abc" and "def", the innermost ones.
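As a small illustration of what that call leaves behind, the sketch below runs it on the example string and inspects the result. The variable `$count` is introduced here only for illustration.

```php
<?php
$original = 'bar[foo[test[abc][def]]bar]foo';

// preg_match_all() returns the number of full matches and fills $m.
$count = preg_match_all('#\[([a-z\)\(]+?)\]#', $original, $m);

// With the default PREG_PATTERN_ORDER flag, $m[0] holds the full matches
// and $m[1] holds what the first capture group matched.
print_r($m[0]);
print_r($m[1]);
```

Only the innermost pairs are found, because the character class does not allow a bracket between the opening and closing bracket.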


To get the output you want, you need a loop that does the "parsing task". preg_replace_callback is better suited to parsing than preg_match_all.

Perhaps this loop is a good starting point for your problem:

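 $original = 'bar[foo[test[abc][def]]bar]foo';

 // Repeat until a pass changes nothing, i.e. no innermost pair is left.
 for( $aux=$oldAux=$original;
      $oldAux!=($aux=printInnerBracket($aux));
      $oldAux=$aux
 );
 print "\n-- $aux";

 // Prints every innermost [..] pair and rewrites it as (..), so that the
 // enclosing pair becomes an innermost pair on the next pass.
 function printInnerBracket($s) {
    return preg_replace_callback(
            '#\[([a-z\)\(]+?)\]#',  // the only regular expression
            function($m) {
               print "\n$m[0]";     // full match, brackets included
               return "($m[1])";    // same content, wrapped in parentheses
            },
            $s
    );
 }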

Result (printed by the callback):

[abc]
[def]
[test(abc)(def)]
[foo(test(abc)(def))bar]
-- bar(foo(test(abc)(def))bar)foo
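The printed pairs show the rewritten parentheses rather than the original brackets. A possible variation, sketched here under the assumption that the input itself never contains parentheses, collects the pairs in an array and maps the parentheses back to brackets. The function name `collectBracketPairs` is hypothetical.

```php
<?php
// Hypothetical helper: same regex and loop idea as above, but the matches
// are collected in an array and stored with their original brackets.
// Assumes the input contains no "(" or ")" of its own.
function collectBracketPairs(string $s): array
{
    $found = [];
    do {
        $previous = $s;
        $s = preg_replace_callback(
            '#\[([a-z\)\(]+?)\]#',
            function ($m) use (&$found) {
                // Undo the "(...)" markers left by earlier passes.
                $found[] = '[' . strtr($m[1], '()', '[]') . ']';
                return "($m[1])";
            },
            $s
        );
    } while ($s !== $previous);   // stop once a pass changes nothing
    return $found;
}

print implode("\n", collectBracketPairs('bar[foo[test[abc][def]]bar]foo'));
```

Each pass handles one nesting level, so the pairs come out level by level: for `[a[b]][c]` the order is `[b]`, `[c]`, `[a[b]]`. The character class `[a-z\)\(]` still limits the text between brackets to lowercase letters, so it would need widening for other input.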

See also this related question.

Peter Krauss