
I have this string: The quick brown {fox, dragon, dinosaur} jumps over the lazy {dog, cat, bear, {lion, tiger}}.

I want to get all the substrings that sit between curly braces. Curly braces nested inside other curly braces should not produce separate matches. The expected output as a PHP array would be:

[0] => fox, dragon, dinosaur
[1] => dog, cat, bear, {lion, tiger}

I tried the pattern \{([\s\S]*)\} from "Regex pattern extract string between curly braces and exclude curly braces" (answered by Mar), but it captures everything from the first opening brace to the last closing brace instead of treating each group separately. Here is the output of that pattern:

fox, dragon, dinosaur} jumps over the lazy {dog, cat, bear, {lion, tiger}

What is the best regex pattern to print the expected output from the sentence above?

valrecx
  • possible duplicate of [What does the "[^][]" regex mean?](http://stackoverflow.com/questions/17845014/what-does-the-regex-mean) – HamZa Apr 27 '15 at 13:33
  • There are many more duplicates, but the answer to the linked one is, as far as I know, the best, since it gives an in-depth explanation of the recursive technique. – HamZa Apr 27 '15 at 13:35
  • Another interesting answer [here](http://stackoverflow.com/a/14952740/1401975) – HamZa Apr 27 '15 at 13:40

2 Answers


You can use this recursive regex pattern in PHP:

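// Group 1 matches a balanced {...} pair; (?1) recurses into it for nested braces.
// Group 2 captures the text between the outermost braces.
$re = '/( { ( (?: [^{}]* | (?1) )* ) } )/x';
$str = "The quick brown {fox, dragon, dinosaur} jumps over the lazy {dog, cat, bear, {lion, tiger}}.";

preg_match_all($re, $str, $matches);
print_r($matches[2]); // contents of each top-level group, without the outer braces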


anubhava

As anubhava said, you can use a recursive pattern to do that.

However, his version is fairly slow and doesn't cover all cases.

I'd personally use this regex:

#({(?>[^{}]|(?0))*?})#

As you can see here: http://lumadis.be/regex/test_regex.php?id=2516, it is a lot faster and matches more results.

So, how does it work?

/
  (              # capturing group 1
    {            # a literal '{'
    (?>          # atomic group: once it matches, the engine never backtracks into it
        [^{}]    #   any single char other than '{' or '}'
      |          # or
        (?0)     #   recurse into the whole pattern to match a nested {...} group
    )*?          # repeated as few times as needed (lazy)
    }            # a literal '}'
  )              # end of the capture
/x
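
To see the pattern in action, here is a minimal sketch that runs it through preg_match_all on the string from the question. The variable names are illustrative and not part of the original answer. Note that group 1 (like the whole match) includes the outer braces, so the sketch trims them to get the output the question asks for.

```php
<?php
// Same pattern as above, written with /x so that whitespace is ignored.
$re = '/ ( { (?> [^{}] | (?0) )*? } ) /x';
$str = "The quick brown {fox, dragon, dinosaur} jumps over the lazy {dog, cat, bear, {lion, tiger}}.";

preg_match_all($re, $str, $matches);

// $matches[1] holds each top-level group including its outer braces,
// so drop the first and last character of every match.
$inner = array_map(function ($m) {
    return substr($m, 1, -1);
}, $matches[1]);

print_r($inner);
```

If the braces should stay in the result, $matches[1] can be used directly.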

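Why did I use "*?"?

Adding '?' to '*' makes the quantifier lazy (non-greedy). With a greedy quantifier, the engine tries another iteration of the group before it tests for '}', so at every closing brace it first attempts a (?0) subroutine call that fails immediately. The lazy form tests for '}' first and avoids those extra calls. (If you need more explanation, let me know.)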
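
As a rough check of this point, the sketch below runs the lazy and the greedy form of the pattern over the same input and compares the results. Since neither alternative inside the group can consume a '}', both forms should return the same matches; any difference is only in how much work the engine does. The helper name matchBraces is made up for this illustration.

```php
<?php
// Hypothetical helper: returns every top-level {...} group matched by $re.
function matchBraces(string $re, string $str): array
{
    preg_match_all($re, $str, $m);
    return $m[0];
}

$str = "The quick brown {fox, dragon, dinosaur} jumps over the lazy {dog, cat, bear, {lion, tiger}}.";

$lazy   = matchBraces('#({(?>[^{}]|(?0))*?})#', $str); // lazy quantifier
$greedy = matchBraces('#({(?>[^{}]|(?0))*})#', $str);  // greedy quantifier

var_dump($lazy === $greedy); // bool(true): same matches either way
print_r($lazy);
```

This only shows that the results agree; measuring the speed difference would need a benchmark on larger input.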

Tiller