I have a string of the following form

$string = "This is {test} for [a]{test2} for {test3}.";

I want to get all curly-brace groups that are not immediately preceded by a square-bracket group. Thus, in the above string I would like to get {test} and {test3} but not the {test2} in [a]{test2}.

I found in the answer https://stackoverflow.com/a/977294/2311074 that this might be possible with negative lookahead. So I tried

  $regex      = '/(?:(?!\[[^\}]+\])\{[^\}]+\})/';
  echo preg_match_all($regex, $string, $matches) . '<br>';
  print_r($matches);

but this still gives me all three curly brackets.

3

Array ( [0] => Array ( [0] => {test} [1] => {test2} [2] => {test3} ) )

Why is this not working?

Adam
  • @WiktorStribiżew thanks for your detailed answer. I am catching up on the topic negative lookahead. I will respond / upvote your answer as soon as I have understood it. – Adam Apr 26 '17 at 19:09
  • Please ask what is unclear right away - I will be online for some hours. – Wiktor Stribiżew Apr 26 '17 at 19:09

2 Answers

If you are sure that a ] directly before an opening curly brace always closes a balanced pair of square brackets, then a negative lookbehind will do the job:

(?<!]){[^}]*}
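
As a minimal sketch of how this lookbehind could be used from PHP, the snippet below applies it to the string from the question. The variable names are only illustrative, and the brackets are escaped for readability, which should not change the meaning of the pattern.

```php
<?php
// String taken from the question.
$string = "This is {test} for [a]{test2} for {test3}.";

// Negative lookbehind: the "{" must not be directly preceded by "]".
$lookbehindRegex = '/(?<!\])\{[^}]*\}/';

$count = preg_match_all($lookbehindRegex, $string, $lookbehindMatches);

echo $count . "\n";
print_r($lookbehindMatches[0]); // expected to contain {test} and {test3}
```

This only inspects the single character before the {, which is why it relies on the assumption stated above.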


revo
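Your regex fails because a lookahead checks the text that follows the current position, and here it is tested at the very position where the { has to match. The negative lookahead only fails if the text at that position starts with a [, followed by 1+ chars other than } and then a ]. Since the next char at that position is always a {, never a [, the lookahead always succeeds, so you get all {...} substrings as a result.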
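
To illustrate this, here is a small sketch (variable names are my own) comparing the pattern from the question with the same pattern without the lookahead. If the lookahead never rejects anything, both should return identical matches.

```php
<?php
$string = "This is {test} for [a]{test2} for {test3}.";

// Pattern from the question, including the negative lookahead.
$withLookaheadRegex = '/(?:(?!\[[^\}]+\])\{[^\}]+\})/';
// Same pattern with the lookahead removed.
$plainRegex = '/\{[^}]+\}/';

preg_match_all($withLookaheadRegex, $string, $withLookahead);
preg_match_all($plainRegex, $string, $plain);

// The lookahead is tested right before "{", so it cannot see the "[a]" behind it.
var_dump($withLookahead[0] === $plain[0]);
```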

Use the (*SKIP)(*FAIL) technique:

\[[^]]*]\{[^}]+}(*SKIP)(*F)|\{[^\}]+}

Details:

  • \[[^]]*]\{[^}]+}(*SKIP)(*F) - matches
    • \[ - a [
    • [^]]* - 0+ chars other than ]
    • ]\{ - ]{ substring
    • [^}]+ - 1+ chars other than }
    • } - a literal }
    • (*SKIP)(*F) - PCRE verbs discarding the text matched so far and forcing the engine to go on looking for the next match from the current position (as if a match occurred)
  • | - or
  • \{[^\}]+}:
    • \{ - a {
    • [^\}]+ - 1+ chars other than }
    • } - a literal }.

See the PHP demo:

$string = "This is {test} for [a]{test2} for {test3}.";
$regex      = '/\[[^]]*]\{[^}]+}(*SKIP)(*F)|\{[^}]+}/';
echo preg_match_all($regex, $string, $matches) . "\n";
print_r($matches[0]);

Output:

2
Array
(
    [0] => {test}
    [1] => {test3}
)
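
As a further sketch, the input below (made up for illustration, not from the question) contains a stray ] with no opening [ before a brace group. Under that assumption, a single-character lookbehind such as (?<!\]) and the (*SKIP)(*F) pattern would be expected to disagree on {x}:

```php
<?php
// Hypothetical input: "]{x}" has no matching "[" before it.
$edgeCase = "no bracket]{x} and [a]{y} and {z}";

$skipFailRegex   = '/\[[^]]*]\{[^}]+}(*SKIP)(*F)|\{[^}]+}/';
$lookbehindRegex = '/(?<!\])\{[^}]*\}/';

preg_match_all($skipFailRegex, $edgeCase, $skipFail);
preg_match_all($lookbehindRegex, $edgeCase, $lookbehind);

// (*SKIP)(*F) only discards a "{...}" that follows a complete "[...]".
print_r($skipFail[0]);
// The lookbehind rejects any "{" that directly follows a "]".
print_r($lookbehind[0]);
```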
Wiktor Stribiżew
  • Thank you. I just realized that I had a mistake in my regex. I actually wanted to use `'/(?<!\[[^\}]+\])\{[^\}]+\}/'`, but this doesn't work because of **“lookbehind assertion MUST be fixed length”** http://stackoverflow.com/questions/3796436/whats-the-technical-reason-for-lookbehind-assertion-must-be-fixed-length-in-r. So here it makes sense to use your skip-fail method instead of looking backwards. Thank you! – Adam Apr 26 '17 at 21:22
  • Glad I could help. `(*SKIP)(*FAIL)` is the only correct way to negate something in PCRE having no access to a negative infinite width lookbehind without making assumptions. – Wiktor Stribiżew Apr 26 '17 at 21:39