
I posted an answer to this question, in which the OP wants a regex that matches different blocks of JSON-esque data, with the condition that one of the properties has a specific value.

Simplifying the question a little, assume some sample data like this:

layer { foo { bar { baz } } qux }
layer { fee { bar { baz } } qux }
layer { foo { bar { baz foo } } qux { quux quuux } }
{}
zip { layer { zop { layer {yeehah { foo } } } } }
zip { layer{ zop { layer {yeehah { fee } } } } }

The regex should match layer { ...nested data... } blocks, but only those that contain a foo data element.

My regex in the answer is:

layer\s*{(?>{(?<c>)|[^{}](?!fee)+|}(?<-c>))*(?(c)(?!))}

Instead of positively identifying blocks that contain foo, this regex just excludes those that contain fee. That is fine if every non-fee item is a foo item, but that wasn't the case in the question on the other thread. My solution basically adds all the other non-foo items to the negative lookahead, like this:

layer\s*{(?>{(?<c>)|[^{}](?!fee|blah|bloh|bluh|etc)+|}(?<-c>))*(?(c)(?!))}

But this is impractical if you do not know in advance which data items you want to exclude. I tried using a positive lookahead instead:

layer\s*{(?>{(?<c>)|[^{}](?=foo)+|}(?<-c>))*(?(c)(?!))}

But that does not work.

My question: can anyone help me rewrite the regex so that it matches, e.g., the layer { foo { bar } } items by using a positive lookahead, or do I need to use something different?

Asked by Robin Mackenzie

1 Answer

You do not need a positive lookahead. Instead, use capturing groups (stacks) together with a conditional check:

layer\s*{(?<f>\s*foo)?(?>{\s*foo(?<f>)(?<c>)|{(?<c>)|[^{}]+|}(?<-c>))*(?(c)(?!))(?(f)|(?!))}

See the regex demo

Points of interest:

  • layer\s*{(?<f>\s*foo)? - an optional named group f captures foo if it appears right after layer { (with any amount of whitespace in between).
  • (?>{\s*foo(?<f>)(?<c>)| - the first branch inside the atomic group matches a { (the start of a node) followed by optional whitespace and foo. When it matches, a capture is pushed onto two stacks: f (the foo stack) and c (the open-brace stack).
  • (?(f)|(?!)) - after (?(c)(?!)) has confirmed that every { has been closed by a }, this conditional checks whether the f stack is non-empty. If it is, the match continues to the final } and is returned; if it is empty, the (?!) branch fails the match.
Answered by Wiktor Stribiżew
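Python's built-in re module has no equivalent of .NET balancing groups, so the pattern above cannot be run there directly. As a rough cross-check of the stack logic described in the answer, here is a minimal hand-written scanner (a swapped-in technique: not a regex, and not part of the original answer). An integer depth plays the role of stack c and a boolean plays the role of stack f. The names find_layer_blocks_with_foo, LAYER_START and FOO_AFTER_BRACE are hypothetical.

```python
import re

# Hypothetical helper names; this is a hand-written scanner, not a regex,
# because Python's built-in re module has no .NET-style balancing groups.
LAYER_START = re.compile(r"layer\s*\{")   # same opener as layer\s*{
FOO_AFTER_BRACE = re.compile(r"\s*foo")   # same test as the {\s*foo branch


def find_layer_blocks_with_foo(text: str) -> list[str]:
    """Return balanced `layer { ... }` blocks in which some `{` is followed
    (after optional whitespace) by `foo`, mimicking stacks c and f."""
    results = []
    resume_at = 0  # like Regex.Matches: never report a block nested in a previous match
    for m in LAYER_START.finditer(text):
        if m.start() < resume_at:
            continue
        pos = m.end()  # index just past the block's opening '{'
        depth = 0      # stands in for stack c: currently open nested '{'
        # stands in for stack f; covers the optional (?<f>\s*foo) after "layer {"
        saw_foo = FOO_AFTER_BRACE.match(text, pos) is not None
        while pos < len(text):
            ch = text[pos]
            if ch == "{":
                depth += 1  # like {(?<c>)
                if FOO_AFTER_BRACE.match(text, pos + 1):
                    saw_foo = True  # like {\s*foo(?<f>)(?<c>)
            elif ch == "}":
                if depth == 0:
                    break  # the block's own closing '}': c is empty here
                depth -= 1  # like }(?<-c>)
            pos += 1
        else:
            continue  # reached end of text without closing: unbalanced, no match
        if saw_foo:  # like (?(f)|(?!))
            results.append(text[m.start():pos + 1])
            resume_at = pos + 1
    return results


sample = "\n".join([
    "layer { foo { bar { baz } } qux }",
    "layer { fee { bar { baz } } qux }",
    "layer { foo { bar { baz foo } } qux { quux quuux } }",
    "{}",
    "zip { layer { zop { layer {yeehah { foo } } } } }",
    "zip { layer{ zop { layer {yeehah { fee } } } } }",
])

for block in find_layer_blocks_with_foo(sample):
    print(block)
```

On the six sample lines from the question this prints three blocks: the layer blocks on the first and third lines, plus layer { zop { layer {yeehah { foo } } } } from the fifth line; the fee lines are skipped. Like the regex, it only counts foo when it is the first token after a {, so a block such as layer { bar foo } is not reported.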