I've got a string which I want to split into specific segments, but I can't match the correct segment because the same pattern occurs twice in the string.

My string:

@if(text.text isempty){<customer_comment>@cc{txt_without_comments}cc@</customer_comment>}else{@if(text.answer=='no'){<customer_comment>@{text.text}</customer_comment>}else{<answer>@{text.text}</answer>}endif@}endif@

I need to match: @if(text.text isempty){@cc{txt_without_comments}cc@}else{....}endif@

and not the nested @if inside the else block (shown as dots above).

Here is my incomplete regex:

(?<match>(?<open>@if\((?<statement>[^)]*)\)\s*{)(?<ifblock>(.+?)(?:}else{)(?<elseblock>.*))(?<-open>)}endif@)

This regex is too greedy in the ifblock group; it is supposed to stop at the first }else{ pattern.

Edit: This is the exact result I want to produce:

match: @if(text.text isempty){<customer_comment>@cc{txt_without_comments}cc@</customer_comment>}else{@if(text.answer=='no'){<customer_comment>@{text.text}</customer_comment>}else{<answer>@{text.text}</answer>}endif@}endif@

statement: text.text isempty

ifblock: <customer_comment>@cc{txt_without_comments}cc@</customer_comment>

elseblock: @if(text.answer=='no'){<customer_comment>@{text.text}</customer_comment>}else{<answer>@{text.text}</answer>}endif@
Magni Hansen

1 Answer

You are not using balancing groups correctly. With balancing groups, one capture pushes a value onto the group's stack and another capture pops it off again. A conditional construct is then needed to check whether the stack is empty and, if it is not, to fail the match and force backtracking.
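The push, pop and empty-check steps can be mimicked outside of regex to see what the balancing group is doing. The following is only a rough Python sketch of that bookkeeping, not .NET code; the name `is_balanced` is made up for illustration, and unlike a real regex engine it simply returns `False` instead of backtracking.

```python
import re

# Openers and closers of the template syntax shown in the question.
TOKEN = re.compile(r"@if\s*\(|\}\s*endif@")

def is_balanced(text):
    """Return True if every @if( is closed by a later }endif@."""
    stack = []
    for tok in TOKEN.finditer(text):
        if tok.group().startswith("@if"):
            stack.append(tok.start())  # like (?<a>): push on an opener
        elif stack:
            stack.pop()                # like (?<-a>): pop on a closer
        else:
            return False               # closer with an empty stack: the pop fails
    return not stack                   # like (?(a)(?!)): fail if anything is left
```

For the string in the question the stack is empty at the end, so the check passes; dropping one `}endif@` leaves an entry on the stack and the check fails.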

So, if a regex is the only way for you to match these strings, use the following:

(?s)(?<match>@if\((?<statement>[^)]*)\)\s*{\s*(?<ifblock>.*?)\s*}\s*else\s*{\s*(?<elseblock>(?<a>)@if\s*\((?:(?!@if\s*\(|\}\s*endif@).|(?<a>)@if\s*\(|(?<-a>)\}\s*endif@)*(?(a)(?!)))\}\s*endif@)

Writing a custom parser might turn out to be a better approach here, though.

Pattern details:

  • (?s) - single line mode on (. matches newline)
  • (?<match> - start of the outer group "match"
  • @if\( - a literal char sequence @if(
  • (?<statement>[^)]*) - Group "statement" capturing 0+ chars other than )
  • \)\s*{\s* - ), 0+ whitespaces, {, 0+ whitespaces
  • (?<ifblock>.*?) - Group "ifblock" that captures any 0+ chars, as few as possible up to the first...
  • \s*}\s*else\s*{\s* - 0+ whitespaces, }, 0+ whitespaces, else, 0+ whitespaces, {, 0+ whitespaces
  • (?<elseblock>(?<a>)@if\s*\((?:(?!@if\s*\(|\}\s*endif@).|(?<a>)@if\s*\(|(?<-a>)\}\s*endif@)*(?(a)(?!))) - Group "elseblock" capturing:
    • (?<a>)@if\s*\( - an empty capture pushed onto the "a" stack for the nested @if that opens the else block (so that its own }endif@ is popped inside the group and ends up in "elseblock"), then @if, 0+ whitespaces, (
    • (?: - start of the alternation group, that is repeated 0+ times
      • (?!@if\s*\(|\}\s*endif@).| - any char not starting the @if, 0+ whitespaces, ( sequence and not starting the }, 0+ whitespaces, endif@ sequence or...
      • (?<a>)@if\s*\(| - an empty capture pushed onto the "a" group stack, then @if, 0+ whitespaces and (, or...
      • (?<-a>)\}\s*endif@ - one capture popped from the "a" group stack (this fails if the stack is empty), then }, 0+ whitespaces, endif@
    • )* - end of the alternation group
    • (?(a)(?!)) - conditional that fails the match if the "a" stack is not empty, i.e. if the @if and endif@ counts are not balanced
  • \}\s*endif@ - }, 0+ whitespaces, endif@
  • ) - end of the outer "match" group.
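As for the custom parser mentioned above, here is a minimal sketch of what one could look like. It is written in Python rather than .NET, the function name `parse_if` and the returned keys are invented for this example, and it assumes that the statement contains no `)` and that the `@if(`, `}else{` and `}endif@` sequences never appear as plain text inside a block.

```python
import re

# Block delimiters of the template syntax shown in the question.
IF_OPEN = re.compile(r"@if\s*\(")
BRACE = re.compile(r"\s*\{")
TOKEN = re.compile(r"@if\s*\(|\}\s*else\s*\{|\}\s*endif@")

def parse_if(text, start=0):
    """Split the first @if(...){...}else{...}endif@ found at or after `start`.

    Returns a dict with the keys match, statement, ifblock and elseblock,
    or None if no complete construct is found.
    """
    opener = IF_OPEN.search(text, start)
    if opener is None:
        return None
    close_paren = text.find(")", opener.end())  # assumes no ")" in the statement
    if close_paren == -1:
        return None
    brace = BRACE.match(text, close_paren + 1)
    if brace is None:
        return None
    body_start = brace.end()

    depth = 0         # number of nested @if( currently open
    else_span = None  # (start, end) of the outer }else{
    for tok in TOKEN.finditer(text, body_start):
        word = tok.group()
        if word.startswith("@if"):
            depth += 1
        elif word.endswith("endif@"):
            if depth > 0:
                depth -= 1           # closes a nested @if
            elif else_span is None:
                return None          # outer block has no else branch
            else:
                return {
                    "match": text[opener.start():tok.end()],
                    "statement": text[opener.end():close_paren],
                    "ifblock": text[body_start:else_span[0]].strip(),
                    "elseblock": text[else_span[1]:tok.start()].strip(),
                }
        elif depth == 0 and else_span is None:
            else_span = tok.span()   # first }else{ at the outer level
    return None  # ran out of input before the outer }endif@
```

Calling `parse_if` on the string from the question should give the four values listed in the question's edit. Because the nesting depth is an ordinary counter, the else block does not have to start with `@if`, and the same function can be called again on `elseblock` to split the nested construct.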
Wiktor Stribiżew