
I have the following code to extract the operators that are outside parentheses:

string filtered = Regex.Replace(input, "\\(.*?\\)", string.Empty);
var result = filtered.Split(new[] { ' ' }, 
            StringSplitOptions.RemoveEmptyEntries)
            .Where(element => element == "OR" || element == "AND");    
string temp = string.Join(" ", result);

These lines do not work for nested parentheses.

For example, the code works for this input:

X1 OR ( X2 AND X3 AND X4 AND X5 ) OR X6

It gives me this result: OR OR

But when the input contains nested parentheses, it gives the wrong result.

For this input:

X1 OR ( X2 AND( X3 AND X4 ) AND X5 ) OR X6

I want the result to be OR OR, but it prints OR AND OR.

Although the string contains two ( characters, the match ends right after the first ) character.

How can I adjust my regex pattern?

Wiktor Stribiżew

1 Answer


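Your \(.*?\) regex has three parts: 1) \( matches a literal (, 2) .*? is a lazy dot pattern that matches zero or more characters other than a newline, as few as possible, up to the first ), and 3) \) matches a literal ). With nested parentheses, the match therefore ends at the first ) it meets, which closes the inner group rather than the outer one.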
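To see this on the input from the question, here is a minimal sketch. The class and method names (LazyMatchDemo, FirstLazyMatch, RemoveLazy) are made up for illustration; only Regex.Match and Regex.Replace come from .NET.

```csharp
using System.Text.RegularExpressions;

public static class LazyMatchDemo
{
    // The original pattern from the question: "(", lazy dot, ")".
    private const string LazyPattern = @"\(.*?\)";

    // Returns the first substring the lazy pattern matches.
    public static string FirstLazyMatch(string input) =>
        Regex.Match(input, LazyPattern).Value;

    // Removes every lazy match, exactly as the question's code does.
    public static string RemoveLazy(string input) =>
        Regex.Replace(input, LazyPattern, string.Empty);
}
```

For "X1 OR ( X2 AND( X3 AND X4 ) AND X5 ) OR X6", FirstLazyMatch returns "( X2 AND( X3 AND X4 )", so RemoveLazy leaves "X1 OR  AND X5 ) OR X6" behind. The leftover AND is what ends up in the OR AND OR result.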

Use a balancing group construct if your strings cannot contain escape sequences:

@"\((?>[^()]|(?<o>)\(|(?<-o>)\))*\)(?(o)(?!))"

The point here is that the expression should not be enclosed in any anchors (as in "What are regular expression Balancing Groups"), because it has to match parenthesized substrings anywhere inside a longer string.
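As a rough illustration of why the pattern is left unanchored, the sketch below lists every balanced group it finds in a longer string. BalancedGroups and Find are hypothetical names used only for this example.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class BalancedGroups
{
    // Same pattern as above, deliberately without ^ or $ so that it can
    // match a parenthesized group anywhere in the input.
    private const string Pattern =
        @"\((?>[^()]|(?<o>)\(|(?<-o>)\))*\)(?(o)(?!))";

    // Returns each outermost balanced group, in order of appearance.
    public static string[] Find(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

For "a (b (c)) d (e) f" this returns "(b (c))" and "(e)". Wrapping the same pattern in ^ and $ would instead require the whole string to be a single parenthesized group.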

Details:

  • \( - a literal (
  • (?> - start of an atomic group to prevent backtracking into it
    • [^()] - any char other than ( and )
    • | - or
    • (?<o>)\( - matches a literal ( and pushes an empty value into stack "o"
    • | - or
    • (?<-o>)\) - matches a literal ) and removes one value from stack "o"
  • )* - zero or more occurrences of the atomic group are matched
  • \) - a literal )
  • (?(o)(?!)) - a conditional construct failing the match if stack "o" contains values (is not empty).
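The push and pop bookkeeping described in the list above can also be written without a regex. The following is a minimal sketch of that alternative, not part of the regex solution itself: ParenStripper and StripParenthesized are hypothetical names, and an integer depth counter stands in for stack "o".

```csharp
using System.Text;

public static class ParenStripper
{
    // Copies only the characters that sit outside all parentheses.
    public static string StripParenthesized(string input)
    {
        var sb = new StringBuilder();
        int depth = 0; // number of currently open "(" (plays the role of stack "o")
        foreach (char c in input)
        {
            if (c == '(')
            {
                depth++;                 // entering a (possibly nested) group
            }
            else if (c == ')')
            {
                if (depth > 0) depth--;  // leaving a group; a stray ")" is dropped
            }
            else if (depth == 0)
            {
                sb.Append(c);            // outside every group: keep the character
            }
        }
        return sb.ToString();
    }
}
```

For balanced input this gives the same text as the regex replacement, for example "X1 OR  OR X6" for the nested input from the question. It behaves differently on unbalanced input: a stray ) is dropped and everything after an unclosed ( is discarded, so treat it as an illustration rather than a drop-in replacement.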

See the regex demo.

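using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "X1 OR ( X2 AND( X3 AND X4 ) AND X5 ) OR X6";
// Remove every balanced (possibly nested) parenthesized group
var filtered = Regex.Replace(input, @"\((?>[^()]|(?<o>)\(|(?<-o>)\))*\)(?(o)(?!))", string.Empty);
var result = filtered.Split(new[] { ' ' },
    StringSplitOptions.RemoveEmptyEntries)
    .Where(element => element == "OR" || element == "AND");
var temp = string.Join(" ", result);
Console.WriteLine(temp); // OR OR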

See the C# demo

Wiktor Stribiżew