
I'm working on a C# regular expression that can match nested constructs (parentheses in this case) as well as arbitrary operators (a '|' character in this case).

I got started by using a pushdown automaton as described here.

What I have so far:

String pattern = @"
(?# line 01) \(
(?# line 02) (?>
(?# line 03) \( (?<DEPTH>)
(?# line 04) |
(?# line 05) \) (?<-DEPTH>)
(?# line 06) |
(?# line 07) .?
(?# line 08) )*
(?# line 09) (?(DEPTH)(?!))
(?# line 10) \)
";

var source = "((Name1| Name2) Blah) | (Name3 ( Blah | Blah))";

var matches = Regex.Matches(source, pattern,
  RegexOptions.IgnorePatternWhitespace);
matches.Dump();

This yields the following results:

// ((Name1| Name2) Blah)
// (Name3 ( Blah | Blah))

Desired results:

// ((Name1| Name2) Blah)
// |
// (Name3 ( Blah | Blah))

Note: There may or may not be an operator between the groups. For example, the source may look like "((Name1| Name2) Blah) (Name3 ( Blah | Blah))".

pb2q
Tim Capps
  • Regex is not a good candidate for this. I would suggest parsing it yourself or using a parsing library. That is, assuming your nested parenthesis structure can be more complex than what you've given. – Simon Whitehead Jun 19 '13 at 22:22
  • @SimonWhitehead Yes, I'm aware that a parser is a better way to go for maintainability. I plan to use ANTLR for a more permanent solution. Thank you for your input! – Tim Capps Jun 20 '13 at 18:21
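
For anyone wondering what "parsing it yourself" might look like here, below is a minimal sketch of a hand-written scanner that tracks nesting depth. The class name GroupScanner and the method Scan are made up for illustration; they are not from the question or from any library. It also assumes that unbalanced input should raise an error, which differs from the regex approach (the regex simply skips anything it cannot match).

```csharp
using System;
using System.Collections.Generic;

public static class GroupScanner
{
    // Hypothetical helper: splits the input into top-level parenthesized
    // groups and top-level '|' operators by tracking nesting depth.
    public static List<string> Scan(string source)
    {
        var tokens = new List<string>();
        int depth = 0;   // current parenthesis nesting level
        int start = -1;  // index where the current top-level group began

        for (int i = 0; i < source.Length; i++)
        {
            char c = source[i];
            if (c == '(')
            {
                if (depth == 0) start = i;   // a new top-level group opens
                depth++;
            }
            else if (c == ')')
            {
                // Assumption: unbalanced input is an error, not something to skip.
                if (depth == 0)
                    throw new FormatException($"Unmatched ')' at index {i}.");
                depth--;
                if (depth == 0)              // the top-level group just closed
                    tokens.Add(source.Substring(start, i - start + 1));
            }
            else if (c == '|' && depth == 0)
            {
                tokens.Add("|");             // operator outside any group
            }
        }

        if (depth != 0)
            throw new FormatException("Unclosed '(' in input.");
        return tokens;
    }
}
```

Calling GroupScanner.Scan on the source string from the question should return the two outer groups with the '|' between them; a '|' inside a group is left as part of that group.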

1 Answer


You can try this (just add |\| to the end of your pattern, so that a lone '|' becomes a second top-level alternative):

\((?>\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!))\)|\|
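
A small sketch of how the pattern above could be used from C#. The class name NestedGroupDemo and the helper Tokenize are made up for illustration. Because the pattern is written on one line without the (?# ...) comments, RegexOptions.IgnorePatternWhitespace is not needed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NestedGroupDemo
{
    // The pattern from this answer: either a balanced parenthesized group
    // (tracked with the DEPTH balancing group) or a single '|' character.
    public const string Pattern =
        @"\((?>\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!))\)|\|";

    // Hypothetical helper: returns the text of every match in source order.
    public static string[] Tokenize(string source) =>
        Regex.Matches(source, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

With the source from the question, NestedGroupDemo.Tokenize(source) gives:

// ((Name1| Name2) Blah)
// |
// (Name3 ( Blah | Blah))

When there is no operator between the groups, only the two groups are returned. A '|' inside a group is consumed as part of that group, so only a top-level '|' shows up as its own match.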
Casimir et Hippolyte