I need to match a string within balanced parentheses before a literal period in C#. My regex with balancing groups works except when there are extra open parens in the string. As I understand it, this requires a conditional fail pattern to ensure the stack is empty at the end of the match, yet something is not quite right.

Original regex:

@"(?<Par>[(]).+(?<-Par>[)])\."

With fail-pattern:

@"(?<Par>[(]).+(?<-Par>[)])(?(Par)(?!))\."

Test-code (last 2 fail):

string[] tests = {
    "a.c",   "",
    "a).c",  "",
    "(a.c",  "",
    "a(a).c", "(a).",
    "a(a b).c", "(a b).",
    "a((a b)).c", "((a b)).",
    "a(((a b))).c", "(((a b))).",
    "a((a) (b)).c", "((a) (b)).",
    "a((a)(b)).c", "((a)(b)).",
    "a((ab)).c", "((ab)).",
    "a)((ab)).(c", "((ab)).",
    "a(((a b)).c", "((a b)).", 
    "a(((a b)).)c", "((a b))."
};

Regex re = new Regex(@"(?<Par>[(]).+(?<-Par>[)])(?(Par)(?!))\.");

for (int i = 0; i < tests.Length; i += 2)
{
    var result = re.Match(tests[i]).Groups[0].Value;
    if (result != tests[i + 1]) throw new Exception
        ("Expecting: " + tests[i + 1] + ", got " + result);
}
rednoyz
  • https://www.xkcd.com/1171/ – Theraot May 28 '18 at 06:13
  • Just in case this is a XY Problem (https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) can you share _why_ you are trying to do this? – mjwills May 28 '18 at 06:23
  • In my opinion, do not use regex for the parenthesis balance problem. Go the crude stack way. – Prateek Shrivastava May 28 '18 at 06:27
  • You seem to love regex horrors, considering this is the second question about regexes and balanced parentheses – xanatos May 28 '18 at 06:40
  • I think it is a classical case of using a balanced construct with a `.` after it, so [`\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)\.`](http://regexstorm.net/tester?p=%5c%28%28%3f%3e%5b%5e%28%29%5cn%5d%2b%7c%28%3f%3co%3e%29%5c%28%7c%28%3f%3c-o%3e%29%5c%29%29*%28%3f%28o%29%28%3f!%29%29%5c%29%5c.&i=a%28a+b%29.c%22%2c+%22%28a+b%29.%22%2c%0d%0a) should work. – Wiktor Stribiżew May 28 '18 at 07:05
  • nice - add it as an answer so I can accept it – rednoyz May 28 '18 at 07:25
  • @rednoyz Ok, posted. Looks like `\((?>[^()]+|(?<o>\()|(?<-o>\)))*(?(o)(?!))\)\.` is the faster version. – Wiktor Stribiżew May 28 '18 at 07:52

1 Answer

You may use a well-known regex to match balanced parentheses and just append a \. to it:

\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)\.
|---------- balanced parens part ----------|.|

See the regex demo.

Details

  • \( - a (
  • (?> - start of an atomic group
    • [^()]+ - 1 or more chars other than ( and )
    • | - or
    • (?<o>)\( - an opening ( is pushed on to the Group o stack
    • | - or
    • (?<-o>)\) - a closing ) is popped off the Group o stack
  • )* - 0 or more repetitions of the atomic group
  • (?(o)(?!)) - fail the match if Group o stack is not empty
  • \) - a )
  • \. - a dot.
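
As a quick check, the pattern above can be run against the test pairs from the question. This is only a minimal sketch: the class name `BalancedParensDemo` and the helper `FirstMatch` are made up for illustration, and the input/expected pairs are copied from the question's test array.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedParensDemo
{
    // Pattern from this answer: a balanced (...) group followed by a literal dot.
    public static readonly Regex BalancedThenDot = new Regex(
        @"\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!))\)\.");

    // Hypothetical helper: returns the matched text, or "" when nothing matches.
    public static string FirstMatch(string input)
    {
        return BalancedThenDot.Match(input).Value;
    }

    public static void Main()
    {
        // Input / expected pairs taken from the question.
        string[] tests = {
            "a.c", "",
            "a).c", "",
            "(a.c", "",
            "a(a).c", "(a).",
            "a(a b).c", "(a b).",
            "a((a b)).c", "((a b)).",
            "a(((a b))).c", "(((a b))).",
            "a((a) (b)).c", "((a) (b)).",
            "a((a)(b)).c", "((a)(b)).",
            "a((ab)).c", "((ab)).",
            "a)((ab)).(c", "((ab)).",
            "a(((a b)).c", "((a b)).",
            "a(((a b)).)c", "((a b))."
        };

        for (int i = 0; i < tests.Length; i += 2)
        {
            string result = FirstMatch(tests[i]);
            if (result != tests[i + 1])
                throw new Exception(
                    "Expecting: " + tests[i + 1] + ", got " + result);
        }

        Console.WriteLine("All tests passed.");
    }
}
```

The last two inputs are the ones that failed in the question. Here the attempt starting at the extra opening parenthesis cannot complete, so the engine moves on and matches from the next `(`.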
Wiktor Stribiżew
  • Just tested at RegexHero.net, and it appears that the pattern will work faster if you include `\(` and `\)` into Group "o": `\((?>[^()]+|(?<o>\()|(?<-o>\)))*(?(o)(?!))\)\.` – Wiktor Stribiżew May 28 '18 at 07:52