
I want to match all parentheses including the inner and outer parentheses.

Input: abc(test)def(rst(another test)uv)xy

Desired output:

(test)
(rst(another test)uv)
(another test)

My C# code below returns only (test) and (rst(another test)uv); the nested (another test) is missing:

string input = "abc(test)def(rst(another test)uv)xy";

Regex regex = new Regex(@"\(([^()]+| (?<Level>\()| (?<-Level>\)))+(?(Level)(?!))\)", RegexOptions.IgnorePatternWhitespace);

foreach (Match c in regex.Matches(input))
{
    Console.WriteLine(c.Value);
}
nam

3 Answers

1

You are looking for overlapping matches, which an ordinary match cannot return because each match consumes the text it covers. Place your regex in a capturing group and wrap that group in an unanchored positive lookahead:

Regex regex = new Regex(@"(?=(\(([^()]+| (?<Level>\()| (?<-Level>\)))+(?(Level)(?!))\)))", RegexOptions.IgnorePatternWhitespace);

The value you need will be inside match.Groups[1].Value.

See the IDEONE demo:

using System;
using System.Text.RegularExpressions;
using System.IO;
using System.Linq;
public class Test
{
    public static void Main()
    {
        var input = "abc(test)def(rst(another test)uv)xy";
        var regex = new Regex(@"(?=(\(([^()]+| (?<Level>\()| (?<-Level>\)))+(?(Level)(?!))\)))", RegexOptions.IgnorePatternWhitespace);
        var results = regex.Matches(input).Cast<Match>()
                       .Select(p => p.Groups[1].Value)
                       .ToList();
        Console.WriteLine(String.Join(", ", results));
    }
}

Results: (test), (rst(another test)uv), (another test).

Unanchored positive lookaheads can find overlapping matches because they do not consume text: the regex engine's index stays where it is while the subpatterns inside the lookahead are tried. After each match or failure the engine advances its index on its own, so the pattern is tested at every position in the input string.

Although a lookahead itself matches only an empty string, the groups inside it still capture.

Thus, when the lookahead reaches a (, it matches a zero-width string and places the value you need into Group 1. The engine then moves on, finds the next ( inside the first (...), and captures that substring as well.
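To make the zero-width behaviour visible, here is a minimal sketch that prints the position and length of each match next to its Group 1 capture. The class name LookaheadDemo and the helper Describe are made up for this illustration; the pattern is the one from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // The balanced-parentheses pattern wrapped in (?=( ... )), as in the answer.
    private static readonly Regex Overlapping = new Regex(
        @"(?=(\(([^()]+| (?<Level>\()| (?<-Level>\)))+(?(Level)(?!))\)))",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: one line per match with its index, its length,
    // and the text captured by Group 1 inside the lookahead.
    public static List<string> Describe(string input)
    {
        var lines = new List<string>();
        foreach (Match m in Overlapping.Matches(input))
        {
            lines.Add($"index={m.Index} length={m.Length} group1={m.Groups[1].Value}");
        }
        return lines;
    }

    public static void Main()
    {
        foreach (var line in Describe("abc(test)def(rst(another test)uv)xy"))
        {
            Console.WriteLine(line);
        }
    }
}
```

For the input from the question, every match has length 0 (at indices 3, 12 and 16), while Group 1 holds (test), (rst(another test)uv) and (another test) respectively.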

Wiktor Stribiżew
  • It works. If I still have questions I'll post a comment. Thank you for briefly explaining your solution as it helped me understand better for future usage. – nam Nov 15 '15 at 17:20
  • I posted my answer using a mobile phone, and could only fix the typos now. I added explanations and a demo code. Glad it works for you. – Wiktor Stribiżew Nov 15 '15 at 22:06
  • Your suggested pattern is not working for Regex.Replace(...) method. I may be missing something or it may have something to do with the regex engine index that you mentioned in your post. For example, `input = Regex.Replace(st, @"(?=(\(([^()]+|(?<Level>\()|(?<-Level>\)))+(?(Level)(?!))\)))", delegate(Match match) { return '[' + match.Groups[1].Value + ']'; });` gives an odd output as: `abc[(test)](test)def[(rst(another test)uv)](rst[(another test)](another test)uv)xy` instead of `abc[(test)]def[(rst[(another test)]uv)]xy` – nam Nov 16 '15 at 03:06
  • I think you misunderstand how replace works with empty string matches. As it does not consume characters you just replace an empty string before the captured texts with the replacement. Add more code: first, get the strings you need to replace, then replace them manually within a foreach loop. – Wiktor Stribiżew Nov 16 '15 at 07:22
  • Did you finally make it? If not, please let me know. – Wiktor Stribiżew Nov 18 '15 at 19:27
  • But the MatchEvaluator delegate (that I'm using) does the same thing: It gets each matched string and replaces it with the replacement string. I've not been able to make it. – nam Nov 21 '15 at 16:46
  • Ok, I'll add code to show you. Just FYI: **Match**Evaluator won't work because it needs a match value, but the `match.Value` is empty in this case. – Wiktor Stribiżew Nov 21 '15 at 16:48
  • Your solution for replace works - thanks. Going back to using MatchEvaluator, In delegate, I'm using match.Groups[1].Value (not match.value) that is not an empty string, but it (the delegate approach) does not work. – nam Nov 21 '15 at 17:48
  • @nam: I have already explained that the text to replace is the *match.Value*, not the groups[n], so you are replacing an empty string with a captured subtext. Which means **you cannot use MatchEvaluator to replace the overlapping texts that are only *captured*, but not *matched***. – Wiktor Stribiżew Nov 22 '15 at 11:58
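The replacement code mentioned in these comments is not part of this page, so the following is only a sketch of one way to do the manual replacement, not the answerer's code. It assumes the goal is to wrap every balanced group in square brackets: the positions of the Group 1 captures are collected first, and the output string is then rebuilt character by character. The names BracketWrapper and Wrap are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class BracketWrapper
{
    // Same lookahead pattern as in the answer; the useful text is in Group 1.
    private static readonly Regex Overlapping = new Regex(
        @"(?=(\(([^()]+| (?<Level>\()| (?<-Level>\)))+(?(Level)(?!))\)))",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: wraps every captured (...) group in [ and ].
    public static string Wrap(string input)
    {
        var opens = new HashSet<int>();   // positions where a group starts
        var closes = new HashSet<int>();  // positions just past a group's end
        foreach (Match m in Overlapping.Matches(input))
        {
            Group g = m.Groups[1];
            opens.Add(g.Index);
            closes.Add(g.Index + g.Length);
        }

        var sb = new StringBuilder();
        for (int i = 0; i <= input.Length; i++)
        {
            // Emit ']' before '[' so adjacent groups such as "(a)(b)" become "[(a)][(b)]".
            if (closes.Contains(i)) sb.Append(']');
            if (opens.Contains(i)) sb.Append('[');
            if (i < input.Length) sb.Append(input[i]);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Wrap("abc(test)def(rst(another test)uv)xy"));
        // prints: abc[(test)]def[(rst[(another test)]uv)]xy
    }
}
```

Working from capture positions avoids the problem described above: Regex.Replace substitutes the empty match.Value, whereas here the captured ranges themselves decide where the brackets go.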
0

Edit: This answer is flat out wrong for .NET regular expressions - see nam's comment below.

Original answer:

Regular expressions match regular languages. Nested parentheses are not a regular language, they require a context-free grammar to match. So the short answer is there is no way to do what you're attempting.

https://stackoverflow.com/a/133684/361631

Community
PaulF
  • I'm using .NET Regex that does support [Balancing Groups](https://msdn.microsoft.com/en-us/library/bs2twtah.aspx#balancing_group_definition). – nam Nov 15 '15 at 02:07
0

You could use this one: \((?>[^()]+|\((?<P>)|(?<C-P>)\))*(?(P)(?!))\) but you will have to dig through the captures, the groups, and the groups' captures to get what you want (see demo).
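The linked demo is not reproduced here, so the following is only a rough sketch of what that digging could look like. Each top-level match comes from match.Value, and each nested group comes from the captures of the balancing group C. Those captures hold the text between the parentheses, so the sketch adds the parentheses back. The names CaptureDigger and FindAll are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDigger
{
    // Pattern from the answer: P marks each inner "(", and C-P stores the text
    // between that mark and the matching ")".
    private static readonly Regex Balanced = new Regex(
        @"\((?>[^()]+|\((?<P>)|(?<C-P>)\))*(?(P)(?!))\)");

    // Hypothetical helper: returns each top-level match followed by its nested groups.
    public static List<string> FindAll(string input)
    {
        var results = new List<string>();
        foreach (Match m in Balanced.Matches(input))
        {
            results.Add(m.Value);
            foreach (Capture c in m.Groups["C"].Captures)
            {
                // The capture excludes the surrounding parentheses, so restore them.
                results.Add("(" + c.Value + ")");
            }
        }
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FindAll("abc(test)def(rst(another test)uv)xy")));
        // prints: (test), (rst(another test)uv), (another test)
    }
}
```

Nested captures are recorded in the order their closing parentheses are reached, so with deeper nesting the innermost group appears before the groups that enclose it.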

Sehnsucht