I know a bit about regular expressions, but far from enough to figure out this one.

I have looked for something that could help me, but I am having a hard time understanding how to construct the regular expression in C#.

Here is what I need. Say I have a string like the following:

string s = "this is (a (string))";

What I need is to focus on the parentheses.

I want to be able to split this string up into the following List/Array "parts".

1) "this", "is", "a (string)"

or

2) "this", "is", "(a (string))".

I would like to know how to do both 1) and 2). Does anyone have an idea of how to solve this problem?

Can this be solved using regex? Does anyone know a good guide for learning about it?

Hope someone can help.

Greetings.

ANACoder
  • If you say you know a bit of a regex, what have you tried? And please be precise with what you need to obtain as a result. – Wiktor Stribiżew Apr 11 '16 at 13:25
  • Regular expressions don't work well with nested constructs like this. – juharr Apr 11 '16 at 13:27
  • I don't know much about regex. I know about regular expressions a bit in general. – ANACoder Apr 11 '16 at 13:27
  • "Regular expressions don't work well with nested constructs like this." Okay. Any suggestions on what to look at, then? – ANACoder Apr 11 '16 at 13:29
  • @ANACoder You'll have to parse the string yourself. Loop through the characters and keep track of the parenthesis depth. But it really depends on which of the two outcomes you want and maybe on what should happen if there is more nesting than what you have shown. – juharr Apr 11 '16 at 13:32
  • If you are looking for a regex solution, here it is: [`\(((?>[^()]+|\((?<o>)|\)(?<-o>))*(?(o)(?!)))\)|\S+`](http://regexstorm.net/tester?p=%5c(((%3f%3e%5b%5e()%5d%2b%7c%5c((%3f%3co%3e)%7c%5c)(%3f%3c-o%3e))*(%3f(o)(%3f!)))%5c)%7c%5cS%2b&i=this+is+(a+(string))). – Wiktor Stribiżew Apr 11 '16 at 14:01
  • See [this demo](http://ideone.com/tln9mG). – Wiktor Stribiżew Apr 11 '16 at 14:07

4 Answers

If you want to split with some kind of escape (ignore a space when it is inside parentheses), you can implement it with a simple loop, without regular expressions:

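// Requires: using System; using System.Collections.Generic; using System.Linq;
private static IEnumerable<String> SplitWithEscape(String source) {
  if (String.IsNullOrEmpty(source))
    yield break;

  int escapeCount = 0; // current parenthesis nesting depth
  int start = 0;       // index where the current token begins

  for (int i = 0; i < source.Length; ++i) {
    char ch = source[i];

    if (escapeCount > 0) {
      // Inside parentheses: only track the depth, never split
      if (ch == '(')
        escapeCount += 1;
      else if (ch == ')')
        escapeCount -= 1;
    }
    else {
      if (ch == ' ') {
        // Skip empty tokens produced by consecutive spaces
        if (i > start)
          yield return source.Substring(start, i - start);

        start = i + 1; // the next token begins after the space
      }
      else if (ch == '(')
        escapeCount += 1;
    }
  }

  // Emit the last token; an unbalanced tail (escapeCount != 0) is dropped
  if ((start < source.Length) && (escapeCount == 0))
    yield return source.Substring(start);
}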


....

String source = "this is (a (string))";
String[] split = SplitWithEscape(source).ToArray();
Console.Write(String.Join("; ", split));
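
The loop above keeps the outer parentheses on the last part (variant 2 in the question). For variant 1, one option is to post-process each token and remove a single enclosing pair of parentheses. The following is only a sketch: `TokenHelpers` and `StripOuterParens` are hypothetical names, not part of the original answer, and the helper assumes that a token such as `(a)(b)` should be left untouched because its first `(` is not closed by its last `)`.

```csharp
using System;

public static class TokenHelpers
{
    // Hypothetical helper: removes one enclosing pair of parentheses,
    // but only when the first '(' is closed by the very last ')'.
    public static string StripOuterParens(string token)
    {
        if (token.Length < 2 || token[0] != '(' || token[token.Length - 1] != ')')
            return token;

        int depth = 0;
        for (int i = 0; i < token.Length; ++i)
        {
            if (token[i] == '(') depth++;
            else if (token[i] == ')') depth--;

            // Depth fell to zero before the last character, so the
            // first '(' does not enclose the whole token.
            if (depth <= 0 && i < token.Length - 1)
                return token;
        }

        // Strip only when the parentheses are balanced overall.
        return depth == 0 ? token.Substring(1, token.Length - 2) : token;
    }
}
```

Applied to each element of the split result (for example with `.Select(TokenHelpers.StripOuterParens)`), this should turn `(a (string))` into `a (string)` while leaving `this` and `is` unchanged.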
Dmitry Bychenko

You can try something like this:

([^\(\s]+)\s+([^\(\s]+)\s+\((.*)\)

Regex Demo

But this only matches a fixed number of words in the input string: in this case, two words before the parentheses. The final regex will depend on your exact requirements.

Lucas Araujo

.NET regex supports balanced constructs. Thus, you can always safely use .NET regex to match substrings between a balanced number of delimiters that may have something inside them.

So, you can use

\(((?>[^()]+|\((?<o>)|\)(?<-o>))*(?(o)(?!)))\)|\S+

to match parenthesized substrings (while capturing the contents in-between parentheses into Group 1) or match all non-whitespace chunks (\S+ matches 1+ non-whitespace symbols).

See Grouping Constructs in Regular Expressions, Matching Nested Constructs with Balancing Groups or What are regular expression Balancing Groups? for more details on how balancing groups work.

Here is a regex demo

If you need to extract all the match values and captured values, you need to get all matched groups that are not empty or whitespace. So, use this C# code:

var line = "this is (a (string))";
var pattern = @"\(((?>[^()]+|\((?<o>)|\)(?<-o>))*(?(o)(?!)))\)|\S+";
var result = Regex.Matches(line, pattern)
        .Cast<Match>()
        .SelectMany(x => x.Groups.Cast<Group>()
            .Where(m => !string.IsNullOrWhiteSpace(m.Value))
            .Select(t => t.Value))
        .ToList();
foreach (var s in result) // DEMO
    Console.WriteLine(s);
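
The code above returns both the whole match and the Group 1 capture for the parenthesized part. If only one of the two variants from the question is wanted, a possible sketch is shown below. It reuses the same pattern; the class and method names (`ParenSplitDemo`, `SplitKeepingParens`, `SplitStrippingOuterParens`) are made up for illustration. It relies on `Group.Success` to tell whether the parenthesized alternative matched.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParenSplitDemo
{
    // Same pattern as in the answer: a balanced parenthesized chunk
    // (contents captured in Group 1) or a run of non-whitespace characters.
    private const string Pattern =
        @"\(((?>[^()]+|\((?<o>)|\)(?<-o>))*(?(o)(?!)))\)|\S+";

    // Variant 2: keep the outer parentheses (whole match value).
    public static string[] SplitKeepingParens(string line) =>
        Regex.Matches(line, Pattern).Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    // Variant 1: drop the outer parentheses by preferring Group 1
    // whenever the parenthesized alternative matched.
    public static string[] SplitStrippingOuterParens(string line) =>
        Regex.Matches(line, Pattern).Cast<Match>()
             .Select(m => m.Groups[1].Success ? m.Groups[1].Value : m.Value)
             .ToArray();

    public static void Main()
    {
        var line = "this is (a (string))";
        Console.WriteLine(string.Join("; ", SplitKeepingParens(line)));
        Console.WriteLine(string.Join("; ", SplitStrippingOuterParens(line)));
    }
}
```

Unlike a plain `\S+` split, the pattern treats a balanced parenthesized chunk as one unit, so spaces inside it do not produce extra parts.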
Wiktor Stribiżew

Maybe you can use ((?<=\()[^}]*(?=\)))|\W+ to split the string into words and then take the parenthesized content from Group 1.
See this Regex

Adam Calvet Bohl