
I wrote a simple regex pattern and a corresponding sample:

   var regex = @"_if_.*_else_.*_endif_";

   // Sample containing 4 nested _if_ ... _else_ ... _endif_ blocks
   var sample = @"_if_111_else_222_if__if_333_else_444_endif__else_555_if_666_else_777_endif__endif__endif_";

   var matches = Regex.Matches(sample, regex); // Count is 1, but 4 were expected

The resulting match collection contains only 1 match, while I expected it to contain these 4:

  • _if_666_else_777_endif_
  • _if_333_else_444_endif_
  • _if__if_333_else_444_endif__else_555_if_666_else_777_endif__endif_
  • _if_111_else_222_if__if_333_else_444_endif__else_555_if_666_else_777_endif__endif__endif_

How can I get all occurrences of the pattern that exist in the string with a regex? Is there a better way?

  • In which form do you need the results? Do you just want every nested substring that matches the regular expression, or do you want to build some kind of tree? – zneak Jul 22 '16 at 06:05
  • @zneak, exactly. This is a nested if, so a correct search should return 4 matches, but I don't know how to modify the pattern to get that result. – user6609534 Jul 22 '16 at 06:07
  • I mean, what result do you need? Just the number of sub-expressions? Every sub-expression string? Because just every string that matches doesn't tell you where it was to begin with, and that seriously limits your ability to do things. – zneak Jul 22 '16 at 06:09
  • I don't think that you can write that with a regular expression. For instance, in `_if__if_111_else_222_endif_else_333_endif_`, I don't know how you would find which `else` goes with the top-level `if` with just a regular expression. – zneak Jul 22 '16 at 06:41

1 Answer


I suggest a two-step approach that combines a regex with LINQ:

  • Get all the balanced substrings from `_if_` to `_endif_`, including overlapping (nested) ones.
  • Keep only those that have `_else_` inside.

See IDEONE demo

var s = @"_if_111_else_222_if__if_333_else_444_endif__else_555_if_666_else_777_endif__endif__endif_";
var pat = @"(?x)(?=        # Start of the overlapping match capturing lookahead
         (_if_                 # Leading delimiter
          (?>                  # Start of atomic group (no backtracking into it)
           (?!_(?:end)?if_).   # Any symbol not starting the delimiter sequence 
           |(?<o>_if_)         # A leading delimiter added to stack o
           |(?<-o>_endif_)     # Trailing delimiter: pops one entry from stack o
          )*                   # Repeat the atomic group 0+ times
          (?(o)(?!))           # If the o stack is not empty, fail the match
         _endif_               # Trailing delimiter
         )
        )";
var res = Regex.Matches(s, pat)
        .Cast<Match>()
        .Select(p => p.Groups[1].Value)
        .Where(n => n.Contains("_else_"))
        .ToList();
foreach (var v in res)
    Console.WriteLine(v);
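
As a complement to the regex above, here is a minimal sketch of a non-regex alternative: a single left-to-right scan with an explicit stack, which also records where each block starts (the concern raised in the comments). This is not part of the original approach. The names `NestedIfScanner` and `FindBlocks` are hypothetical, and the sketch assumes the three delimiters are exactly `_if_`, `_else_`, and `_endif_`. One difference in behavior: it keeps a block only when the block has an `_else_` at its own nesting level, whereas `Contains("_else_")` also accepts an `_else_` that belongs to an inner block.

```csharp
using System;
using System.Collections.Generic;

static class NestedIfScanner
{
    // True when `token` occurs in `s` exactly at position `i` (ordinal comparison).
    static bool At(string s, int i, string token) =>
        string.CompareOrdinal(s, i, token, 0, token.Length) == 0;

    // Returns every balanced _if_ ... _endif_ block that has its own _else_,
    // together with its start index, in the order the blocks are closed.
    public static List<(int Index, string Text)> FindBlocks(string s)
    {
        const string If = "_if_", Else = "_else_", EndIf = "_endif_";
        var open = new Stack<(int Start, bool HasElse)>(); // currently open blocks
        var found = new List<(int Index, string Text)>();
        int i = 0;
        while (i < s.Length)
        {
            if (At(s, i, If))
            {
                open.Push((i, false));          // a new block opens here
                i += If.Length;
            }
            else if (At(s, i, Else))
            {
                if (open.Count > 0)
                {
                    var top = open.Pop();       // mark the innermost open block
                    open.Push((top.Start, true));
                }
                i += Else.Length;
            }
            else if (At(s, i, EndIf))
            {
                i += EndIf.Length;
                if (open.Count > 0)
                {
                    var top = open.Pop();       // the innermost block closes here
                    if (top.HasElse)
                        found.Add((top.Start, s.Substring(top.Start, i - top.Start)));
                }
            }
            else
            {
                i++;                            // ordinary character
            }
        }
        return found;
    }

    static void Main()
    {
        var s = "_if_111_else_222_if__if_333_else_444_endif__else_555_if_666_else_777_endif__endif__endif_";
        foreach (var (index, text) in FindBlocks(s))
            Console.WriteLine($"{index}: {text}");
    }
}
```

On the sample string this reports four blocks, starting at indices 20, 52, 16 and 0, innermost blocks first. Unmatched `_else_` or `_endif_` tokens are silently skipped, and blocks left open at the end of the input are not reported; stricter error handling would need to be added for real input.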
Wiktor Stribiżew