2

I have the following text:

((#) This is text

    ((#) This is subtext 

        ((#) This is sub-subtext #)

    #)

 #)

I wrote the following regex:

        var counter = 0;
        return Regex.Replace(text,
             @"\(\(#\)(.*?)#\)",
             m =>
             {
                var str = m.ToString();
                counter++;
                return counter + ") " + str.Replace("((#)", "").Replace("#)", "");
             });

The result I expect is:

1) This is text
   2) This is subtext
       3) This is sub-subtext

I know that this will not work properly, because the regex pairs each ((#) with the nearest #) instead of with its own matching #).

How can I avoid this conflict? Thanks! :)

Uwe Keim
podeig
  • If you change the regex to `@"\(\(#\)(.*)"` you will partly get the output you need, it will still have `#)`s. Are you looking to obtain nested substrings? – Wiktor Stribiżew Dec 16 '15 at 09:37
  • Yes, it must be nested substrings. – podeig Dec 16 '15 at 09:39
  • Possible duplicate of [Can regular expressions be used to match nested patterns?](http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns) – Ondrej Svejdar Dec 16 '15 at 09:39
  • @OndrejSvejdar: No, it is not since the accepted answer is not appropriate for .NET. Podeig, the problem here is that you cannot do it within one single operation. 1) Get the nested strings, 2) replace in a loop. – Wiktor Stribiżew Dec 16 '15 at 09:39
  • @stribizhev Could you provide an example? – podeig Dec 16 '15 at 09:46

2 Answers

1

Here is the solution I suggest:

  • Get the nested strings with the regex featuring balanced groups,
  • Replace the substrings in a loop.

See the regex demo here. The whole pattern sits inside a lookahead, so every match is an empty string, but capture group 1 holds each nested substring that starts with ((#) and ends with #).

Here is the C# demo code:

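using System.Linq;
using System.Text.RegularExpressions;

var text = @"((#) This is text

    ((#) This is subtext 

        ((#) This is sub-subtext #)

     #)

#)";
// Collect every balanced ((#) ... #) block from capture group 1.
// Matches come in order of start position, so the outermost block is first.
var chunks = Regex.Matches(text,
            @"(?s)(?=(\(\(#\)(?>(?!\(\(#\)|#\)).|\(\(#\)(?<D>)|#\)(?<-D>))*(?(D)(?!))#\)))")
               .Cast<Match>().Select(p => p.Groups[1].Value)
               .ToList();
// Replace each block with "n) content". The content still holds the inner
// blocks verbatim, so later iterations can find and number them.
for (var i = 0; i < chunks.Count; i++)
     text = text.Replace(chunks[i], string.Format("{0}) {1}", (i+1), 
                         chunks[i].Substring(4, chunks[i].Length-6).Trim()));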

Note that .Substring(4, chunks[i].Length-6) strips the 4-character opening delimiter ((#) and the 2-character closing delimiter #), leaving only the text between them. Since we know the delimiters, we can hardcode these values.
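The pattern is dense, so it may help to read it piece by piece. The sketch below rebuilds the same pattern string from commented fragments; the class name BalancedPattern and the constants Open and Close are hypothetical names introduced only for this illustration, and the stack name D comes from the original pattern.

```csharp
using System.Text.RegularExpressions;

public static class BalancedPattern
{
    // Regex-escaped forms of the two delimiters from the question.
    const string Open = @"\(\(#\)";   // literal ((#)
    const string Close = @"#\)";      // literal #)

    // Concatenates to exactly the pattern used in the demo code above.
    public static readonly string Pattern =
        "(?s)" +                                // let . match newlines too
        "(?=(" +                                // lookahead, so overlapping blocks are all found; group 1 captures the block
        Open +                                  // a block starts with ((#)
        "(?>" +                                 // atomic group: no backtracking into the body
            "(?!" + Open + "|" + Close + ")." + //   any character that does not start a delimiter
            "|" + Open + "(?<D>)" +             //   nested opener: push onto stack D
            "|" + Close + "(?<-D>)" +           //   nested closer: pop from stack D (fails if D is empty)
        ")*" +
        "(?(D)(?!))" +                          // if D is not empty, an opener is unclosed: fail
        Close +                                 // the closer that belongs to this block
        "))";

    // Returns the text of every balanced block, in order of start position.
    public static string[] FindBlocks(string text)
    {
        var matches = Regex.Matches(text, Pattern);
        var result = new string[matches.Count];
        for (var i = 0; i < matches.Count; i++)
            result[i] = matches[i].Groups[1].Value;
        return result;
    }
}
```

Building the pattern from a string rather than using RegexOptions.IgnorePatternWhitespace is a deliberate choice here: in that mode an unescaped # starts a comment, which would clash with the delimiters.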


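To learn more about balancing groups, see [*Balancing Groups Definition*](https://msdn.microsoft.com/en-us/library/bs2twtah(v=vs.110).aspx#balancing_group_definition) and [*Fun With .NET Regex Balancing Groups*](http://blog.stevenlevithan.com/archives/balancing-groups).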

Wiktor Stribiżew
  • 1
    Thank you! You brought me much closer to the final solution! I need to read more about balancing group definitions :) – podeig Dec 16 '15 at 11:36
0

I believe this to be impossible, because your grammar is at its core recursive:

TEXT := "((#)" TEXT "#)"

Which is something that cannot be consumed by a regular expression, because regular expressions can only handle languages generated by a regular grammar.

In that sense, the question linked by Ondrej actually does answer your problem, just not how you want it.

The only way you can handle this with regular expressions is by limiting yourself to a fixed maximum nesting depth and matching everything up to that depth, which I think is not what you want.

To make this work for any number of nesting levels, you will have no other choice (that I know of) than to use a parser for context-free languages.
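For a grammar this small, such a parser can be a single left-to-right scan with a depth counter. What follows is only a minimal sketch, assuming the delimiters are exactly ((#) and #) as in the question; the names NestedNumbering and Number are hypothetical, and throwing on unbalanced input is my own choice rather than anything the question asks for.

```csharp
using System;
using System.Text;

public static class NestedNumbering
{
    const string Open = "((#)";
    const string Close = "#)";

    // Replaces each opening delimiter with a running number followed by ')'
    // and drops each closing delimiter that closes an open block.
    public static string Number(string text)
    {
        var output = new StringBuilder();
        int counter = 0; // number assigned to the most recent block
        int depth = 0;   // how many blocks are currently open
        int i = 0;

        while (i < text.Length)
        {
            // Check the opener first: "((#)" itself ends with "#)".
            if (string.CompareOrdinal(text, i, Open, 0, Open.Length) == 0)
            {
                counter++;
                depth++;
                output.Append(counter).Append(')');
                i += Open.Length;
            }
            else if (depth > 0 &&
                     string.CompareOrdinal(text, i, Close, 0, Close.Length) == 0)
            {
                depth--;
                i += Close.Length;
            }
            else
            {
                // Ordinary character: copy it through unchanged.
                output.Append(text[i]);
                i++;
            }
        }

        if (depth != 0)
            throw new FormatException("Unbalanced ((#) ... #) delimiters.");

        return output.ToString();
    }
}
```

Calling NestedNumbering.Number(text) on the question's input numbers the blocks in the order their openers appear. Whitespace around the removed closing delimiters is left in place, so trimming trailing blank lines would be a separate step.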

Community
F.P
  • 1
    *Which is something that cannot be consumed by a regular expression* is totally wrong in the context of .NET regular expressions. See [*Balancing Groups Definition*](https://msdn.microsoft.com/en-us/library/bs2twtah(v=vs.110).aspx#balancing_group_definition) and [*Fun With .NET Regex Balancing Groups*](http://blog.stevenlevithan.com/archives/balancing-groups). – Wiktor Stribiżew Dec 16 '15 at 10:01
  • @stribizhev I was not aware of that feature, thanks for the hint. – F.P Dec 16 '15 at 10:05
  • 2
    There is a recursion support for some other regex flavors, see [regular-expressions.info *Regular Expression Recursion*](http://www.regular-expressions.info/recurse.html) page. – Wiktor Stribiżew Dec 16 '15 at 10:08