2

I have an input string as follows:

var format = "{0}({1:2}({*:8})){2:3}({3:16})";

For the "what are you doing this for" questions:
The format above works much like the string.Format(string, args) method, with some modifications:

  • {0} is an insert index
  • {1:2} is an insert index with specified length in bytes
  • ({3:16}) is a grouped insert index which retains a copy of the matching sequence

Expected graph of output:

  • {0}
  • ({1:2}({*:8}))
  • {2:3}
  • ({3:16})

What I'm getting now:

  • {0}
  • ({1:2}
  • ({*:8}))
  • {2:3}
  • ({3:16})

The regular expression I'm working with now:

var regExpr = @"\(?\{\(*([^/}]+)\)*\}\)?";

As an aside, since I am just now learning RegEx I expect comments about the efficiency of the expression.

IAbstract

2 Answers

1

This is generally not possible using regular expressions, but read about the balancing group definition extension in .NET.

Konrad Kokosa
  • [really?](http://stackoverflow.com/questions/9813751/get-inner-patterns-recursively-using-regex-c-sharp) - I'm not trying to be a smartarse, I don't use .net, does that answer not suggest otherwise though? – OGHaza Dec 01 '13 at 00:03
  • @OGHaza see [this question](http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns) – Konrad Kokosa Dec 01 '13 at 00:10
  • Well, I'm not trying to immediately match on a nested 'grouping'. I only want the outer match. – IAbstract Dec 01 '13 at 00:15
  • And I'm not really following what the article is conveying on balanced group definitions. – IAbstract Dec 01 '13 at 00:16
  • @KonradKokosa, if you read further answers on that question it suggests .NET is one of a number of regex engines that are capable of such tasks. – OGHaza Dec 01 '13 at 00:19
1

Right, I think I've found a solution - this is probably a horribly inefficient pattern to match, but I was intrigued as to whether it could be done at all:

(((?<r>\{)|(?<-r>\})|(?<b>\()|(?<-b>\))|[^{}()]))+?(?(r)(?!))(?(b)(?!))

It works on RegexHero (the .NET regex tester), and I also tested it on a second tester.

Explanation:

First we have (?<r>\{)|(?<-r>\})|(?<b>\()|(?<-b>\))

These are balancing groups. Every { that is matched pushes a capture onto the r group, and every } pops the most recent capture off it again; a } with nothing left to pop simply fails to match. The same is done for ( and ) with the b group.
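To make the push and pop behaviour visible, here is a minimal sketch. It uses a simplified pattern of my own (braces only, anchored, and without the final conditional check), so it is an illustration rather than the pattern from this answer. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BraceStackDemo
{
    // Simplified, hypothetical pattern: braces only, anchored with ^ and $,
    // and without the (?(r)(?!)) check, so any unclosed '{' stays on the
    // "r" stack where it can be inspected after the match.
    private static readonly Regex Braces =
        new Regex(@"^(?:(?<r>\{)|(?<-r>\})|[^{}])*$");

    // Returns the number of '{' still open after the match, or -1 when the
    // pattern cannot match at all (a '}' arrived while the stack was empty).
    public static int UnclosedBraces(string input)
    {
        Match m = Braces.Match(input);
        return m.Success ? m.Groups["r"].Captures.Count : -1;
    }

    public static void Main()
    {
        foreach (var s in new[] { "{0}", "{{0}", "{0}}" })
            Console.WriteLine($"{s} -> {UnclosedBraces(s)}");
    }
}
```

With these inputs the helper should report 0 for the balanced string, 1 for the string with one brace left open, and -1 for the string with a stray closing brace.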

Then the final part of the alternation is [^{}()] which matches everything else - i.e. anything that might appear between the brackets.

Finally we have (?(r)(?!))(?(b)(?!)) (you may be able to use (?(r|b)(?!)) instead, but it screwed up the results on 1 of the 2 testers I used). This is an if..then construct: it checks whether anything is left in the r or b group, and if so it applies (?!), an empty negative lookahead, which always fails. This forces the match to fail whenever the brackets don't balance.
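The effect of the two conditionals is easiest to see when the whole input has to be consumed. The sketch below is a variation I am assuming for illustration: the same alternation and conditionals, but anchored with ^ and $ and using a greedy quantifier. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalanceCheckDemo
{
    // Hypothetical variation on the answer's pattern: anchored so the whole
    // input must be consumed, then rejected if either stack is non-empty.
    private static readonly Regex Balanced = new Regex(
        @"^(?:(?<r>\{)|(?<-r>\})|(?<b>\()|(?<-b>\))|[^{}()])*(?(r)(?!))(?(b)(?!))$");

    // True when every '{' and '(' in the input has a matching closer.
    public static bool IsBalanced(string input) => Balanced.IsMatch(input);

    public static void Main()
    {
        foreach (var s in new[] { "({1:2}({*:8}))", "({1:2}", "{2:3})" })
            Console.WriteLine($"{s} -> {IsBalanced(s)}");
    }
}
```

Only the first input should be accepted. Note that the two stacks are counted independently, so an interleaved input such as ({)} still passes; the same holds for the pattern in this answer.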

Since the + after the alternation is a lazy +? it'll match the shortest segments that keep the brackets balanced.

Which, on RegexHero at least, matches:

\1 {0}
\2 ({1:2}({*:8}))
\3 {2:3}
\4 ({3:16})
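
For anyone who wants to try this outside an online tester, here is a minimal C# sketch that runs the pattern above against the format string from the question. The class and method names are hypothetical; only Regex.Matches and the pattern itself come from the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BalancedSplitDemo
{
    // Pattern copied from the answer: "r" tracks braces, "b" tracks parentheses.
    public const string Pattern =
        @"(((?<r>\{)|(?<-r>\})|(?<b>\()|(?<-b>\))|[^{}()]))+?(?(r)(?!))(?(b)(?!))";

    // Returns each shortest balanced segment of the input, in order.
    public static string[] Split(string format) =>
        Regex.Matches(format, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        var format = "{0}({1:2}({*:8})){2:3}({3:16})";
        foreach (var segment in Split(format))
            Console.WriteLine(segment);
    }
}
```

Run against the question's input, this should print the same four segments listed above, one per line.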
OGHaza