
I need to write a regex that captures the generic arguments (which can themselves be generic) of a type name written in a special notation like this:

System.Action[Int32,Dictionary[Int32,Int32],Int32]

Let's assume a type name is [\w.]+ and a parameter is [\w.,\[\]]+. From the example above I need to grab only Int32, Dictionary[Int32,Int32] and Int32.

Basically, I need to capture something only when the balancing-group stack is empty, but I don't really understand how to do that.
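For readers new to balancing groups (a .NET regex feature), here is a minimal C# sketch of the "stack" idea. The bracket-checking pattern is a hypothetical illustration, not part of the question: every [ pushes a capture onto group o, every ] pops one (and fails if nothing is left to pop), and (?(o)(?!)) fails unless the stack is empty at that point.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pattern illustrating a balancing-group stack:
//   (?<o>\[)    pushes a capture onto group "o" for every '['
//   (?<-o>\])   pops one capture from "o" for every ']' (fails if "o" is empty)
//   (?(o)(?!))  fails if "o" still holds captures, i.e. the stack is not empty
var balanced = new Regex(@"^(?:[^\[\]]|(?<o>\[)|(?<-o>\]))*(?(o)(?!))$");

Console.WriteLine(balanced.IsMatch("Dictionary[Int32,List[Int32]]")); // True
Console.WriteLine(balanced.IsMatch("Dictionary[Int32,List[Int32]"));  // False: one '[' never closed
Console.WriteLine(balanced.IsMatch("Dictionary]Int32["));            // False: ']' with an empty stack
```

Checking "is the stack empty?" with a conditional on the group name is the same building block used in the solutions further down.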

UPD

The answer below helped me solve the problem quickly (though without proper validation and with the nesting depth limited to 1), but I've since managed to do it with balancing groups:

^[\w.]+                                              #Type name
\[(?<delim>)                                         #Opening bracket and first delimiter
[\w.]+                                               #Minimal content
(
[\w.]+                                                       
((?(open)|(?<param-delim>)),(?(open)|(?<delim>)))*   #Cutting param if balanced before comma and placing delimiter
((?<open>\[))*                                       #Counting [
((?<-open>\]))*                                      #Counting ]
)*
(?(open)|(?<param-delim>))\]                         #Cutting last param if balanced
(?(open)(?!)                                         #Checking balance
)$

Demo

UPD2 (Last optimization)

^[\w.]+
\[(?<delim>)
[\w.]+
(?:
 (?:(?(open)|(?<param-delim>)),(?(open)|(?<delim>))[\w.]+)?
 (?:(?<open>\[)[\w.]+)?
 (?:(?<-open>\]))*
)*
(?(open)|(?<param-delim>))\]
(?(open)(?!)
)$
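A minimal sketch of running the UPD2 pattern from C#. It assumes the pattern is compiled with RegexOptions.IgnorePatternWhitespace, since its line breaks and indentation (and the # comments in the first version) only work with that option. Each successful (?<param-delim>) adds one capture to the param group, so the top-level arguments can be read from Groups["param"].Captures. The variable name typeArgs is just an illustrative choice.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// UPD2 pattern, copied verbatim; whitespace is insignificant with IgnorePatternWhitespace.
var pattern = @"
^[\w.]+
\[(?<delim>)
[\w.]+
(?:
 (?:(?(open)|(?<param-delim>)),(?(open)|(?<delim>))[\w.]+)?
 (?:(?<open>\[)[\w.]+)?
 (?:(?<-open>\]))*
)*
(?(open)|(?<param-delim>))\]
(?(open)(?!)
)$";

var input = "System.Action[Int32,Dictionary[Int32,Int32],Int32]";
var m = Regex.Match(input, pattern, RegexOptions.IgnorePatternWhitespace);

// Every (?<param-delim>) stored the text between the previous delimiter and the
// current position in "param", i.e. one top-level generic argument.
string[] typeArgs = m.Success
    ? m.Groups["param"].Captures.Cast<Capture>().Select(c => c.Value).ToArray()
    : Array.Empty<string>();

foreach (var a in typeArgs)
    Console.WriteLine(a);
```

Because the whole pattern is anchored with ^ and $ and ends with the (?(open)(?!)) check, input with unbalanced brackets does not match at all, so typeArgs stays empty in that case.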
Kovpaev Alexey
    Try [`\w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*`](http://regexstorm.net/tester?p=%5cw%2b(%3f%3a%5c.%5cw%2b)*%5c%5b(%3f%3a%2c%3f(%3f%3cres%3e%5cw%2b(%3f%3a%5c%5b%5b%5e%5d%5b%5d*%5d)%3f))*&i=System.Action%5bInt32%2cDictionary%5bInt32%2cInt32%5d%2cInt32%5d%0d%0a). The `${res}` captures will contain the values. You do not need any balancing groups here if there is only one nested level of `[...]`. I am not even sure you need `\w+(?:\.\w+)*` – Wiktor Stribiżew Aug 02 '16 at 12:38

1 Answer


I suggest capturing those values using

\w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*

See the regex demo.

Details:

  • \w+(?:\.\w+)* - match 1+ word chars followed by 0 or more sequences of a . and 1+ word chars
  • \[ - a literal [
  • (?:,?(?<res>\w+(?:\[[^][]*])?))* - 0 or more sequences of:
    • ,? - an optional comma
    • (?<res>\w+(?:\[[^][]*])?) - Group "res" capturing:
      • \w+ - one or more word chars (perhaps you want [\w.]+ here)
      • (?:\[[^][]*])? - 1 or 0 (change ? to * to match 0 or more) sequences of a [, 0+ chars other than [ and ], and a closing ].

A C# demo below:

using System;
using System.Linq;
using System.Text.RegularExpressions;

var line = "System.Action[Int32,Dictionary[Int32,Int32],Int32]";
var pattern = @"\w+(?:\.\w+)*\[(?:,?(?<res>\w+(?:\[[^][]*])?))*";
// Collect every capture of the "res" group from every match
var result = Regex.Matches(line, pattern)
        .Cast<Match>()
        .SelectMany(x => x.Groups["res"].Captures.Cast<Capture>()
            .Select(t => t.Value))
        .ToList();
foreach (var s in result) // DEMO
    Console.WriteLine(s);

UPDATE: To handle [...] substrings nested to an arbitrary depth, use

\w+(?:\.\w+)*\[(?:\s*,?\s*(?<res>\w+(?:\[(?>[^][]+|(?<o>\[)|(?<-o>]))*(?(o)(?!))])?))*

See the regex demo
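A minimal C# sketch applying the arbitrary-depth pattern above to an input with two levels of nesting (the sample string is an assumption chosen for illustration). Inside each res capture, group o works as a stack: (?<o>\[) pushes on every [, (?<-o>]) pops on every ], and (?(o)(?!)) fails if anything is still open before the final ].

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Arbitrary-depth pattern from the update, copied verbatim.
var pattern = @"\w+(?:\.\w+)*\[(?:\s*,?\s*(?<res>\w+(?:\[(?>[^][]+|(?<o>\[)|(?<-o>]))*(?(o)(?!))])?))*";

// Hypothetical sample input with two levels of nested brackets.
var line = "System.Action[Int32,Dictionary[Int32,Dictionary[Int32,Int32]],Int32]";

// Gather every "res" capture from every match, in order.
var result = Regex.Matches(line, pattern)
    .Cast<Match>()
    .SelectMany(m => m.Groups["res"].Captures.Cast<Capture>().Select(c => c.Value))
    .ToList();

foreach (var s in result)
    Console.WriteLine(s);
```

Unlike the question's UPD2 pattern, this one is not anchored, so it extracts the arguments but does not by itself reject malformed input.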

Wiktor Stribiżew
  • Your solution helped me today and I was able to move on instantly, but then I managed to solve it with balancing groups. The solution is in the updated question. – Kovpaev Alexey Aug 02 '16 at 19:43
  • You could actually use a modified version of my regex: [`\w+(?:\.\w+)*\[(?:\s*,?\s*(?<res>\w+(?:\[(?>[^][]+|(?<o>\[)|(?<-o>]))*(?(o)(?!))])?))*`](http://regexstorm.net/tester?p=%5cw%2b(%3f%3a%5c.%5cw%2b)*%5c%5b(%3f%3a%5cs*%2c%3f%5cs*(%3f%3cres%3e%5cw%2b(%3f%3a%5c%5b(%3f%3e%5b%5e%5d%5b%5d%2b%7c(%3f%3co%3e%5c%5b)%7c(%3f%3c-o%3e%5d))*(%3f(o)(%3f!))%5d)%3f))*&i=System.Action%5bInt32%2cDictionary%5bInt32%2cDictionary%5bInt32%2cInt32%5d%5d%2cInt32%5d%0d%0a). Your `[\w.]` also matches substrings starting with `.`, which is wrong, since an identifier starts with `[a-zA-Z_]` – Wiktor Stribiżew Aug 02 '16 at 19:48