A simpler way to find the last balanced square-bracket part of a string with the .NET regex engine is to search the string from right to left, using the RegexOptions.RightToLeft option. This way you avoid:
- scanning the whole string only to keep the last match;
- checking the end of the string with a lookahead, since the pattern returns the first match found from the right.
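Here is a minimal sketch of what RegexOptions.RightToLeft changes, on a made-up sample string. It assumes C# 9 or later for top-level statements. The engine starts scanning at the end of the input, so the first match it returns is the rightmost one.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "a1 a2 a3";

// Default (left to right): the first match is the leftmost one.
string leftmost = Regex.Match(text, @"a\d").Value;

// RightToLeft: scanning starts at the end of the string,
// so the first match found is the rightmost one.
string rightmost = Regex.Match(text, @"a\d", RegexOptions.RightToLeft).Value;

Console.WriteLine($"{leftmost} {rightmost}"); // prints "a1 a3"
```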
code:
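using System;
using System.Text.RegularExpressions;

string input = @"[hello] [world] [hello [world\]] ]";
string rtlPattern = @"(?(c)(?!))\[(?>\\.|(?<!\\)[^][]+|(?<-c>)\[|(?<c>)])*]";

// RightToLeft: the first match found is the rightmost balanced part.
Match m = Regex.Match(input, rtlPattern, RegexOptions.RightToLeft);
if (m.Success)
    Console.WriteLine("Result: {0}", m.Groups[0].Value); // Result: [hello [world\]] ]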
Note that, to understand what happens, you also need to read parts of the pattern from right to left. Details:
]              # a literal closing square bracket
(?>            # open an atomic group (*)
    \\.        # any character escaped with a backslash
  |
    [^][]+     # one or more characters that aren't square brackets
    (?<!\\)    # not preceded by a backslash
  |
    (?<-c>) \[ # an opening bracket: pop a capture from the c stack (fails if c is empty)
  |
    (?<c>) ]   # a closing bracket: push a capture onto the c stack
)*             # repeat zero or more times
\[             # a literal opening square bracket
(?(c)          # conditional statement: true if c isn't empty, i.e. a closing bracket is unbalanced
    (?!)       # always-failing pattern (an empty negative lookahead)
)
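To see the balancing group and the final conditional at work, here is a small sketch that runs the same pattern on two made-up inputs ending with a stray closing bracket (again assuming C# 9+ top-level statements). No balanced part ends at that last ], so every attempt starting there fails, and the engine retries from the ] to its left.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as in the answer.
string rtlPattern = @"(?(c)(?!))\[(?>\\.|(?<!\\)[^][]+|(?<-c>)\[|(?<c>)])*]";

// The trailing "]" has no matching "[": the match starts from the previous "]".
string flat = Regex.Match("[a] b]", rtlPattern, RegexOptions.RightToLeft).Value;

// Same idea with a nested part before the stray bracket.
string nested = Regex.Match("[a [b] c] d]", rtlPattern, RegexOptions.RightToLeft).Value;

Console.WriteLine(flat);   // [a]
Console.WriteLine(nested); // [a [b] c]
```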
(*) The atomic group is mandatory here to avoid possible catastrophic backtracking, since the group contains an item with a + quantifier and is itself repeated. You can learn more about this problem here.
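This pattern already handles escaped brackets at any nesting level. You can also add the RegexOptions.Singleline option so that the . in \\. also matches a newline (a backslash followed by a line break). Parts that span several lines are matched even without it, since [^][]+ already matches newline characters.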
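A quick check of the newline point, on a made-up two-line input and assuming C# 9+ top-level statements: with the default options, the negated class [^][] matches '\n', so a bracketed part spanning a line break is still found.

```csharp
using System;
using System.Text.RegularExpressions;

string rtlPattern = @"(?(c)(?!))\[(?>\\.|(?<!\\)[^][]+|(?<-c>)\[|(?<c>)])*]";

// The bracketed part spans a line break.
string input = "x [line one\nline two] y";

// No RegexOptions.Singleline here: [^][]+ consumes the '\n' on its own.
Match m = Regex.Match(input, rtlPattern, RegexOptions.RightToLeft);

Console.WriteLine(m.Success);              // True
Console.WriteLine(m.Value.Contains("\n")); // True
```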