I have a string which contains some text followed by some brackets with different content (possibly empty). I need to extract the last bracket with its content:

atext[d][][ef] // should return "[ef]"
other[aa][][a] // should return "[a]"
xxxxx[][xx][x][][xx] // should return "[xx]"
yyyyy[] // should return "[]"

I have looked into RegexOptions.RightToLeft and read up on lazy vs greedy matching, but I can't for the life of me get this one right.

Thorkil Holm-Jacobsen

4 Answers

This regex will work:

.*(\[.*\])

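A more efficient version, which uses a negated character class instead of a second .* so the engine does not have to backtrack inside the brackets: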

.*(\[[^\]]*\])

C# Code

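// Requires: using System; using System.Text.RegularExpressions;
string input = "atext[d][][ef]\nother[aa][][a]\nxxxxx[][xx][x][][xx]\nyyyyy[]";
// '.' does not match '\n' by default, so each match stays within one line.
string pattern = @".*(\[.*\])";
Regex rgx = new Regex(pattern);

Match match = rgx.Match(input);

// Print the last bracket group (capture group 1) of every line.
while (match.Success)
{
    Console.WriteLine(match.Groups[1].Value);
    match = match.NextMatch();
}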

It may give unexpected results for nested or unbalanced [] pairs.
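To make the caveat about nested brackets concrete, here is a minimal sketch that runs the two patterns above, plus the lazy variant `.*(\[.*?\])` suggested in the comments, against a flat and a nested input. The class name `LastBracketDemo` and the helper `Extract` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastBracketDemo
{
    // Hypothetical helper: returns capture group 1 of the first match,
    // or null when the pattern does not match at all.
    public static string Extract(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string[] patterns =
        {
            @".*(\[.*\])",      // greedy inside the brackets
            @".*(\[.*?\])",     // lazy inside the brackets (from the comments)
            @".*(\[[^\]]*\])"   // negated character class
        };
        string[] inputs = { "atext[d][][ef]", "yyyyy[]", "[][aa[bb]]" };

        foreach (string input in inputs)
            foreach (string pattern in patterns)
                Console.WriteLine("{0,-16} {1,-18} -> {2}",
                                  input, pattern, Extract(input, pattern));
    }
}
```

On the flat inputs all three patterns agree. On `[][aa[bb]]` the greedy pattern returns `[bb]]` and the other two return `[bb]`; none of them returns `[aa[bb]]`, so nested input would need a different approach.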

rock321987
  • to include nested `[]` cases you do `.*(\[.*?\])` https://regex101.com/r/bZ9tP4/2 – dnit13 Apr 24 '16 at 09:49
  • @dnit13 that can be done, but it will not be correct either. E.g. for `[][aa[bb]]` you will get `[bb]`, but in my view the output should be `[aa[bb]]`, because the nested bracket is not the last one. It's just a matter of perspective, though. – rock321987 Apr 24 '16 at 09:53
  • @dnit13 i have added a modified version of your regex – rock321987 Apr 24 '16 at 09:56
  • yes its a matter of perspective indeed, since nested case is not mentioned by OP, it could be either one of them. :) – dnit13 Apr 24 '16 at 09:58
  • @rock321987 Thank you, it seems to be working. There is no need to support nested brackets. Can you make a short explanation to how it works? If I understand correctly, the pattern consumes everything up to the last bracket with `.*`, and then matches the last bracket. Is this correct? – Thorkil Holm-Jacobsen Apr 24 '16 at 10:00
  • @tahatmat exactly, you are absolutely correct. Because `.*` is greedy, it consumes everything up to the last `[` (not exactly like this, because some backtracking is involved) and then matches the contents up to the closing `]`. – rock321987 Apr 24 '16 at 10:03

Alternatively, you could reverse the string using a function similar to this:

public static string Reverse( string s )
{
    char[] charArray = s.ToCharArray();
    Array.Reverse( charArray );
    return new string( charArray );
}

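You could then run a simple regex search for the first bracket group, which appears mirrored as ]txeTemos[ in the reversed string, and reverse the match back to get the result. Alternatively, use a for loop to iterate through the reversed string and stop at the first [, which ends the mirrored group.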
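The reverse-then-search idea can be sketched as follows. This is only an illustration under stated assumptions: the class name `ReverseDemo` and the helper `LastBracket` are hypothetical, and the pattern `\][^\[\]]*\[` is one possible way to find the first mirrored group. Note that reversing a char array does not handle surrogate pairs or combining characters correctly, which is fine for plain ASCII input like the samples in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReverseDemo
{
    public static string Reverse(string s)
    {
        char[] charArray = s.ToCharArray();
        Array.Reverse(charArray);
        return new string(charArray);
    }

    // Hypothetical helper: in the reversed string the last group comes first
    // and its brackets are mirrored, so we look for "]...[" and then reverse
    // the match back. Returns null when there is no bracket group.
    public static string LastBracket(string input)
    {
        Match m = Regex.Match(Reverse(input), @"\][^\[\]]*\[");
        return m.Success ? Reverse(m.Value) : null;
    }

    public static void Main()
    {
        string[] inputs =
        {
            "atext[d][][ef]", "other[aa][][a]", "xxxxx[][xx][x][][xx]", "yyyyy[]"
        };
        foreach (string input in inputs)
            Console.WriteLine("{0} -> {1}", input, LastBracket(input));
    }
}
```

For the four sample strings from the question this prints `[ef]`, `[a]`, `[xx]` and `[]`.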

Jake

With negative lookahead:

\[[^\]]*\](?!\[)

This matches a bracket group that is not immediately followed by another [. It is relatively efficient and flexible, and it avoids the evil .*. It also works on longer text that contains multiple instances.
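As a rough sketch of the multiple-instances case, the lookahead pattern can be combined with `Regex.Matches` to collect the last bracket of every bracket run in a longer text. The class name `LookaheadDemo`, the helper `LastBrackets`, and the sample text are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical helper: returns every bracket group that is not
    // immediately followed by another '[', i.e. the last one of each run.
    public static List<string> LastBrackets(string text)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\[[^\]]*\](?!\[)"))
            results.Add(m.Value);
        return results;
    }

    public static void Main()
    {
        string text = "atext[d][][ef] other[aa][][a] yyyyy[]";
        foreach (string group in LastBrackets(text))
            Console.WriteLine(group);
    }
}
```

With the sample text above, this prints `[ef]`, `[a]` and `[]`, one per line.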

Dávid Horváth

The correct way in .NET is indeed to use the regex option RightToLeft with the appropriate method, Regex.Match(String, String, RegexOptions).

This keeps the pattern very simple and efficient: it does not produce the slightest backtracking step, and because the pattern ends with a literal character (the closing bracket), the engine can quickly search for candidate positions in the string where the pattern may succeed before starting its "normal" walk.

public static void Main()
{
    string input = @"other[aa][][a]";

    string pattern = @"\[[^][]*]";

    Match m = Regex.Match(input, pattern, RegexOptions.RightToLeft);

    if (m.Success)
        Console.WriteLine("Found '{0}' at position {1}.", m.Value, m.Index);
}
Casimir et Hippolyte