
I want to select a part of a string, but the problem is that the last character I want to select can have multiple occurrences.

I want the selection to start at 'Aggregate(' and end at the matching ')'; any parentheses in between can be ignored.

Examples:

string: Substr(Aggregate(SubQuery, SUM, [Model].Remark * [Object].Shortname + 10), 0, 1)
should return: Aggregate(SubQuery, SUM, [Model].Remark * [Object].Shortname + 10)

string: Substr(Aggregate(SubQuery, SUM, [Model].Remark * ([Object].Shortname + 10)), 0, 1)
should return: Aggregate(SubQuery, SUM, [Model].Remark * ([Object].Shortname + 10))

string: Substr(Aggregate(SubQuery, SUM, ([Model].Remark) * ([Object].Shortname + 10) ), 0, 1)
should return: Aggregate(SubQuery, SUM, ([Model].Remark) * ([Object].Shortname + 10) )

Is there a way to solve this with a regular expression? I'm using C#.

Jerry
Jan V
  • You should specify the flavor/language you're using. This is possible using recursive patterns, which are not available in every language. – HamZa Aug 07 '13 at 13:17
  • It's *possible* with recursive regex, but I'd advise against it. It's clearly a recursive structure; what you want is a full-fledged parser, which regex is not. – AdamKG Aug 07 '13 at 13:18
  • This looks like a job for a language parser rather than regex. – Spudley Aug 07 '13 at 13:23
  • For nested brackets, [this](http://stackoverflow.com/questions/7898310/using-regex-to-balance-match-parenthesis) should be enough to work from. – Bernhard Barker Aug 07 '13 at 13:24
  • Can there be nested sets of brackets? – Bohemian Aug 07 '13 at 13:25
  • Yes, but I think I'd go with Adam's advice and just solve the problem in C#. Knowing the limits of regex is useful as well :) – Jan V Aug 07 '13 at 13:50
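
The plain-C# route mentioned in the comments could look roughly like the sketch below: find the call name, then walk forward counting parentheses until the depth returns to zero. The names `ParenScanner` and `ExtractCall` are made up for illustration, and the sketch assumes that parentheses never occur inside string literals in the expression.

```csharp
using System;

static class ParenScanner
{
    // Returns "name(...)" up to the parenthesis that balances the first one,
    // or null if the call is absent or never closed.
    public static string ExtractCall(string input, string name)
    {
        int start = input.IndexOf(name + "(", StringComparison.Ordinal);
        if (start < 0) return null;

        int depth = 0;
        // Begin at the opening parenthesis that follows the name.
        for (int i = start + name.Length; i < input.Length; i++)
        {
            if (input[i] == '(') depth++;
            else if (input[i] == ')' && --depth == 0)
                return input.Substring(start, i - start + 1);
        }
        return null; // ran off the end: unbalanced input
    }

    static void Main()
    {
        var s = "Substr(Aggregate(SubQuery, SUM, ([Model].Remark) * ([Object].Shortname + 10) ), 0, 1)";
        Console.WriteLine(ExtractCall(s, "Aggregate"));
        // prints: Aggregate(SubQuery, SUM, ([Model].Remark) * ([Object].Shortname + 10) )
    }
}
```

Because the loop only tracks a depth counter, it handles any nesting level and returns null for unbalanced input.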

4 Answers


This is a little ugly, but you could use something like

Aggregate\(([^()]+|\(.*?\))*\)

It passes all your tests, but it can only match one level of nested parentheses.
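
To make the one-level limit concrete, here is a small sketch that runs the pattern above through .NET's `Regex.Match`. The class name `OneLevelDemo` and the shortened test strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class OneLevelDemo
{
    // The pattern from this answer, unchanged.
    const string Pattern = @"Aggregate\(([^()]+|\(.*?\))*\)";

    public static string Find(string input) => Regex.Match(input, Pattern).Value;

    static void Main()
    {
        // One level of nesting: the whole call is returned.
        Console.WriteLine(Find("Substr(Aggregate(a * (b + 10)), 0, 1)"));
        // prints: Aggregate(a * (b + 10))

        // Two levels: the lazy \(.*?\) stops at the first ')',
        // so the match ends one parenthesis early.
        Console.WriteLine(Find("Substr(Aggregate(a(b(c)d)e), 0, 1)"));
        // prints: Aggregate(a(b(c)d)
    }
}
```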

Michelle
Joey
  • I like this expression, care to explain :p – HamZa Aug 07 '13 at 13:21
  • I forgot to mention that users create those strings, so it is possible that they add multiple parentheses. – Jan V Aug 07 '13 at 13:29
  • @JanV Then your only option is balanced groups if you insist on using regex. – HamZa Aug 07 '13 at 13:31
  • @HamZa: This expression matches `Aggregate`, followed by an opening parenthesis, followed by either any non-parenthesis or something in parentheses; that part can repeat as often as it wants to. – Joey Aug 07 '13 at 17:42
  • @Јοеу I know, it's just "better" if regex answers contain some explanation. Otherwise it's just `gimme ze codez` and `here ya go`. Take a look at this [meta Q](http://meta.stackexchange.com/questions/177757/are-answers-that-just-contain-a-regular-expression-pattern-really-good-answers). – HamZa Aug 07 '13 at 17:57

This solution works with any level of nested parentheses by using .NET's balancing groups:

(?x)              # allow comments and ignore whitespace
Aggregate\(
(?:
  [^()]           # anything but ( and )
| (?<open> \( )   # ( -> open++
| (?<-open> \) )  # ) -> open--
)*
(?(open) (?!) )   # fail if open > 0
\)
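
As a rough usage sketch, the pattern above can be compiled in C# with `RegexOptions.IgnorePatternWhitespace`, which takes the place of the inline `(?x)`. The class name `BalancedDemo` and the test string are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class BalancedDemo
{
    static readonly Regex Call = new Regex(@"
        Aggregate\(
        (?:
            [^()]             # anything but ( and )
          | (?<open> \( )     # push a capture onto 'open'
          | (?<-open> \) )    # pop one; fails when 'open' is empty
        )*
        (?(open) (?!) )       # fail if anything is still open
        \)",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the matched call, or an empty string when there is no match.
    public static string Find(string input) => Call.Match(input).Value;

    static void Main()
    {
        Console.WriteLine(Find("Substr(Aggregate(a(b(c)d)e), 0, 1)"));
        // prints: Aggregate(a(b(c)d)e)
    }
}
```

The loop stops at the first `)` that has no matching open capture, which is exactly the parenthesis that closes `Aggregate(`.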


I'm not sure how much the input varies but for the string examples in the question something as simple as this would work:

Aggregate\(.*\)(?=,)
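
A quick sketch of where this simpler pattern holds and where it does not. The class name `LookaheadDemo` and both test strings are made up for illustration. The greedy `.*` runs to the last `)` that is followed by a comma, so a second call later in the string gets swallowed.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaheadDemo
{
    public static string Find(string input) =>
        Regex.Match(input, @"Aggregate\(.*\)(?=,)").Value;

    static void Main()
    {
        // Shaped like the question's examples: works.
        Console.WriteLine(Find("Substr(Aggregate(a, (b + 10)), 0, 1)"));
        // prints: Aggregate(a, (b + 10))

        // Another call followed by a comma: the match runs too far.
        Console.WriteLine(Find("Substr(Aggregate(a, b), Foo(c), 1)"));
        // prints: Aggregate(a, b), Foo(c)
    }
}
```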
Qtax

This regex copes with several pairs of brackets and with some nesting, but it does not count depth, so it is not a general solution for arbitrarily nested input:

Aggregate\(([^(]*\([^)]*\))*[^()]*\)

For example, in the string below it finds the whole Aggregate(...) call, up to the ) just before ", 0, 1)":

Substr(Aggregate(SubQuery, SUM(foo(bar), baz()), ([Model].Remark) * ([Object].Shortname + 10) ), 0, 1)

Notice the SUM(foo(bar), baz()) in there.

See a live demo on rubular.

Bohemian

If you eventually consider avoiding regular expressions, here's an alternative for parsing, which uses the System.Xml.Linq namespace:

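using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        // One expression per line in input.txt.
        foreach (var item in File.ReadAllLines("input.txt"))
        {
            Console.WriteLine(item.GetParameter("Aggregate"));
        }
    }
}

static class X
{
    public static string GetParameter(this string expression, string element)
    {
        // Rewrite brackets as XML tags so the XML parser tracks the nesting.
        var xml = "<root>" + expression
            .Replace("(", "<n1>")
            .Replace(")", "</n1>")
            .Replace("[", "<n2>")
            .Replace("]", "</n2>") +
            "</root>";

        XDocument doc;
        try
        {
            // PreserveWhitespace keeps whitespace-only text, e.g. the space in ") )".
            doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);
        }
        catch
        {
            // Unbalanced brackets, or characters such as < and & in the input.
            return null;
        }

        // Find the element whose first node is exactly the text `element`.
        var agg = doc.Descendants()
            .FirstOrDefault(d => d.FirstNode?.ToString() == element);
        if (agg == null)
            return null;

        // Its first child element is the argument list that follows the name.
        var param = agg.Elements().FirstOrDefault();
        if (param == null)
            return null;

        // Serialize without indentation, then map the tags back to brackets.
        return element +
            param
            .ToString(SaveOptions.DisableFormatting)
            .Replace("<n1>", "(")
            .Replace("</n1>", ")")
            .Replace("<n2>", "[")
            .Replace("</n2>", "]");
    }
}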
Alex Filipovici
  • It uses the `System.IO` namespace as well. And `System`. I'm not sure why pointing out one of three required `using`s is relevant. – Joey Aug 08 '13 at 05:04