5

Assuming we have a string containing different parentheses (by parentheses in this context I mean (, [, and { ) the string could also contain other content like so {[balanced(parenthesis)]} but the rest of the content will largely be ignored. Would it be possible to use regular expression to control that all different parenthesis are:

  1. "closed", that is that each "opening" parenthesis is matched by a "closing" parenthesis, and
  2. at the right position in the string?
nhahtdh
  • 55,989
  • 15
  • 126
  • 162
Overly Excessive
  • 2,095
  • 16
  • 31
  • This is **NOT** a duplicate because of the several types of parenthesis.. – Lucas Trzesniewski Oct 22 '14 at 13:14
  • 1
    See [this answer](http://stackoverflow.com/a/20644634/3764814). You don't want to do that with .NET regexes. – Lucas Trzesniewski Oct 22 '14 at 13:17
  • +1 Regular expressions are not designed to match nested or recursive elements like open/close braces. You should implement your own parsing / validation function that does this by hand. Regular Expressions may be useful within that function, but your primary tool should be C# and the methods in the `String` class, not the `Regex` class. – qJake Oct 22 '14 at 19:07
  • The answer you've accepted has lots of false positives. It doesn't solve this problem. – Servy Oct 22 '14 at 20:23
  • Right, unaccepted for now, pending modifications. – Overly Excessive Oct 22 '14 at 20:29
  • 1
    BTW, I feel bad about advertising, but this problem was one of the main motivations for me to start [this](https://github.com/ltrzesniewski/pcre-net) lib, as there really is no easy way to do it with the .NET regexes since they don't handle recursion. – Lucas Trzesniewski Oct 22 '14 at 22:35
  • 3
    It turned out that this has been done by [Kobi](http://stackoverflow.com/users/7586/kobi) on his blog: https://kobikobi.wordpress.com/2010/12/14/net-regex-matching-mixed-balanced-parentheses/ The solution is less convoluted than my answer. – nhahtdh Oct 23 '14 at 03:00

4 Answers4

4

It is possible to do this with the regex below, albeit a bit ugly.

This is a matching regex rather than a validation regex, but you can add anchor to turn it into a validation one by adding anchors at the beginning and the end.

(?:
    (?>[^(){}\[\]]+)
    |
    (?<open>
      (?=(?<mr>\())(?<mc>)(?<ms>)\(|
      (?=(?<mc>\{))(?<mr>)(?<ms>)\{|
      (?=(?<ms>\[))(?<mc>)(?<mr>)\[
    )
    |
    (?:
        (?<-open>
          (?!\k<mc>)\}
          |
          (?!\k<mr>)\)
          |
          (?!\k<ms>)\]
        )
        (?<-mr>)(?<-mc>)(?<-ms>)
    )
)+(?(open)(?!))

Since we can't read the top stack, we would have to emulate it by the 3 capturing groups mr, mc and ms. The number of items in mr, mc, ms and open are always the same. When the stack open is not empty, only one out of the 3 capturing groups contains the corresponding opening bracket, the other 2 capture an empty string. The non-empty string capturing group is always the type of the bracket at the top of the stack.

This allows us to match the corresponding closing bracket by asserting that the corresponding captured group can't be matched, e.g. (?!\k<mc>)\}. We know that the group can't be empty (since it has to pass the check on (?<-open>) first). There are only 2 cases left:

  • If the stack top is an empty string, then the backreference will always match, and the assertion always fails.
  • If the stack top contains the corresponding opening bracket, then the backreference will fail, and the assertion succeeds, and we proceed to match the closing bracket.
nhahtdh
  • 55,989
  • 15
  • 126
  • 162
  • `(()` validates as `true` when it should be `false`. `())` also validates as true. – Servy Oct 22 '14 at 20:22
  • @Servy: Please check the match of group 0. It finds a match because I didn't anchor. – nhahtdh Oct 22 '14 at 20:26
  • What about the match of the first group? The expression has a whole should either match or not match depending on whether or not the input is valid. In the first example I provided the first group is `()`, and the regex matches, but it *shouldn't* match; it should reject the input because it's not valid, as per the requirements. – Servy Oct 22 '14 at 20:28
4

It turns out this problem has been solved by Kobi in this blog post of his. For simply balancing different types of brackets, the solution is quite simple and elegant, and not as crazy as the answers in this question, which has a more complex grammar.

Let me quote the essential part of the blog post below:

[...] who says I must push what I matched to the stack? What if wanted to push an arbitrary string to the stack? When I see an open bracket I really want to push a closing bracket – but how can I?

The trick is to use a look-ahead: find the next occurrence of the character I want to push, and push it:

{(?=.*?(<Stack>}))

Next, when I want to match a closing brace, I already have the right one in stack. Using this approach, here’s a regex that matches tokens with matching balanced parentheses of 3 different typed:

(
    [^(){}\[\]]+
    | \( (?=[^)]*  (?<Stack> \) ) )
    | \[ (?=[^\]]* (?<Stack> \] ) )
    | \{ (?=[^}]*  (?<Stack> \} ) )
    | \k<Stack> (?<-Stack>)
)+?
(?(Stack) (?!))

Of course, this approach does have limitations – you may not find the character you’d like to push (which might be a good thing – it allows you to fail early). It also gets much trickier if you want to balance something more complicated than constant strings, but that’s another subject.

Community
  • 1
  • 1
nhahtdh
  • 55,989
  • 15
  • 126
  • 162
0

No, the language you have just described the requirements of is not a regular language, and as such Regular Expressions are not suitable to solving that problem. You will need to use a more robust lexing/parsing tool than just regexes.

Servy
  • 202,030
  • 26
  • 332
  • 449
  • FYI, balancing group with 2 types of brackets: http://stackoverflow.com/questions/3349999/converting-pcre-recursive-regex-pattern-to-net-balancing-groups-definition – nhahtdh Oct 22 '14 at 19:14
  • 1
    @nhahtdh Notice how he says it's *almost* working. Almost working is not working. And the piece that he can't get working is a piece that is impossible for him to solve using a regex. But yes, you can use regexes to solve entirely different problems that aren't the problem described in the OP, yes. – Servy Oct 22 '14 at 19:15
  • 1
    The second answer is the working answer. The first one is broken. – nhahtdh Oct 22 '14 at 19:19
  • @nhahtdh It is limited to a finite depth (he wrote it out to 6), not an arbitrary depth. It would fail to parse this: `((((((()))))))` – Servy Oct 22 '14 at 19:24
  • 1
    Just a quick comment: the linked answer is extremely academic in nature and not useful at all for production code, but it is definitely not limited to 6, as far as I know. As for the "regex is for regular" mantra, see this: http://stackoverflow.com/a/6752487/7586 - I hope it makes sense. Thanks! – Kobi Oct 23 '14 at 15:27
-1

If all you are trying to do is match parenthesis, try this code out:

public class Program
{

    public static void Main()
    {
        string testString1 = "{[balanced(parenthesis)]}";
        string testString2 = "(test)[wrong bracket type)";
        string testString3 = "(test)[Mismatched]((sdff)";

        bool isValid1 = ValidateString(testString1);
        bool isValid2 = ValidateString(testString2);
        bool isValid3 = ValidateString(testString3);

        if (isValid1)
            Console.WriteLine("TestString1 is balanced correctly!");
        else Console.WriteLine("TestString1 is NOT balanced properly!");

        if (isValid2)
            Console.WriteLine("TestString2 is balanced correctly!");
        else Console.WriteLine("TestString2 is NOT balanced properly!");

        if (isValid3)
            Console.WriteLine("TestString3 is balanced correctly!");
        else Console.WriteLine("TestString3 is NOT balanced properly!");

    }

    public static bool ValidateString(string testString)
    {
        int p1 = 0;
        int p2 = 0;
        int p3 = 0;

        var lastOpener = new Stack<char>();

        foreach (char c in testString)
        {
            if (c == '(') {
                p1++;
                lastOpener.Push(c);
            }
            if (c == '[') {
                p2++;
                lastOpener.Push(c);
            }
            if (c == '{') {
                p3++;
                lastOpener.Push(c);
            }

            try {               
                if (c == ')' && lastOpener.Pop() == '(')
                    p1--;
                if (c == ']' && lastOpener.Pop() == '[')
                    p2--;
                if (c == '}' && lastOpener.Pop() == '{')
                    p3--;
            } catch { return false; }
        }

        if (p1 != 0 || p2 != 0 || p3 != 0)
            return false;
        return true;
    }
}

All you need to do is call the ValidateString() method, passing it the string you want to test. It will test for mismatched brackets (3 different kinds, [], (), {}) and test to make sure the brackets are closed in the right spot (as you can see from my 3 test strings). It will return true if its valid, or false otherwise.

How it works is the function first creates a Stack object. It iterates through each character in the string. If it finds an opening bracket, it pushes that bracket to the stack and increments the counter for that bracket by one. If it finds a closing bracket, it pops the opening bracket from the stack and if they match, it decrements our bracket counter.

After iterating through each character, it tests the counter of each different type of bracket and if all three are 0, then we know its balance/matched properly!

Fiddle

Icemanind
  • 47,519
  • 50
  • 171
  • 296
  • The question is asking if it's possible to use a regex to solve this problem. You have in no way addressed that question. – Servy Oct 22 '14 at 19:52
  • @Servy - The answer is No. Thats why I showed him an alternative way to do it. – Icemanind Oct 22 '14 at 20:04
  • In addition to being offtopic, as this isn't what the question is asking for, your code is broken in quite a number of different places, in addition to being noticeably more verbose than is needed. – Servy Oct 22 '14 at 20:06
  • @icemanind Doing this without using .NET RegEx is quite trivial and like Servy said it is not what I asked for in the OP. – Overly Excessive Oct 22 '14 at 20:27
  • @Servy - My code can't be broken. I have a fiddle of it up. It runs perfect. Try the fiddle yourself – Icemanind Oct 22 '14 at 21:06
  • 1
    @icemanind Of course it can be broken. `)` not only doesn't return the correct result *it throws an exception*. – Servy Oct 23 '14 at 15:34
  • @Servy - Why isn't my fiddle throwing an exception? – Icemanind Oct 23 '14 at 16:22
  • @icemanind Because it's using different examples than mine. You never have more closing parens than opening parens in any of the examples you chooose, which is what breaks your code. If you actually used the example I gave, you'd see it break. – Servy Oct 23 '14 at 16:24
  • Wow that's really, really terrible design, like, *really* terrible, especially considering that this whole thing can be implemented with like 5-ish statements. – Servy Oct 23 '14 at 17:58
  • But it works! And without using RegEx. And my design isn't that long. Keep in mind, the whole thing is done in my `ValidateString` method. The rest of the code I posted is just testing experiments. – Icemanind Oct 23 '14 at 18:22