1

.NET balanced group regexes make my head explode. I've got this string i'm trying to match:

other stuff blah blah....
                    {
                        stuff stuff
                        {key:
                            stuff
                            stuff
                        }
                    } more stuff.....

Here's my regex:

[^{}]*                      # anything that isn't { }
\{                          # starting with {
(?>                         # atomic group: throw away backtracks on exit
    [^{}]+
  |
    \{(?:\w+:)?(?<DEPTH>)   # on matching { plus optional "word:" push stack  
  |
    \}(?<-DEPTH>)           # on matching } pop stack
)*
(?(DEPTH)(?!))              # if depth unbalanced, fail match
\}                          # ending with }
[^{}]*                      # anything that isn't { }

So, I'm trying to match balancing curly braces, where some of the opening curly braces have an optional word followed by a colon. The above regex matches my example string, but if i remove a curly brace, (i.e. "unbalancing it):

    other stuff blah blah....
                    {
                        stuff stuff
                        {key:
                            stuff
                            stuff

                    } more stuff.....

...it still matches!

Can anyone tell me how to fix my regex?

Jeff Atwood
  • 63,320
  • 48
  • 150
  • 153
Matt Crouch
  • 1,708
  • 2
  • 18
  • 29
  • 1
    I took the liberty of commenting your regex; just call it with `RegexOptions.IgnoreWhitespace`. It is extremely helpful in avoiding Exploding Head Syndrome :) – Jeff Atwood Jan 11 '10 at 10:37
  • @Jeff: You can build the rule of `IgnoreWhitespace` in to the pattern with `(?x) # IgnorePatternWhitespace`. See http://stackoverflow.com/questions/1961369/net-regex-parsing-of-the-newline-character/1961577#1961577 for an example. – Jay Bazuzi Jan 16 '10 at 20:46

2 Answers2

4

Have you checked what it is matching in the second case? Since you don't have any anchors, I think the engine starts the match just after the first '{'. From there, until the end, the string matches.

Try surrounding the pattern with \A and \z.

Markus Jarderot
  • 86,735
  • 21
  • 136
  • 138
-1

Not to be a kill joy but what you're attempting to do with a regular expression is boggling your mind because it's simply not possible. Regular expressions are a class of finite automata and do not posses enough state in which to peform recursive/nested matching. You'll need a context free grammar of sorts in order to get this working.

There are some regular expression engines which do support a notion of recursion. These are not strictly speaking regular expressions though. Can you tell us what engine you are using because it's possible it has a recursion function which will help out this scenario.

JaredPar
  • 733,204
  • 149
  • 1,241
  • 1,454
  • 4
    Subject line says .NET Regex. I already know the engine supports it. See http://blog.stevenlevithan.com/archives/balancing-groups – Matt Crouch Oct 08 '08 at 17:38