4

I'm looking for a regular expression (in C#) that matches the following cases:

  • {a}
  • {a:b}
  • {a:{b}}
  • {a:{b:c}}
  • etc.

  • {a}{b}

  • {a}{b}{c}
  • etc.

  • a{b}

  • {a}b
  • a{b}{c}
  • {a}b{c}
  • {a}{b}c
  • etc.

Where a, b, c can be any string.

So far I've got something like: .*[\{].+?[\}].* but this matches the case {a}{b} entirely, in stead of returning two Matches, namely {a} and {b}

The expression is used to verify that some string is a coded string. If it is, it needs to get the separate pieces from the coded string (Regex.Matches() would be handy) and parse them.

Cœur
  • 37,241
  • 25
  • 195
  • 267
Knots
  • 551
  • 8
  • 19
  • 3
    You cannot do this, [regex is not powerful enough to match nested braces correctly](http://stackoverflow.com/a/133684/335858). – Sergey Kalinichenko Jun 05 '13 at 15:00
  • 8
    This language cannot be matched by regular expressions that are actually **regular** because it requires a push-down automaton, not a finite automaton. However, modern "regular" expressions are not actually regular, so there might be a way to do this. That said, my advice would be to *write a lexer and parser* rather than trying to do this in regular expressions. – Eric Lippert Jun 05 '13 at 15:01
  • The first `etc` is a bit vague, do you mean any bracket depth, as in `{a:{b:{c:{d:{e:...`, is possible? If so, regex won't be able to solve your problem. – Bernhard Barker Jun 05 '13 at 15:01
  • 3
    @dasblinkenlight: Though it is the case that "classic" regular expressions (which support alternation, concatenation and Kleene star operators) cannot recognize languages with nested parens, modern "regular" expressions can match some pushdown automaton languages. .NET in particular supports "balancing groups" -- see http://stackoverflow.com/questions/4284827/regular-expression-that-uses-balancing-groups for an example. – Eric Lippert Jun 05 '13 at 15:03
  • I'll go with the parser, as Eric suggested. The depth of the brackets can, in theory, be infinite. Thanks for the help. – Knots Jun 06 '13 at 09:46

2 Answers2

1

Description

You could do this by combining some recursive logic around a regex

This regex will match open and close brackets nested three layers deep like {a{b{c}}}{{{d}e}f}

\{((?:\{(?:\{.*?\}|.)*?\}|.)*?)\}

enter image description here

The dotted area is the basic search wherein that search is nested inside itself for as many layers as you need.

In the following example I'm simply running the regex against most of your examples. Combine this regex with a foreach loop that would take each Group 1 and capture all non open brackets from the start of the current string ^[^{]*, then recursively feed the rest of the string back through the regex above to capture the value inside the next group of brackets, then capture all the non close brackets from the end of the string [^}]*$.

Sample Text

{a}
{a:b}
{a:{b}}
{a:{b:c}}
{a}{b}
{a}{b}{c}
{a{b{c}}}{{{d}e}f}

C#.NET Code Example:

This C#.Net example is only show how the regex works. See how Group 1 gets the inner text from outer most group of brackets. Each outer bracketed text was broken into it's own array position, and the corresponding outer brackets where removed.

using System;
using System.Text.RegularExpressions;
namespace myapp
{
  class Class1
    {
      static void Main(string[] args)
        {
          String sourcestring = "sample text above";
          Regex re = new Regex(@"\{((?:\{(?:\{.*?\}|.)*?\}|.)*?)\}",RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline);
          MatchCollection mc = re.Matches(sourcestring);
          int mIdx=0;
          foreach (Match m in mc)
           {
            for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
              {
                Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()[gIdx], m.Groups[gIdx].Value);
              }
            mIdx++;
          }
        }
    }
}
$matches Array:
(
    [0] => Array
        (
            [0] => {a}
            [1] => {a:b}
            [2] => {a:{b}}
            [3] => {a:{b:c}}
            [4] => {a}
            [5] => {b}
            [6] => {a}
            [7] => {b}
            [8] => {c}
            [9] => {a{b{c}}}
            [10] => {{{d}e}f}
        )

    [1] => Array
        (
            [0] => a
            [1] => a:b
            [2] => a:{b}
            [3] => a:{b:c}
            [4] => a
            [5] => b
            [6] => a
            [7] => b
            [8] => c
            [9] => a{b{c}}
            [10] => {{d}e}f
        )

)

Disclaimer

This expression will only work to the third level of recursion. Outer text will need to be handled separately. The .net regex engine does offer recursion counting and may support N layers deep. As written here this expression may not handle capture g as expected in {a:{b}g{h}i}.

Ro Yo Mi
  • 14,790
  • 5
  • 35
  • 43
1

You could also build a routine which simply parses each character in the sample string and keeps track the nested depth.

Powershell Example

I offer this powershell sample because I have a powershell console handy. This is only to demonstrate how the function would work.

$string = '{a}
{a:b}
a:{b}g{h}ik
{a:{b:c}}
{a}{b}
{a}{b}{c}
{a{b{c}}}{{{d}e}f}
'

$intCount = 0

# split the string on the open and close brackets, the round brackets ensure the squiggly brackets are retained
foreach ($CharacterGroup in $string -split "([{}])") {
    write-host $("+" * $intCount)$CharacterGroup
    if ($CharacterGroup -match "{") { $intCount += 1 }
    if ($CharacterGroup -match "}") { $intCount -= 1 }
    if ($intCount -lt 0) { 
        Write-Host "missing close bracket"
        break
        } # end if
    } # next $CharacterGroup

Yields

 {
+ a
+ }


 {
+ a:b
+ }

a:
 {
+ b
+ }
 g
 {
+ h
+ }
 ik

 {
+ a:
+ {
++ b:c
++ }
+ 
+ }


 {
+ a
+ }

 {
+ b
+ }


 {
+ a
+ }

 {
+ b
+ }

 {
+ c
+ }


 {
+ a
+ {
++ b
++ {
+++ c
+++ }
++ 
++ }
+ 
+ }

 {
+ 
+ {
++ 
++ {
+++ d
+++ }
++ e
++ }
+ f
+ }
Ro Yo Mi
  • 14,790
  • 5
  • 35
  • 43
  • Even though it's not a regular expression, I'll check your answer. In theory the depth of the brackets can be infinite, so Denomal... your other answer below just won't do the trick. Thanks for trying though. =) – Knots Jun 06 '13 at 09:44