1

Prologue:

Input string: 1(2),3,4(5,6(7,8),9),10

I am using C# and I would want to eventually get a List<foo> from the above expression

public class foo
{
    public int bar { get; set; }
    public List<foo> listOfFoo { get; set; }
}

I can achieve the task by writing some validations and parsing character by character, but would like to know a better way. Lesser the code, lesser the bugs they say ;)

Query

I am looking for a regular expression for validating and possibly capturing information in a string like

1(2),3,4(5,6(7,8),9),10

The string is basically a set of numbers separated by comma. But a number can have some sub expressions for it using parenthesis ( )

What I want to fetch from the string is a graph like

1
    2
3
4
    5
    6
        7
        8
    9
10

I have very little idea about reg-ex. I can read & understand most of them, but writing one I find really tough

Looking for someone to tell me if something like this is at all achievable using RegEx. If so, what should be the approach? I can see that I would need a recursive expression, any links or examples would be of great help. Someone willing to give me the RegEx itself would be icing on the cake :)

Jugal Thakkar
  • 13,432
  • 4
  • 61
  • 79
  • 4
    this is not regex task – burning_LEGION Apr 04 '13 at 11:54
  • +1 not a regex task; use/write a proper parser – Alex Apr 04 '13 at 11:56
  • This sounds similar to [trying to parse html (tree like structure) with regular expressions](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – George Duckett Apr 04 '13 at 11:57
  • Thanks guys for the very quick feedback!! I will indeed write a parser. Will keep the question open for the future though – Jugal Thakkar Apr 04 '13 at 12:01
  • To get a regular expression, its necessary to prove that such language is regular (a machine with finite states can accept strings such that) – Arnaldo Ignacio Gaspar Véjar Apr 04 '13 at 12:12
  • 4
    @ArnaldoGaspar That is true for theoretical regex; in practice the regex engines we use in our various programming languages add capabilities that exceed such a mandate. For example, theoretical regex does not have a concept of lookahead/lookbehind. PHP even supports the concept of recursive matching. – Kenneth K. Apr 04 '13 at 12:25

1 Answers1

2

.NET regex has balancing groups which allow you to count and match balanced parenthesis like in this case.

For that you can use an expression like this:

(?x)                 # ignore spaces and comments
^
(?:
    (?<open> \( )*   # open++
    \d+
    (?<-open> \) )*  # open--
    (?:
        , (?!\z)     # match a , but not at end of string
    |   \z           # or end of string
    )
)+
\z
(?(open) (?!) )      # fail if unbalanced (open > 0)

This would validate but not parse the string. To build a tree like you desire you have to use a parser I believe.

Qtax
  • 33,241
  • 9
  • 83
  • 121