1

I am trying to write a regular expression to match ruby's list and hash syntaxes, e.g.:

[:a, "b", c, 3]

{:a => [
    1,2,3
]}

[1, {
    a => "t", :b => "w",
    :c => :o
}, 3]

The issue, of course, is with the nested/recursive nature of the things. I have a sneaking suspicion that such nested structures cannot actually be expressed as a regular expression as that 'language' is not regular. I expect the solution would have to involve subroutines and recursion, however I'm struggling to get my head around it. Can anyone confirm/deny my suspicions or offer a solution?

Any help appreciated.

Edit: as a note, I'm using PHP's preg_* methods mainly

Edit: as another note, I've created a routine, <ruby_value> to match keys and scalar values.

Edit: I should specify that I'm more interested in this "out of interest". I have already wrote a mini-parser for these things in PHP however I am interested to see if a not-necessarily-pure-regex solution exists.

E.g. equal nested brackets:

/^(?<paren_expr>
    \( (?: (?&paren_expr) | ) \)
)$/x

Which is a valid PHP regex and will match "(())", "()" and "((((((()))))))" but not "(" or "(()" etc.

connec
  • 7,231
  • 3
  • 23
  • 26
  • 1
    You are correct. Regular expressions do not work for recursive structures. – Lily Ballard Feb 09 '11 at 01:21
  • I have kind of answered my own question here... the motivating solution was [this SO solution](http://stackoverflow.com/questions/764247/why-are-regular-expressions-so-controversial/4053506#4053506) – connec Feb 09 '11 at 02:02

2 Answers2

1

Correct, it's not regular so you can't match it with 1 regular non-recursive expression.

You can, however, make a loop which replaces every match till there are no more matches available.

So...

[[[ foo ]]]

[[PLACEHOLDER_001]]

[PLACEHOLDER_002]

PLACEHOLDER_003

That way you can make it work without a problem. Can't say it's such a pretty solution though. A stack based solution would be better.

Wolph
  • 78,177
  • 11
  • 137
  • 148
0

You are correct that nested structures are not a regular language, and thus cannot be expressed via pure regular expressions.

PCRE has the ability to specify recursive regexes, although I'm not sure whether PHP's implementation of them includes that support.

Really though, what you want to do is write a state machine yourself (with memory of nesting).

Amber
  • 507,862
  • 82
  • 626
  • 550
  • 1
    A regular expression *is* (a representation of) a finite state machine. You'll need something a bit more powerful (like a push-down automaton) in order to match a recursive language. – Anon. Feb 09 '11 at 01:27
  • Sure. But 'state machine' is a more general term than simply FSMs. Push-down automatons are still state machines, they're just a subclass that also has the ability to remember things. – Amber Feb 09 '11 at 02:41