I'm trying to come up with a regular expression that will let me parse a colon-delimited token, formatted as a key/value pair.

[/foo:bar] // where [key:value]

The tricky part is that a token value can contain another token.

[/foo:[/foo:bar]]

In the above case, I want to match foo as the key and [/foo:bar] as the value. The following expression works:

^\[/([^:]+):(.*)\]+$

However, this does not work when the string has multiple tokens. For example:

[/foo:[/foo:bar]][/foo:bar] // results in foo and [/foo:bar]][/foo:bar
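
The behaviour can be reproduced with a minimal Python sketch. It assumes Python's `re` module treats these constructs the same way as the engine used here, and `parse_single` is a hypothetical helper name, not part of the original code:

```python
import re

# The pattern from the question: a greedy (.*) anchored to the end of the input.
PATTERN = re.compile(r"^\[/([^:]+):(.*)\]+$")

def parse_single(text):
    """Return (key, value) for the whole string, or None if it does not match."""
    m = PATTERN.match(text)
    return m.groups() if m else None

print(parse_single("[/foo:bar]"))                   # ('foo', 'bar')
print(parse_single("[/foo:[/foo:bar]]"))            # ('foo', '[/foo:bar]')
print(parse_single("[/foo:[/foo:bar]][/foo:bar]"))  # ('foo', '[/foo:bar]][/foo:bar')
```

Because (.*) is greedy and the pattern is anchored with $, the value group runs past the end of the first token and only gives back the final closing bracket.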

I imagine I'm only slightly off, but I cannot, for the life of me, figure out what I'm missing.

csano
  • Nested stuff is a very bad thing for RegEx to match. It's only really achievable through engine-specific special operators. For that reason, it is generally considered that wiring up your own quick parser is much more efficient than spending days on a single expression to attempt it. – Simon Whitehead Jul 18 '13 at 22:24
  • Please check the famous [parsing HTML with regular expressions](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) for suggestions on parsing nested braces (seriously, just skip funny once)... On the other hand, finding a sample for proper parsing of XPath expressions may be a better approach. – Alexei Levenkov Jul 18 '13 at 22:26
  • Regex'es are a mandatory part of every modern programmer's toolkit; so is being able to whip up a trivial parser. My advice is to abandon regex for this task, for which they are unsuitable, and practice your parser writing. – Pieter Geerkens Jul 18 '13 at 23:04
  • You can use [.NET regex balancing groups](http://msdn.microsoft.com/en-us/library/bs2twtah.aspx#balancing_group_definition) to count and solve such problems if you want to do it with regex. – Qtax Jul 18 '13 at 23:11
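
The hand-written parser suggested in the comments above could look roughly like the following Python sketch. It is one possible approach under stated assumptions, not code from the thread: `parse_tokens` is a hypothetical name, every token is assumed to have the form [/key:value], and any text outside a token is silently skipped.

```python
def parse_tokens(text):
    """Split text into top-level (key, value) pairs by tracking bracket depth."""
    tokens = []
    depth = 0       # how many '[' are currently open
    start = None    # index of the '[' that opened the current top-level token
    for i, ch in enumerate(text):
        if ch == "[":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "]":
            if depth == 0:
                raise ValueError(f"unbalanced ']' at index {i}")
            depth -= 1
            if depth == 0:
                body = text[start + 1:i]  # e.g. "/foo:[/foo:bar]"
                if not body.startswith("/") or ":" not in body:
                    raise ValueError(f"malformed token at index {start}")
                # Split on the first colon only; the value may contain more.
                key, value = body[1:].split(":", 1)
                tokens.append((key, value))
    if depth != 0:
        raise ValueError("unbalanced '['")
    return tokens

print(parse_tokens("[/foo:[/foo:bar]][/foo:bar]"))
# [('foo', '[/foo:bar]'), ('foo', 'bar')]
```

A nested value comes back as a raw string, so calling `parse_tokens` again on that value would descend one more level.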

1 Answer

This regex matches a name with a possibly nested value:

\[/([^:]+):(.*?)\](?=\[|$)

The changes were to make the value group reluctant (.*?) and to add a lookahead asserting that the character following the closing right square bracket is either the start of a new token or the end of input.

In a live demo on Rubular, with the input

[/foo:[/foo:bar]][/foo:bar]

The resulting matches are:

Match 1...
Group 1:    foo
Group 2:    [/foo:bar]
Match 2...
Group 1:    foo
Group 2:    bar
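
The same matches can be checked outside Rubular. The sketch below uses Python's `re` module on the assumption that it handles the reluctant quantifier and the lookahead the same way; `find_tokens` is a hypothetical helper name:

```python
import re

# Reluctant value group plus a lookahead: the closing ']' must be followed
# by the start of another token or by the end of the input.
TOKEN = re.compile(r"\[/([^:]+):(.*?)\](?=\[|$)")

def find_tokens(text):
    """Return every (key, value) pair the pattern finds, left to right."""
    return [m.groups() for m in TOKEN.finditer(text)]

print(find_tokens("[/foo:[/foo:bar]][/foo:bar]"))
# [('foo', '[/foo:bar]'), ('foo', 'bar')]
```

One caveat: the lookahead inspects only the single character after a closing bracket, so a value that itself holds two adjacent tokens, such as [/a:[/b:c][/d:e]], gets cut at the first inner boundary.
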
animuson
Bohemian