What is the best practice for parsing the following blocks with a C# regex?

BLOCK:1 { name:Apple, value:Juice, name:xxx, value:yyy, name:Apple, value:Juice }

BLOCK:2 { name:Banana, value:Smell }

BLOCK:3 { }

Edited: the blocks are flexible (any number of name/value pairs, including none).

jack-london

2 Answers

The following should get you the expression and the groups 'id', 'name' and 'value' for further use.

BLOCK:(?<id>[0-9]+) { name:(?<name>.+), value:(?<value>.+) }

You can then split the input by newlines, try to match each line against the regex, and process the line if the match is successful.

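using System.Text.RegularExpressions;

// Matches a block holding exactly one name/value pair; the braces are escaped for clarity.
Regex pattern = new Regex(@"BLOCK:(?<id>[0-9]+) \{ name:(?<name>.+), value:(?<value>.+) \}");
foreach (var line in block.Split('\n'))
{
    Match match = pattern.Match(line);
    if (match.Success)
    {
        // Process is a placeholder for whatever you do with the captured value.
        Process(match.Groups["value"].Value);
    }
}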

Edit:
For flexible blocks, use something like BLOCK:(?<id>[0-9]+) {(?<inner>.+)} for the first match, which captures everything between the braces in the 'inner' group.

Then check this question for further info
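
The two-stage idea above can be sketched roughly as follows. This is only an illustration, not necessarily what the linked question recommends: the class name BlockParser, the method Parse, and the returned dictionary shape are all made up for the example. It also swaps the greedy .+ for negated character classes ([^}]* for the block body, [^,}]+ for names and values), which assumes that names and values never contain a comma or a closing brace.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class BlockParser
{
    // Stage 1: capture the block id and everything between the braces.
    // [^}]* stops at the first closing brace, so one match never spans two blocks.
    private static readonly Regex BlockPattern =
        new Regex(@"BLOCK:(?<id>[0-9]+)\s*\{(?<inner>[^}]*)\}");

    // Stage 2: capture one name/value pair inside a block body.
    // [^,}]+ stops at the next comma or brace (assumes values contain neither).
    private static readonly Regex PairPattern =
        new Regex(@"name:(?<name>[^,}]+),\s*value:(?<value>[^,}]+)");

    // Returns the name/value pairs of every block, keyed by block id.
    public static Dictionary<int, List<KeyValuePair<string, string>>> Parse(string input)
    {
        var result = new Dictionary<int, List<KeyValuePair<string, string>>>();
        foreach (Match block in BlockPattern.Matches(input))
        {
            var pairs = new List<KeyValuePair<string, string>>();
            foreach (Match pair in PairPattern.Matches(block.Groups["inner"].Value))
            {
                // Trim removes the padding space before the closing brace.
                pairs.Add(new KeyValuePair<string, string>(
                    pair.Groups["name"].Value.Trim(),
                    pair.Groups["value"].Value.Trim()));
            }
            result[int.Parse(block.Groups["id"].Value)] = pairs;
        }
        return result;
    }
}
```

Calling BlockParser.Parse on the three sample blocks from the question gives three pairs for block 1, one pair for block 2, and an empty list for block 3. A list of pairs is used rather than a dictionary per block because block 1 repeats the name Apple.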

Jan Jongboom
  • Sorry, I changed the question. What about when I want extensible blocks with any number of names and values? – jack-london Jan 12 '10 at 12:47
  • `.+` is greedy, you probably want to limit it to `\w+` or `[^,}]+`. At the very least, turn it to a non-greedy `.+?`. – Kobi Jan 12 '10 at 12:50
This might be too simple to require Antlr, but it wouldn't hurt to check it out for future reference.

If you need to parse anything at all reasonably complex then you just can't beat http://www.antlr.org/

And, of course, its associated GUI - http://www.antlr.org/works/index.html

A picture speaks a thousand words - http://www.antlr.org/works/screenshots/editor.jpg

Simpler to code, simpler to debug ...

Mawg says reinstate Monica
  • Yes, MGrammar is a similar project to ANTLR (http://msdn.microsoft.com/en-us/library/dd129519(VS.85).aspx). But for a small DSL like the one in the example, do I really need such big libraries? – jack-london Jan 18 '10 at 15:51