I'm looking to tokenize some really simple strings, but I'm struggling to get the regex right.

The strings might look like this:

string1 = "{[Surname]}, some text... {[FirstName]}"

string2 = "{Item}foo.{Item2}bar"

I want to extract the tokens in the curly braces, so string1 yields "{[Surname]}" and "{[FirstName]}", and string2 yields "{Item}" and "{Item2}".

So basically, there are two different token types I want to extract: {[Foo]} and {Bar}.

This question is quite good, but I can't get the regex right: "Poor man's lexer for C#". Thanks for the help!

Pete

3 Answers

They're both good answers, thanks. Here's what I settled on in the end:

// Requires: using System.Text.RegularExpressions;

// DataToken  = {[foo]}
// FieldToken = {Bar}
string pattern = @"(?<DataToken>\{\[\w+\]\})|(?<FieldToken>\{\w+\})";

// ExplicitCapture keeps only the named groups as captures.
MatchCollection matches = Regex.Matches(expression.ExpressionString, pattern,
    RegexOptions.ExplicitCapture);

string fieldToken = string.Empty;
string dataToken = string.Empty;

foreach (Match m in matches)
{
    // Note that EITHER FieldToken OR DataToken will have a value in each iteration;
    // the group that did not match yields an empty string.
    fieldToken = m.Groups["FieldToken"].Value;
    dataToken = m.Groups["DataToken"].Value;

    if (!string.IsNullOrEmpty(dataToken))
    {
        // Do something
    }

    if (!string.IsNullOrEmpty(fieldToken))
    {
        // Do something else
    }
}
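
For anyone who wants to reuse this, here is a minimal sketch that wraps the same pattern in a helper and uses Group.Success rather than the empty-string check to tell the two token types apart. The class name TokenScanner and the method Scan are hypothetical names chosen for illustration; they are not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (name chosen for illustration only).
public static class TokenScanner
{
    // Same pattern as in the answer: DataToken = {[foo]}, FieldToken = {Bar}.
    private const string Pattern =
        @"(?<DataToken>\{\[\w+\]\})|(?<FieldToken>\{\w+\})";

    // Returns (group name, token text) pairs in the order they appear in the input.
    public static List<KeyValuePair<string, string>> Scan(string input)
    {
        var tokens = new List<KeyValuePair<string, string>>();
        foreach (Match m in Regex.Matches(input, Pattern, RegexOptions.ExplicitCapture))
        {
            // Exactly one of the two alternatives matched, so exactly one group succeeded.
            string kind = m.Groups["DataToken"].Success ? "DataToken" : "FieldToken";
            tokens.Add(new KeyValuePair<string, string>(kind, m.Value));
        }
        return tokens;
    }
}
```

Called on the two strings from the question, Scan returns two DataToken entries ("{[Surname]}", "{[FirstName]}") for string1 and two FieldToken entries ("{Item}", "{Item2}") for string2.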
Pete

Unless the rules are very convoluted, that will be (?<Token>\{\[.+?\]\}) for the first string and (?<Token>\{.+?\}) for the second.
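
A minimal sketch of how these two patterns could be tried against the strings from the question. The class LazyTokenDemo and the method Find are hypothetical names used only for this illustration. The lazy quantifier .+? stops at the first closing delimiter, which is what keeps each match to a single token.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class (name chosen for illustration only).
public static class LazyTokenDemo
{
    // Runs the given pattern and returns the text captured by the "Token" group.
    public static string[] Find(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups["Token"].Value)
                    .ToArray();
    }
}
```

With string1 and the first pattern, Find returns "{[Surname]}" and "{[FirstName]}"; with string2 and the second pattern, it returns "{Item}" and "{Item2}". One thing to keep in mind: because . also matches square brackets, the second pattern matches {[Foo]} tokens as well, so it does not distinguish the two token types on its own.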

Anton Gogolev

What about (?<token>\{[^\}]*\})? It matches both token types with a single pattern.
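
A minimal sketch of this approach, assuming the two token types are told apart afterwards by inspecting the matched text. BraceTokens, FindAll, and IsDataToken are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (names chosen for illustration only).
public static class BraceTokens
{
    // Collects every {...} run that contains no closing brace inside it.
    public static List<string> FindAll(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, @"(?<token>\{[^\}]*\})"))
        {
            result.Add(m.Groups["token"].Value);
        }
        return result;
    }

    // Treats a token as the {[Foo]} kind when it is wrapped in "{[" and "]}".
    public static bool IsDataToken(string token)
    {
        return token.StartsWith("{[", StringComparison.Ordinal)
            && token.EndsWith("]}", StringComparison.Ordinal);
    }
}
```

Because the quantifier is * rather than +, this pattern also accepts an empty pair of braces ("{}"), and the negated class allows any characters between the braces, including spaces. Whether that is acceptable depends on how strict the token format needs to be.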

pierroz