6

What is the most efficient way to parse a C# string in the form of

"(params (abc 1.3)(sdc 2.0)(www 3.05)....)"

into a struct in the form

struct Params
{
  double abc,sdc,www....;
}

Thanks

EDIT: The structure always has the same parameters (same names, only doubles, known at compile time), but the order is not guaranteed. Only one struct at a time.

Betamoo
  • Can you show the full example, including the whole sample string? – Glennular May 03 '10 at 18:21
  • It's clear what you're asking. What do you want to have as the values for a and b? Is the structure dynamic, i.e. it might not always be the same set of variables? –  May 03 '10 at 18:22
  • Not sure how `abc` is a double; can you flesh out your example some more? – D'Arcy Rittich May 03 '10 at 18:22
  • Why the focus on efficiency? Wouldn't a way that works be better? Params don't sound like they get parsed very often, or is this on the critical path of your app somehow? – Stewart May 03 '10 at 18:22
  • @Stewart: I think it is in my case. I am getting these strings from an online server, and they are used for a real-time application. – Betamoo May 03 '10 at 18:32
  • Like OrbMan said, how do you know 1.3 is double and not float, etc.? Do you ONLY have doubles as possible values? Do they always have a decimal point and one digit after? – Nelson Rothermel May 03 '10 at 18:34
  • @Stewart why do you think that performance and "it works" are antagonists? The C# compiler parses rather complex grammars but still performs very well (as almost any other compiler does) – Rune FS May 03 '10 at 19:09
  • @Rune FS - It isn't that I don't think performance is important - it surely is, it is just that in my experience if you don't know how to solve the problem yet, efficiency is the least of your problems. The question I was driving at though was "how important is efficiency here" - are we talking once in the lifetime of an app, or 1000 times a second. – Stewart May 03 '10 at 19:17
  • @Stewart I'm with you on that :) – Rune FS May 03 '10 at 19:42

8 Answers

4
using System;

namespace ConsoleApplication1
{
    class Program
    {
        struct Params
        {
            public double abc, sdc;
        };

        static void Main(string[] args)
        {
            string s = "(params (abc 1.3)(sdc 2.0))";
            Params p = new Params();
            object pbox = (object)p; // structs must be boxed for SetValue() to work

            string[] arr = s.Substring(8).Replace(")", "").Split(new char[] { ' ', '(', }, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < arr.Length; i+=2)
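                // Parse with the invariant culture so "1.3" is read the same way on every machine
                typeof(Params).GetField(arr[i]).SetValue(pbox, double.Parse(arr[i + 1], System.Globalization.CultureInfo.InvariantCulture));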
            p = (Params)pbox;
            Console.WriteLine("p.abc={0} p.sdc={1}", p.abc, p.sdc);
        }
    }
}

Note: if you used a class instead of a struct the boxing/unboxing would not be necessary.
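To illustrate the note above, here is a minimal sketch of the same reflection approach with `Params` declared as a class. The names `ClassDemo` and `Parse` are made up for this example, and it assumes the input is well formed. Only the boxing of the `Params` instance goes away; each `double` is still boxed when passed to `SetValue`.

```csharp
using System;
using System.Globalization;
using System.Reflection;

// A class instead of a struct: SetValue mutates this instance directly.
class Params
{
    public double abc, sdc;
}

static class ClassDemo
{
    // Hypothetical helper: same string handling as the answer above.
    public static Params Parse(string s)
    {
        var p = new Params();
        string[] arr = s.Substring(8).Replace(")", "")
            .Split(new[] { ' ', '(' }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < arr.Length; i += 2)
        {
            FieldInfo field = typeof(Params).GetField(arr[i]);
            field.SetValue(p, double.Parse(arr[i + 1], CultureInfo.InvariantCulture));
        }
        return p; // no unboxing step needed
    }

    static void Main()
    {
        Params p = Parse("(params (abc 1.3)(sdc 2.0))");
        Console.WriteLine("p.abc={0} p.sdc={1}", p.abc, p.sdc);
    }
}
```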

Simon Chadwick
  • I think he wanted a dynamically built struct, possibly via a Dictionary type object. (The example now includes 'www') – Glennular May 03 '10 at 19:40
  • @Glennular: His edit says the struct is fixed. But I agree with you anyway; I'd rather use a Dictionary than reflection for something like this. – Simon Chadwick May 03 '10 at 20:11
2

Do you need to support multiple structs? In other words, does this need to be dynamic, or do you know the struct definition at compile time?

Parsing the string with a regex would be the obvious choice.

Here is a regex that will parse your string format:

private static readonly Regex regParser = new Regex(@"^\(params\s(\((?<name>[a-zA-Z]+)\s(?<value>[\d\.]+)\))+\)$", RegexOptions.Compiled);

Running that regex on a string will give you two groups named "name" and "value". The Captures property of each group will contain the names and values.
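As a rough sketch of how the captures can be read back, the snippet below walks the two `Captures` collections in parallel. The class name `RegexCapturesDemo` and the method `ParsePairs` are invented for this example; the pattern is the one shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

static class RegexCapturesDemo
{
    // Same pattern as above: the outer group repeats once per "(name value)" pair.
    private static readonly Regex regParser = new Regex(
        @"^\(params\s(\((?<name>[a-zA-Z]+)\s(?<value>[\d\.]+)\))+\)$",
        RegexOptions.Compiled);

    // Hypothetical helper: returns the pairs in the order they appear in the input.
    public static List<KeyValuePair<string, double>> ParsePairs(string input)
    {
        Match m = regParser.Match(input);
        if (!m.Success)
            throw new FormatException("Input does not match the expected format.");

        // Each repetition of the outer group adds one capture to both named groups.
        CaptureCollection names = m.Groups["name"].Captures;
        CaptureCollection values = m.Groups["value"].Captures;

        var result = new List<KeyValuePair<string, double>>();
        for (int i = 0; i < names.Count; i++)
        {
            double value = double.Parse(values[i].Value, CultureInfo.InvariantCulture);
            result.Add(new KeyValuePair<string, double>(names[i].Value, value));
        }
        return result;
    }

    static void Main()
    {
        foreach (var pair in ParsePairs("(params (abc 1.3)(sdc 2.0)(www 3.05))"))
            Console.WriteLine(pair.Key + " = " + pair.Value.ToString(CultureInfo.InvariantCulture));
    }
}
```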

If the struct type is unknown at compile time, then you will need to use reflection to fill in the fields.

If you mean to generate the struct definition at runtime, you will need to use Reflection.Emit to create the type, or you will need to generate the source code.

Which part are you having trouble with?

driis
  • If performance is critical then RegEx should not be the first choice. They don't perform nearly as well as simple string operations such as Split and Trim – Rune FS May 03 '10 at 19:45
2

Depending on your complete grammar you have a few options. If it's a very simple grammar and you don't have to check it for errors, you could simply go with the code below (which will be fast):

var input = "(params (abc 1.3)(sdc 2.0)(www 3.05)....)";
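// Split('(') alone yields an empty first token (the text before the leading '('), so drop empty entries
var tokens = input.Split(new[] { '(' }, StringSplitOptions.RemoveEmptyEntries);
var typeName = tokens[0].Trim(); // "params"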
//you'll need more than the type name (assembly/namespace) so I'll leave that to you
Type t = getStructFromType(typeName);
var obj = TypeDescriptor.CreateInstance(null, t, null, null);
for(var i = 1;i<tokens.Length;i++)
{
    var innerTokens = tokens[i].Trim(' ', ')').Split(' ');
    var fieldName = innerTokens[0];
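    // Invariant culture: "1.3" must not depend on the machine's decimal separator
    var value = Convert.ToDouble(innerTokens[1], System.Globalization.CultureInfo.InvariantCulture);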
    var field = t.GetField(fieldName);
    field.SetValue(obj, value);
}

That simple approach, however, requires a well-formed string or it will misbehave.

If the grammar is a bit more complicated, e.g. nested ( ), then that simple approach won't work.

You could try to use a regex, but that still requires a rather simple grammar, so if you end up with a complex grammar your best choice is a real parser. Irony is easy to use since you can write it all in plain C# (some knowledge of BNF is a plus, though).

Rune FS
2

A regex can do the job for you:

public Dictionary<string, double> ParseString(string input){
    var dict = new Dictionary<string, double>();
    try
    {
        var re = new Regex(@"(?:\(params\s)?(?:\((?<n>[^\s]+)\s(?<v>[^\)]+)\))");
        foreach (Match m in re.Matches(input))
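            // Invariant culture so "1.3" parses the same regardless of regional settings
            dict.Add(m.Groups["n"].Value, double.Parse(m.Groups["v"].Value, System.Globalization.CultureInfo.InvariantCulture));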
    }
    catch
    {
        throw new Exception("Invalid format!");
    }
    return dict;
}

use it like:

string str = "(params (abc 1.3)(sdc 2.0)(www 3.05))";
var parsed = ParseString(str);

// parsed["abc"] would now return 1.3

That might fit better than creating a lot of different structs for every possible input string and using reflection to fill them. I don't think that is worth the effort.

Furthermore, I assumed the input string is always in exactly the format you posted.

Philip Daubmeier
1

You might consider performing just enough string manipulation to make the input look like standard command line arguments then use an off-the-shelf command line argument parser like NDesk.Options to populate the Params object. You give up some efficiency but you make it up in maintainability.

public Params Parse(string input)
{
    var @params = new Params();
    var argv = ConvertToArgv(input);
    new NDesk.Options.OptionSet
        {
            {"abc=", v => Double.TryParse(v, out @params.abc)},
            {"sdc=", v => Double.TryParse(v, out @params.sdc)},
            {"www=", v => Double.TryParse(v, out @params.www)}
        }
        .Parse(argv);

    return @params;
}

private string[] ConvertToArgv(string input)
{
    return input
        .Replace('(', '-')
        .Split(new[] {')', ' '});
}
Handcraftsman
0

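Here's an out-of-the-box approach: rewrite the string as JSON, then use System.Web.Script.Serialization.JavaScriptSerializer.Deserialize. Converting () to {} and [SPACE] to ":" alone does not give valid JSON (the "params" wrapper stays, and the commas and quoted names are missing), so the conversion needs slightly more care:

string s = "(params (abc 1.3)(sdc 2.0))";
// Drop "(params (" and the trailing "))", then quote the names: {"abc":1.3,"sdc":2.0}
string json = "{\"" + s.Substring(9, s.Length - 11)
  .Replace(")(", ",\"")
  .Replace(" ", "\":") + "}";

// Deserialize is generic, so it needs a target type
return new System.Web.Script.Serialization.JavaScriptSerializer().Deserialize<System.Collections.Generic.Dictionary<string, double>>(json);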
Earl
  • Seems to me this would break all to hell if the params can ever contain spaces or parens... – cHao Sep 27 '12 at 15:55
0

Do you want to build a data representation of your defined syntax?

If you are looking for easy maintainability without having to write long regex statements, you could build your own lexer/parser. Here is a prior discussion on SO, with good links in the answers as well, to help you:

Poor man's "lexer" for C#

Glennular
0

I would just do a basic recursive-descent parser. It may be more general than you want, but nothing else will be much faster.
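A minimal sketch of what such a hand-written parser might look like for the fixed struct in the question. Because this grammar has no nesting, the "descent" is a single loop with no recursion. The names `HandParser`, `Expect`, `ReadToken`, and `SkipSpaces` are invented for this example, and the grammar it accepts is an assumption based on the sample string.

```csharp
using System;
using System.Globalization;

struct Params
{
    public double abc, sdc, www;
}

static class HandParser
{
    // Assumed grammar: '(' "params" { '(' name number ')' } ')'
    public static Params Parse(string s)
    {
        int pos = 0;
        var p = new Params();

        Expect(s, ref pos, '(');
        if (ReadToken(s, ref pos) != "params")
            throw new FormatException("Expected 'params'.");

        SkipSpaces(s, ref pos);
        while (pos < s.Length && s[pos] == '(')
        {
            pos++; // consume '('
            string name = ReadToken(s, ref pos);
            double value = double.Parse(ReadToken(s, ref pos), CultureInfo.InvariantCulture);
            Expect(s, ref pos, ')');

            // Field names are known at compile time, so no reflection is needed;
            // the pairs may arrive in any order.
            switch (name)
            {
                case "abc": p.abc = value; break;
                case "sdc": p.sdc = value; break;
                case "www": p.www = value; break;
                default: throw new FormatException("Unknown parameter: " + name);
            }
            SkipSpaces(s, ref pos);
        }
        Expect(s, ref pos, ')');
        return p;
    }

    static void SkipSpaces(string s, ref int pos)
    {
        while (pos < s.Length && s[pos] == ' ') pos++;
    }

    // Skips spaces, then requires the character c and consumes it.
    static void Expect(string s, ref int pos, char c)
    {
        SkipSpaces(s, ref pos);
        if (pos >= s.Length || s[pos] != c)
            throw new FormatException("Expected '" + c + "' at position " + pos + ".");
        pos++;
    }

    // Reads a run of characters up to the next space or parenthesis.
    static string ReadToken(string s, ref int pos)
    {
        SkipSpaces(s, ref pos);
        int start = pos;
        while (pos < s.Length && s[pos] != ' ' && s[pos] != '(' && s[pos] != ')') pos++;
        if (pos == start)
            throw new FormatException("Expected a token at position " + pos + ".");
        return s.Substring(start, pos - start);
    }

    static void Main()
    {
        Params p = Parse("(params (www 3.05)(abc 1.3)(sdc 2.0))");
        Console.WriteLine("abc={0} sdc={1} www={2}", p.abc, p.sdc, p.www);
    }
}
```

The parser makes one pass over the string and allocates only the short token substrings, which is why this style tends to do well when the format is fixed.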

Mike Dunlavey