4

I ran into this question earlier today:

Example Input: I ran into Joe and Jill and then we went shopping
Example Output: [TOP [S [S [NP [PRP I]] [VP [VBD ran] [PP [IN into] [NP [NNP Joe] [CC and] [NNP Jill]]]]] [CC and] [S [ADVP [RB then]] [NP [PRP we]] [VP [VBD went] [NP [NN shopping]]]]]]

enter image description here

I was about to suggest simply parsing the expected output (as it looks like an s-expression) into an object (in our case a tree) and then using simple LINQ methods to process it. However, to my surprise, I was unable to find a C# s-expression parser.

The only thing I could think of is using Clojure to parse it since it compiles to the clr, I'm not sure it's a good solution though.

By the way, I don't mind the answer to output of type dynamic. Only answers I've found here were for deserializing into a specific schema.

To sum up my question: I need to deserialize s-expressions in C# (serialization would be nice for future readers of this question)

Community
  • 1
  • 1
Benjamin Gruenbaum
  • 270,886
  • 87
  • 504
  • 504
  • Danny, thank you for the edit (though I'm not sure why the image is relevant I'll trust that since you have more experience). I see in your description you know LISP and .NET and I would love your advice. – Benjamin Gruenbaum Feb 03 '13 at 17:54
  • Do you mean (de)serialize expressions already in the form of *[TOP [S [S [NP [PRP I]] [VP [VBD ran] [PP [IN into] [NP [NNP Joe] [CC and] [NNP Jill]]]]] [CC and] [S [ADVP [RB then]] [NP [PRP we]] [VP [VBD went] [NP [NN shopping]]]]]]* or are you refering to the input expression? – Danny Varod Feb 03 '13 at 17:56
  • I would like to be able to (de)serialize s-expressions in general, in this case yes, I want to be able to deserialize the expression mentioned above (well, substituting ( for [ and ) for ]) – Benjamin Gruenbaum Feb 03 '13 at 17:57

2 Answers2

7

It looks like you need a data-structure of the form:

public class SNode
{
    public String Name { get; set; }

    private readonly List<SNode> _Nodes = new List<SNode>();
    public ICollection<SNode> Nodes { get { return _Nodes; } }
}

A serializer of the form

public String Serialize(SNode root)
{
    var sb = new StringBuilder();
    Serialize(root, sb);
    return sb.ToString();
}

private void Serialize(SNode node, StringBuilder sb)
{
    sb.Append('(');

    sb.Append(node.Name);

    foreach (var item in node.Nodes)
        Serialize(item, sb);

    sb.Append(" )");
}

And a de-serializer of the form:

public SNode Deserialize(String st)
{
    if (String.IsNullOrWhiteSpace(st))
        return null;

    var node = new SNode();

    var nodesPos = String.IndexOf('(');
    var endPos = String.LastIndexOf(')');

    var childrenString = st.SubString(nodesPos, endPos - nodesPos);

    node.Name = st.SubString(1, (nodesPos >= 0 ? nodePos : endPos)).TrimEnd();

    var childStrings = new List<string>();

    int brackets = 0;
    int startPos = nodesPos;
    for (int pos = nodesPos; pos++; pos < endPos)
    {
        if (st[pos] == '(')
            brackets++;
        else if (st[pos] == ')')
        {
            brackets--;

            if (brackets == 0)
            {
                childStrings.Add(st.SubString(startPos, pos - startPos + 1));
                startPos = pos + 1;
            }
        }
    }

    foreach (var child in childStrings)
    {
        var childNode = Deserialize(this, child);
        if (childNode != null)
            node.Nodes.Add(childNode);
    }

    return node;
}

If haven't tested or even compiled this code, however, this is more or less how it could work.

Danny Varod
  • 17,324
  • 5
  • 69
  • 111
  • +1 Thank you :) What an awesome thing to do of you. I'll look into this code and read it thoroughly tomorrow. I'm just disappointed there is no more canonical way to do this. You should put this code in github and nuget so other people will be able to enjoy this :) – Benjamin Gruenbaum Feb 03 '13 at 18:32
  • As I've wrote, I haven't even compiled it, so it WILL need debugging. I have yet to posted any code in GitHub or the sort, I always mean to get to that. I'm not sure how many people would find this useful though. – Danny Varod Feb 03 '13 at 18:36
  • I would. Also, I trust this answer means that there is no widely used library that parses S-Expressions in C#. I think they would make an intersting data exchange format. – Benjamin Gruenbaum Feb 03 '13 at 18:43
  • I didn't search for libraries, however, if there was one I assume you would have found it. – Danny Varod Feb 03 '13 at 20:44
  • someone actually compiled this code? I tried it and it doesn't work even if purged from syntax orrors.... – weirdgyn Feb 25 '17 at 17:57
  • @weirdgyn I did not compile this code, it was meant as a pseudo solution. The OP, however, stated that he got this working, so ask him to edit. – Danny Varod Feb 26 '17 at 13:07
2

I wrote an open source S-Expression parser that is available as S-Expression.NET. Since it uses OMeta# to generate the parser you can quickly play with it to add new features.

sw.
  • 3,240
  • 2
  • 33
  • 43
  • do you know how to modify .ometacs to support symbols including uderscore characters and dots (also for numbers) ? – weirdgyn Feb 25 '17 at 17:59