0

I have a code-like string like the following:

a
{
    bcd
    {
        ef
        {
            gh
            {
                i
            }
            j
        }
    }
    k
    {
        lmn
        {
            op
        }
        qr
        {
            st
        }
        uv
        {
            wx
        }
        y
    }
    z
}

I wish to parse this string such so that I can create a hierarchical array from this code where each tree gets created based on { and a tree ends at }.

The array would look like:

[
    "a",

    "{",

    [

        "bcd",

        "{",

        [
            "ef",

            "{",

            [
                "gh",

                "{",

                [
                    "i"
                ],

                "}",

                "j"
            ],

            "}"
        ],

        "}",

        "k",

        "{",

        [
            "lmn",
            "{",
            [
                "op"
            ],

            "}",
            "qr",

            "{",

            [
                "st"
            ],

            "}",

            "uv",

            "{",
            [
                "wx"
            ],

            "}",
            "y"
        ],

        "}",
        "z"
    ],

"}"
]

Can any one help me in getting an algo of this?

You can also pass me a code in any of these languages Java/C#/PHP/VB.NET/JavaScript/ActionScript.

svick
  • 236,525
  • 50
  • 385
  • 514
sudipto
  • 2,472
  • 1
  • 17
  • 21
  • Likewise compiler does to parse code!!! – Rahul Jul 02 '11 at 18:36
  • Because for regular arrays you need to specify how deep the array nesting will be at compile time, this cannot be done with regular array's. You would have to make nodes that hold an array of strings and an array of nodes. – MrFox Jul 02 '11 at 18:38
  • @MrFox: How about using PHP or ActionScript/JavaScript? – sudipto Jul 02 '11 at 18:39
  • @MrFox: Not true. If, for instance, all {} are replaced with (), he's looking at a regular Lisp list. For most other languages, commas will be necessary as well. Just put the file into the natural form of an array in whatever language you'd like to use, and make the langauge do the parsing. – jforberg Jul 02 '11 at 18:44
  • Check this post once; it has a example code in c++: http://stackoverflow.com/questions/391432/developing-a-simple-parser – Rahul Jul 02 '11 at 18:45
  • 3
    when somebody posts a whole task to do instead of just a problem and searches for a full solution to it instead of hints it makes me think: what the hell. Have you even came out with some code.. it's just a simple algorithm to write. it ain't that hard! you could even find some solutions with searching the net for a minute – ub1k Jul 02 '11 at 19:13
  • @ub1k: I have tried and found 2/3 algo. Just would like to verify, without disclosing if there is any other and what is the most optimized. Thanks for your comments, though, somehow, your helpful simple algo/code is missing. – sudipto Jul 02 '11 at 19:26
  • @sudimail, why "without disclosing"? – Bart Kiers Jul 02 '11 at 20:32
  • Do you really want the braces to also be items in the result list? – Svante Jul 02 '11 at 20:39
  • @Bart:1. Because that was in PHP 2. I would soon disclose once dispute over the question (which is shifting focus from the resolution to the validity..and shows less passion towards coding and more towards processes and rules..) – sudipto Jul 03 '11 at 17:12

2 Answers2

0

You are not really picky about the language, so I'm pretty curious why you want this. Here's some code that should do what you want in C#:

public static object[] ParseSpecial(string s)
{
    string dummy = "";
    Stack<List<object>> result = new Stack<List<object>>();
    result.Push(new List<object>());
    foreach (char character in s)
    {
        switch (character)
        {
            case '{':
                if (dummy.Length > 0)
                    result.Peek().Add(dummy);
                dummy = "";

                result.Peek().Add("{");
                result.Push(new List<object>());
                break;

            case '}':
                if (dummy.Length > 0)
                    result.Peek().Add(dummy);
                dummy = "";

                List<object> temp = result.Pop();
                result.Peek().Add(temp.ToArray());
                result.Peek().Add("}");
                break;

            default:
                dummy += character;
                break;
        }
    }

    if (dummy.Length > 0)
        result.Peek().Add(dummy);

    return result.Peek().ToArray();
}
Svante
  • 50,694
  • 11
  • 78
  • 122
Jan-Peter Vos
  • 3,157
  • 1
  • 18
  • 21
  • I have not chosen a definite language just to get response from developers in diverse languages. Thanks for your code. – sudipto Jul 02 '11 at 19:11
  • Your code doesn't handle the case `a { b c }` as one item with two children, which I think it should. Also, the text nodes contain leading and trailing white-space, which I think they shouldn't. – svick Jul 02 '11 at 19:46
0

If this is all you want to do you could write it like this (C#):

class Node
{
    public string Name { get; private set; }
    public IEnumerable<Node> Children { get; private set; }

    public Node(string name, IEnumerable<Node> children)
    {
        Name = name;
        Children = children;
    }
}

class Parser
{
    public Node Parse(string s)
    {
        return Parse(s.Split(new char[0], StringSplitOptions.RemoveEmptyEntries));
    }

    public Node Parse(IEnumerable<string> tokens)
    {
        using (var enumerator = tokens.GetEnumerator())
        {
            enumerator.MoveNext(); // move to first token
            return Parse(enumerator);
        }
    }

    Node Parse(IEnumerator<string> enumerator)
    {
        string name = enumerator.Current;
        enumerator.MoveNext();
        var children = new List<Node>();
        if (enumerator.Current == "{")
        {
            enumerator.MoveNext();
            while (enumerator.Current != "}")
            {
                children.Add(Parse(enumerator));
            }
            enumerator.MoveNext();
        }
        return new Node(name, children);
    }
}

This code doesn't check the return value of MoveNext(), which means it would produce weird results on invalid inputs (including the possibility of an infinite loop).

It also requires that the tokens are delimited by whitespace. So a string like a{b{c}} would be parsed as one node with the name a{b{c}}.

Creating a special type Node is much better than using weakly-typed array (even if you use weakly-typed language). And there is no need to include the braces in the result.

If you want to do something more complicated, I would recommend using some parser library.

If the string might be very long and you're reading it from a file or network, you might want to use some form of stream instead of ordinary string.

svick
  • 236,525
  • 50
  • 385
  • 514