
I have a textual tree like this (Tree 1):

+---step-1
|   +---step_2
|   |   +---step3
|   |   \---step4
|   +---step_2.1
|   \---step_2.2
+---step1.2

Tree 2:

+---step-1
|   \---step_2
|   |   +---step3
|   |   \---step4
+---step1.2

This is just a small example; the tree can be deeper and have more children, etc.

Right now I'm doing this:

for (int i = 0; i < cmdOutList.Count; i++)
{
    string s = cmdOutList[i];
    String value = Regex.Match(s, @"(?<=\---).*").Value;
    value = value.Replace("\r", "");
    if (s[1].ToString() == "-")
    {
        DirectoryNode p = new DirectoryNode { Name = value };
        //p.AddChild(f);
        directoryList.Add(p);
    }
    else
    {
        DirectoryNode f = new DirectoryNode { Name = value };
        directoryList[i - 1].AddChild(f);
        directoryList.Add(f);
    }
}

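But this doesn't handle "step_2.1" and "step_2.2": every line that isn't top-level is attached to the line directly above it (directoryList[i - 1]), so step_2.1 ends up under step4 instead of under step-1.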

I think I'm going about this the wrong way. Can someone help me out with this?

EDIT:

Here is the DirectoryNode class, to make that a bit clearer:

public class DirectoryNode
{
    public DirectoryNode()
    {
        this.Children = new List<DirectoryNode>();
    }
    public DirectoryNode ParentObject { get; set; }
    public string Name;
    public List<DirectoryNode> Children { get; set; }

    public void AddChild(DirectoryNode child)
    {
        child.ParentObject = this;
        this.Children.Add(child);
    }
}
Asked by value; edited by CDspace.

2 Answers


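If your text is that simple (each line is just +--- or \--- preceded by indentation made of | and spaces), then a regex might be more than you need (and might be what's tripping you up). The column where the marker starts tells you the depth, because each level adds four characters of indentation: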

DirectoryNode currentParent = null;
DirectoryNode current = null;
int lastStartIndex = 0;

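foreach(string temp in cmdOutList)
{
    // A foreach iteration variable can't be reassigned, so work on a copy.
    string line = temp;

    // Column of the "+---" or "\---" marker; each level adds 4 characters of indentation.
    int startIndex = Math.Max(line.IndexOf("+"), line.IndexOf(@"\"));

    line = line.Substring(startIndex);

    if(startIndex > lastStartIndex) 
    {
        // One level deeper: the previous node becomes the parent.
        currentParent = current;
    }
    else if(startIndex < lastStartIndex)
    {
        // Moved back up: climb one parent for every level (4 columns) we left.
        for(int i = 0; i < (lastStartIndex - startIndex) / 4; i++)
        {
            if(currentParent == null) break;

            currentParent = currentParent.ParentObject;
        }
    }
    // Same column: a sibling, so currentParent stays as it is.

    lastStartIndex = startIndex;

    // Skip the 4-character marker to get the name.
    current = new DirectoryNode() { Name = line.Substring(4) };

    if(currentParent != null)
    {
        currentParent.AddChild(current);
    }
    else
    {
        // No parent: this is a top-level node.
        directoryList.Add(current);
    }
}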
Answered by Adam Robinson.
  • OK, I was testing it, and it breaks when the tree looks like Tree 2: step1.2 is still added as a child of step-1. – value Feb 10 '11 at 19:58
  • @value: Tree-2 looks to be improperly formed (at least based on the rules I wrote this using). Shouldn't `+---step_2` be `\---step_2`, since it's the last node under its parent? – Adam Robinson Feb 10 '11 at 20:07
  • @Adam, yes, sorry about that. It was just a typo in the tree and has nothing to do with the problem. – value Feb 10 '11 at 20:28
  • @value: I see; see if the edit I just made takes care of the issue. I'm not in front of VS right now, so I can't test it myself, but it's a fairly minor change. – Adam Robinson Feb 10 '11 at 20:37
  • @Adam, the change you made breaks the code for the first tree example. It won't add step_2.1 and step_2.2 as children of step_2. – value Feb 10 '11 at 21:00
  • @value According to the first tree, `step_2.1` and `step_2.2` are children of `step-1`, not `step_2`. – Adam Robinson Feb 10 '11 at 21:20
  • Ah, yes, I meant step-1. Don't know what's wrong with me today. – value Feb 10 '11 at 21:28
  • @value: If it doesn't add them in the right spot, what *does* it do? – Adam Robinson Feb 10 '11 at 21:29
  • It sets currentParent to null, so no ParentObject is assigned to step_2.1 or step_2.2. – value Feb 10 '11 at 21:33
  • @value: Then something is different between what you're testing with and what I'm writing. I've taken the code above (which actually didn't compile, but I've edited it so that it does) and it ran correctly with both trees. – Adam Robinson Feb 10 '11 at 21:42
  • Ah, OK, now I see. To make it compile, I had changed that line to `lines = line.Substring(startIndex);` and used `lines` everywhere after that. That was probably my mistake. – value Feb 10 '11 at 22:01
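To try the answer's loop on its own, it can be wrapped in a method that takes the lines and returns the top-level nodes. This is a sketch under assumptions, not the answerer's exact code: `AnswerParser` and `Parse` are hypothetical names, `DirectoryNode` is copied from the question, and trimming a trailing `\r` and skipping blank lines are additions (the question strips `\r` itself, and a line with neither `+` nor `\` would make `Substring(-1)` throw).

```csharp
using System;
using System.Collections.Generic;

// Copied from the question.
public class DirectoryNode
{
    public DirectoryNode()
    {
        this.Children = new List<DirectoryNode>();
    }
    public DirectoryNode ParentObject { get; set; }
    public string Name;
    public List<DirectoryNode> Children { get; set; }

    public void AddChild(DirectoryNode child)
    {
        child.ParentObject = this;
        this.Children.Add(child);
    }
}

public static class AnswerParser
{
    // Hypothetical wrapper around the answer's loop; returns the top-level nodes.
    public static List<DirectoryNode> Parse(IEnumerable<string> cmdOutList)
    {
        var directoryList = new List<DirectoryNode>();
        DirectoryNode currentParent = null;
        DirectoryNode current = null;
        int lastStartIndex = 0;

        foreach (string temp in cmdOutList)
        {
            // Assumption: strip '\r' and ignore blank lines.
            string line = temp.TrimEnd('\r');
            if (line.Length == 0) continue;

            // Column of the marker; 4 characters per level.
            int startIndex = Math.Max(line.IndexOf("+"), line.IndexOf(@"\"));
            line = line.Substring(startIndex);

            if (startIndex > lastStartIndex)
            {
                currentParent = current;           // one level deeper
            }
            else if (startIndex < lastStartIndex)
            {
                // Climb one parent per level we moved back up.
                for (int i = 0; i < (lastStartIndex - startIndex) / 4; i++)
                {
                    if (currentParent == null) break;
                    currentParent = currentParent.ParentObject;
                }
            }
            lastStartIndex = startIndex;

            current = new DirectoryNode { Name = line.Substring(4) };
            if (currentParent != null) currentParent.AddChild(current);
            else directoryList.Add(current);
        }
        return directoryList;
    }
}
```

Called with the seven lines of Tree 1, this returns step-1 and step1.2 as the roots, with step_2, step_2.1 and step_2.2 as children of step-1, and step3 and step4 under step_2.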

Regex definitely looks unnecessary here, since the symbols in your markup language (that's what it is, after all) are both static and few. That is: although the label names may vary, the tokens you need to look for when parsing a line into its relevant pieces will never be anything other than +---, \---, and |.
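A minimal sketch of that idea: one line is split into its depth, its marker and its label using plain string comparisons against the fixed tokens. `TreeLine` and `Split` are hypothetical names, and the sketch assumes every indentation group is exactly four characters (`|   ` or four spaces), as in the sample trees.

```csharp
using System;

public static class TreeLine
{
    // Hypothetical helper: returns (depth, isLast, label) for one tree line.
    public static (int Depth, bool IsLast, string Label) Split(string line)
    {
        int pos = 0;
        int depth = 0;

        // Each "|   " or "    " group adds one level of depth.
        while (pos + 4 <= line.Length)
        {
            string group = line.Substring(pos, 4);
            if (group != "|   " && group != "    ") break;
            depth++;
            pos += 4;
        }

        // The next four characters must be "+---" (more siblings follow)
        // or "\---" (last child of its parent).
        string marker = line.Substring(pos, Math.Min(4, line.Length - pos));
        if (marker != "+---" && marker != @"\---")
            throw new FormatException("Expected +--- or \\--- at column " + pos + ": " + line);

        // Everything after the marker is the label; drop a trailing '\r' if present.
        string label = line.Substring(pos + 4).TrimEnd('\r');
        return (depth, marker == @"\---", label);
    }
}
```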

From a question I answered yesterday: "Regexes are extremely useful for describing a whole class of needles in a largely unknown haystack, but they're not the right tool for input that's in a very static format."

String manipulation is what you want for parsing this, especially since you're dealing with a recursive markup language, which can't be fully understood by regex anyway. I'd also suggest creating a tree-type data structure to store the data (which, surprisingly, doesn't seem to be included in the framework unless they added it after 2.0).
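One way to pair plain string manipulation with a simple tree structure is an explicit stack holding the most recent node at each depth. This is a sketch, not the answerer's code: `StackTreeParser` is a hypothetical name, it reuses the question's `DirectoryNode` class, and it assumes four characters of indentation per level.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the DirectoryNode class in the question.
public class DirectoryNode
{
    public DirectoryNode() { Children = new List<DirectoryNode>(); }
    public DirectoryNode ParentObject { get; set; }
    public string Name;
    public List<DirectoryNode> Children { get; set; }

    public void AddChild(DirectoryNode child)
    {
        child.ParentObject = this;
        Children.Add(child);
    }
}

public static class StackTreeParser
{
    // path[d] holds the most recent node seen at depth d, so a node at
    // depth d is attached to path[d - 1]. Depth = marker column / 4.
    public static List<DirectoryNode> Parse(IEnumerable<string> lines)
    {
        var roots = new List<DirectoryNode>();
        var path = new List<DirectoryNode>();

        foreach (string raw in lines)
        {
            string line = raw.TrimEnd('\r');
            int plus = line.IndexOf("+---", StringComparison.Ordinal);
            int back = line.IndexOf(@"\---", StringComparison.Ordinal);
            // The marker is whichever token appears first on the line.
            int col = plus < 0 ? back : (back < 0 ? plus : Math.Min(plus, back));
            if (col < 0) continue; // not a tree line (e.g. blank)

            int depth = col / 4;
            var node = new DirectoryNode { Name = line.Substring(col + 4) };

            // Forget nodes at this depth or deeper; they can't be parents here.
            if (path.Count > depth) path.RemoveRange(depth, path.Count - depth);

            if (depth == 0) roots.Add(node);
            else if (path.Count == depth) path[depth - 1].AddChild(node);
            else throw new FormatException("Line skips a level: " + line);

            path.Add(node);
        }
        return roots;
    }
}
```

Taking the earliest of the two four-character markers, rather than `Math.Max` of `IndexOf("+")` and `IndexOf(@"\")`, keeps a `+` or `\` inside a label from being mistaken for the marker.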

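As an aside, your regex above seems to have an unnecessary \ in it: outside a character class, - has no special meaning, so the \- in `(?<=\---)` just matches a literal hyphen. Escaping it doesn't matter in most regex flavors.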
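A quick check of that aside with .NET's `Regex`: both lookbehind patterns extract the same label. `LookbehindDemo` and its methods are hypothetical names used only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // The question's pattern: "\-" escapes a hyphen that needs no escaping.
    public static string WithEscape(string line) =>
        Regex.Match(line, @"(?<=\---).*").Value;

    // Same pattern without the backslash; '-' outside [...] is already literal.
    public static string WithoutEscape(string line) =>
        Regex.Match(line, @"(?<=---).*").Value;
}
```

Either way, `.` also matches `\r`, which is why the question's `Replace("\r", "")` is still needed.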

Answered by Justin Morgan - On strike (edited by Community).