0

I'm building up a tree to parse some text, so I've created some nodes:

abstract class Node { }

class TextNode : Node
{
    public readonly string Text;
    public TextNode(string text)
    {
        Text = text;
    }
    public TextNode(char ch)
    {
        Text = ch.ToString();
    }
}

class VariableNode : Node
{
    public readonly string Name;
    public VariableNode(string name)
    {
        Name = name;
    }
}

I'm going to add some "branching" nodes pretty soon (nodes that contain other nodes).

I'm wondering about the best way to process them. Right now I've got some code that looks like this:

foreach (var item in nodes)
{
    if (item is TextNode)
        sb.Append(((TextNode)item).Text);
    if (item is VariableNode)
        sb.Append(dict[((VariableNode)item).Name]);
}

Which seems a bit clunky, and will only get worse as I add to it.

Should I

  1. Try to encapsulate the logic into the base Node class somehow so that I can just the DoStuff() function without worrying about type-casting?
  2. Should I add some kind of enum for each node type so that I can use a switch instead of a foreach loop?
  3. Something else?

Need to give you guys a bit more context. In the example above, variables nodes need access to the dict to render themselves.

I'm parsing a template. In the template, there are variables. The variable names and their values are stored in the dict. At parse time, I don't have the values (well, I could, but I want to be able to re-render the template with different values), so I need to be able to pass in different dicts to the template. The way I have it set up, the template does all the rendering, the nodes do nothing. I guess I could pass the dict one more level down to each node so they can render themselves...


To elaborate on #1

// in class Template
public string Render(Dictionary<string, object> dict)
{
    var sb = new StringBuilder();

    foreach (var item in nodes)
        sb.Append(item.Render(dict));

    return sb.ToString();
}

interface INode {
    string Render(Dictionary<string, object> dict);
}

class TextNode : INode
{
    private string _text;

    public TextNode(string text)
    {
        _text = text;
    }
    public TextNode(char ch)
    {
        _text = ch.ToString();
    }

    public string Render(Dictionary<string, object> dict)
    {
        return _text;
    }
}

class VariableNode : INode
{
    private string _name;

    // TODO: filters...

    public VariableNode(string name)
    {
        _name = name;
    }

    public string Render(Dictionary<string, object> dict)
    {
        return dict[_name].ToString();
    }
}

I guess that isn't such a bad solution either.


Solutions so far:

  1. Use an enum and switch
  2. Build a dictionary of type/actions to make the switch a bit cleaner
  3. Polymorphism (x2)
  4. Visitor pattern (haven't read this yet)
  5. Use ANTLR -- although ANTLR will build a tree and then this problem applies again
mpen
  • 272,448
  • 266
  • 850
  • 1,236

5 Answers5

2

You can instead use an interface and define a enum property, for eg. NodeType Type where Type is an enum with values TextNode, VariableNode. Although, here also you will need to cast the node to get the value. You can try to add a property named Value which returns Text or Name depending on the NodeType. If dict is accessible from VariableNode, Value can return the exact value needed for your method.

All in all, I think interfaces suit your requirements better than an abstract class.

Yogesh
  • 14,498
  • 6
  • 44
  • 69
  • So, #2 but with an interface instead of abstract class. I realize the TextNode and VariableNode classes look very similar at the moment (and could be collapsed), but I've differentiated them mostly because they need to be processed differently. "dict" is not available from within the VariableNode. In fact, it's not available until... well, later. After the parsing is complete. – mpen Jan 12 '11 at 08:39
  • +1; only have different node classes if there is a fundamental reason to do so (you wouldn't normally need to have different classes for Ford and Toyota cars - they would both just be a Car). Also using an enum for node type means you can shift from having an ugly (and slow) `if` / `else if` with casting to having a nice `switch`. – slugster Jan 12 '11 at 09:09
  • You can designate common methods to differentiate between the nodes for all the work just like the property `Value` i mentioned which returns `Text` or `Name` accordingly. The more different the nodes are, the more suited interfaces become as all the logic of dealing with the nodes will go into the node classes. – Yogesh Jan 12 '11 at 09:11
2

If you're going to have lots of different node types with different behaviours I would certainly consider using a base class Node and then implementing a specialized node class for each type that implements the specific behaviour.

An alternative to putting tree processing logic into the nodes is to use the visitor pattern. In this case you would tend to have a more homogeneous tree structure with the specific processing rules maintained outside the tree.

If this tree is for the purpose of parsing text have you looked at Antlr? This tool can generate your syntax tree for you, based on a specified grammar.

James Gaunt
  • 14,631
  • 2
  • 39
  • 57
  • I've looked into ANTLR... haven't had much luck with it. Not overly fond of it. I'll have a look through that Visitor pattern article, thanks. – mpen Jan 12 '11 at 08:35
  • It's true ANTLR has a bit of a learning curve, and of course if you want to hand craft a parser you can always create something more streamlined. But if you're likely to want to change or develop the syntax overtime it's a great way to eliminate the risk of bugs in the parsing code. Good luck. – James Gaunt Jan 12 '11 at 11:03
2

Move your foreach(var item in nodes) into a method (i.e. BuildAll) in the abstract node class, create an abstract / virtual method Build (in the abstract Node class) and call the method Build() in the BuildAll and let it output a string...

Like so:

    public abstract string Build();

    public string BuildAll() {
      var output = new StringBuilder();
      foreach(var node in nodes) {
        output.append(node.Build()); 
      }
      return output.toString();
    }

and override it inside each actual implementation of node... so TextNode will have...

public override string Build() {
  return ... extract whatever you want from TextNode;
}

In this case, your base class does not need to know the exact implementation of its descendant, thus does not break the Open Close Principle.

Jimmy Chandra
  • 6,472
  • 4
  • 26
  • 38
  • I think it's called Template Method pattern... see http://en.wikipedia.org/wiki/Template_method_pattern – Jimmy Chandra Jan 12 '11 at 08:38
  • i.e. #1. Problem is, the nodes don't have enough information to write themselves until some more information is collected later. I could pass that in to the "build all" function, but different nodes might require different info. – mpen Jan 12 '11 at 09:37
  • Looks like you found a way to pass the variables :) – Jimmy Chandra Jan 12 '11 at 14:24
1

Polymorphism is exactly what you want. Something like this:

class Node
{
   public virtual string ToDisplayText(params object[] parameters)
   {
      return string.Empty;
   }
}
class TextNode : Node
{
   public override string ToDisplayText(params object[] parameters)
   {
      return this.Text;
   }
}
class VariableNode : Node
{
   public override string ToDisplayText(params object[] parameters)
   {
      //check parameters
      var dict = (Dictionary<string,string>)parameters[0];
      return dict[this.Name];
   }
}

So you can:

foreach(var node in nodes)
{
   sb.Append(node.ToDisplayText(dict));
}
Cheng Chen
  • 42,509
  • 16
  • 113
  • 174
  • Curious way to go about it. I guess that's flexible, but we might as well make Node an interface and be explicit about what params we're taking. I don't think having a generic `params object[] parameters` helps any because they're all going to be passed the same thing anyway if I'm using polymorphism -- only way I could pass them different params if I check their types, but then we're back to square 1. – mpen Jan 12 '11 at 09:50
1

You can create a Dictionary<Type,Action<T>> - the keys are the node types and the Action<T> delegates are what you do with each.

You can than simply do:

Dictionary[typeof(variable)];

In order to execute the right action for each type.

Oded
  • 489,969
  • 99
  • 883
  • 1,009
  • Ah... this is actually what I'm doing (see http://stackoverflow.com/questions/4632945) but I didn't know if it was the "standard" way of processing nodes. Seems a bit... dirty. – mpen Jan 12 '11 at 08:33
  • @Ralph - This is a good way to keep your code clean. A lookup table and simple way to invoke the required functionality. – Oded Jan 12 '11 at 08:35