Tree data structure with generics

Question

I'm banging my head on the desk for hours now, but it seems like I'm too stupid to implement a tree structure in C#.

There are 2 types of nodes, I call them Node and NodeCollection.
Both Node and NodeCollection can have a parent NodeCollection
A NodeCollection can have a collection of child nodes, which are either Node or NodeCollection
A Node cannot have any childs.
A Node or NodeCollection without parent is considered to be the root node
A Node has a value of any arbitary type, done with generics
- NodeCollection
  - NodeCollection
    - Node
    - Node
    - NodeCollection
      - Node
      - NodeCollection
        
        Node
        
        Node
      - NodeCollection
    - Node

Is there a collection type from the BCL that serves this purpose? What I have so far:

public abstract class NodeBase {

    protected NodeCollection Parent { get; set; }

}

public class Node<T> : NodeBase {

    public string Key { get; set; }
    public T Value { get; set; }

}

public class NodeCollection : NodeBase {

    public ICollection<NodeBase> Children { get; set; }

}

This solution 'works', however I cannot just walk down the tree as NodeBase doesn't offer any childreen. I have to validate the type to find out if the child node is a NodeCollection, but if it is not, I can't properly cast the Node because it might be of unknown type.

You haven't asked a question, or shown what you've tried so far, or described the problem you're facing. — Jon Skeet, Apr 30 '14 at 14:20

score 5 · Accepted Answer · answered Apr 30 '14 at 14:36

The easiest option is to just have one Node class (rather than having two types of nodes, to separate leaves from non-leaf nodes), and to have your leaf nodes just have an empty collection of children rather than no collection at all.

If it's important to have the two types of nodes for other reasons, then you'd want to have both of them implement an interface that defines a collection of children, to which the leaf nodes will return an empty collection.

Andy Brown · Answer 2 · 2014-04-30T14:59:59.253

0

For dealing with the Node not having a link to the collection, I use an abstract base class property called "HasChildren" to allow for checking, and ensure that the concrete Node implementation throws an exception if you forgot to check and tried accessing the children.

public abstract class NodeBase<T> {
  public abstract bool HasChildren { get; }
  public abstract ICollection<T> Children { get; set; }
}
public class NodeCollection<T> : NodeBase<T> {
  public override bool HasChildren { get { return true; } }
  public override ICollection<T> Children { get; set; }
}
public class Node<T> : NodeBase<T> {
  public override bool HasChildren { get { return false; } }
  public override ICollection<T> Children { get { throw new Exception("blah"); } set { throw new Exception("blah"); } }
}

I use an Add method concrete implementation from an abstract base class method to work out what to add, and how, for me this also takes care of collection sorting for adding items to NodeCollections, and enforces uniqueness if required.

edited Apr 30 '14 at 14:59

answered Apr 30 '14 at 14:37

Andy Brown

18,961
3
52
62

In what way does a linked list help? – Servy Apr 30 '14 at 14:38
As per the OP it is "a collection type from the BCL that serves this purpose", and it is very effectively used in [this implementation](http://stackoverflow.com/a/2012855/1945631) – Andy Brown Apr 30 '14 at 14:40
That's a BCL type, but *it doesn't serve this purpose*. It in now way helps him access all of the children of any particular node. Having `LinkedList` be the concrete implementation of the `ICollection` representing the children isn't really meaningful. It's no different than a `List`, except that it'll most likely perform worse in typical use cases. *Any* implementation of `ICollection` will do. – Servy Apr 30 '14 at 14:41
That's covered by having the base class implement both `Children` and `HasChildren`. The LinkedList can be used as the underlying storage, rather than rolling your own reinventions of Count, First, Last, AddX, Remove, Clear ... – Andy Brown Apr 30 '14 at 14:45
Well `HasChildren` isn't needed at all, one can simply have an empty `Children` collection. Yes, having that accessible from all node types is key, yet that's not in your answer, despite being the answer to the question. Everything about `LinkedList` is just offtopic to the question. He only needs an implementation of `ICollection` for the `Children` collection, and he's free to use whatever he wants, so long as it implements `ICollection`. There's no compelling reason to use `LinkedList` (and there are some compelling reasons not to). – Servy Apr 30 '14 at 14:49
Thanks for the comments, I thought having `Children` in the base class was implied alongside having `HasChildren`. I like `HasChildren` as it conveys intent instead of trying to iterate a null or empty collection. `LinkedList` removed from the answer to reduce confusion. – Andy Brown Apr 30 '14 at 15:01
One can check a collection to see if it has elements in just as much work as it would take to check a `HasChildren` property; even easier in many cases as you can simply `foreach` over the collection and if it's empty, it does nothing, no need to even check anything. You shouldn't have a `null` collection, or throw an exception when there are none. It should be empty. Also, in your example your `Node` and `NodeCollection` classes are effectively identical, except for the one undesirable behavior of the node's children throwing an exception. At that point why even have two classes? – Servy Apr 30 '14 at 15:03
Agreed, one can. "why even have two classes" - I didn't want to challenge what I had to assume was a fundamental design requirement by the OP. We use the `HasChildren` approach; it works well for us. However I would be equally happy with a guaranteed non-null collection as long as the tests were in place to ensure that never changed. – Andy Brown Apr 30 '14 at 15:25

Tree data structure with generics

2 Answers2