How do I efficiently search this hierarchical structure?

Question

I have a data structure that looks like this:

public class Node
{
     public string Code { get; set; }
     public string Description { get; set; }
     ...
     public List<Node> Children { get; set; }
}

I want to write a method that will return a specific node, given the specified Code. Normally I would just do a recursive walk through the hierarchy to find the node, but I'm concerned about performance. There will be several thousand nodes in the hierarchy, and this method will be called many, many times.

How do I structure this to make it faster? Can I use an existing data structure that perhaps performs a binary search on Code while retaining the hierarchical structure, without re-implementing some form of binary search myself?

Is the data organized / sorted in any way? If there is a pattern or rule to the Code that children have, you can use that to reduce your search space. — Jeremy Elbourn, Nov 02 '10 at 16:15
@Abe: No, it's in memory. It's been serialized out to an XML file, so that I can retrieve it when the program starts. — Robert Harvey, Nov 02 '10 at 16:15
@jelbourn: The code has a maximum size of about 16 characters. Other than that, there's no pattern; the code is an arbitrary string. The data is only organized hierarchically. — Robert Harvey, Nov 02 '10 at 16:16
Keep the nodes in a dictionary where the key is code? Then you get an O(1) lookup time — BeRecursive, Nov 02 '10 at 16:16
@BeRecursive" How do I associate the dictionary items to my items? By index[]? — Robert Harvey, Nov 02 '10 at 16:17
You store them in a single Dictionary and the dictionary is of type . Then the dictionary contains the actual node objects themselves. When you finish serializing the node objects from xml just add them to this master dictionary set — BeRecursive, Nov 02 '10 at 16:20
I don't understand why the search must be recursive. As stated, the question seems to only be about searching nodes for the code. If you must have the parent of the node then it helps to say so explicitly. I have written an article about searching hierarchies iteratively. — Sam Hobbs, Jun 17 '19 at 19:18
@user34660: It doesn't. But implementing the search recursively is far more elegant than trying to maintain your own separate stack. — Robert Harvey, Jun 17 '19 at 19:31

score 13 · Accepted Answer · edited Nov 02 '10 at 21:15

13

Add all the nodes to dictionary with the code as key. (you can do it once), the look-up in dictionary is basically O(1).

void FillDictionary(Dictionary<string, Node> dictionary, Node node)
{
  if (dictionary.ContainsKey(node.Code))
    return;

  dictionary.Add(node.Code, node);

  foreach (Node child in node.Children)
    FillDictionary(dictionary, child)
}

If you know the root, usage will be:

var dictionary = new Dictionary<string, Node>();
FillDictionary(dictionary, rootNode);

If you don't you can call the FillDictionary() method on all your nodes with the same dictionary.

edited Nov 02 '10 at 21:15

Robert Harvey

178,213
47
333
501

answered Nov 02 '10 at 16:18

Itay Karo

17,924
4
40
58

How do I associate the dictionary items to my items? By index[]? – Robert Harvey Nov 02 '10 at 16:18
You can add them once to the dictionary with recursion after deserialization from XML. – Itay Karo Nov 02 '10 at 16:20
1

You don't need `Equals` and `GetHashCode`. – SLaks Nov 02 '10 at 16:21
@Itay: Yes, but *how does the dictionary know where the node is in the original collection?* – Robert Harvey Nov 02 '10 at 16:21
@SLaks: you right, Since we are keying by `Code` it redundant. – Itay Karo Nov 02 '10 at 16:22
Something like this: `void recurse(Node x, Dictionary y) {y[x.Code]=x;foreach(Node c in x.Children){recurse(c, y);}}` – Brian Nov 02 '10 at 16:22
2

@Robert: It does not - unless you add `ParentNode` to each `Node`. – Itay Karo Nov 02 '10 at 16:24
@Itay: Or use a `Tuple` as a value in the dictionary (with the 2nd `Node` being a parent node), if he is not permitted to modify the `Node` class. – Brian Nov 02 '10 at 16:27
@Itay: So basically you're saying that I need to load the data into a flat list, implement the hierarchical structure using pointers, and forget about nesting the children within each node? – Robert Harvey Nov 02 '10 at 16:30
@Brian: Indeed - But then in addition there is also a need to pass an `IEqualityComparer` to the `Dictionary` constructor – Itay Karo Nov 02 '10 at 16:31
@Brian - with the `Tuple`, he would get the immediate parent only. If he wants to go all the way up the tree, then he would have to: 1) find the code in the dictionary, 2) get the parent from the tuple. Repeat until parent is null. That would no longer be O(1). Assuming a perfectly balanced tree and statistically random nodes are picked, it would be something like O(levels / 2). – Nelson Rothermel Nov 02 '10 at 16:31
@Robert-if you load the data into a flat list you can also easily stick it in a database table which may be better than an XML file you write out, depending on your scenario. – Nelson Rothermel Nov 02 '10 at 16:35
@Robert: forget about nesting the children? is there no need for you to go down the hierarchy? – Itay Karo Nov 02 '10 at 16:37
@Itay: In a perfect world, when I get the node, I also get all of its embedded children, and all of their descendants. – Robert Harvey Nov 02 '10 at 16:39
@Robert: That is what I meant, I was wondering why you suggested to forget about the nested children.. – Itay Karo Nov 02 '10 at 16:43
@Robert - If you only need embedded children and descendants, then you don't need to worry about the parents? Your dictionary then becomes a lookup table/index. – Nelson Rothermel Nov 02 '10 at 16:43
@Itay: OK. A number of people have suggested that I put `ParentNode` into the data structure. How does that help me find a particular node quickly? More pointedly, *how does that keep me from walking the entire tree to find a particular node?* – Robert Harvey Nov 02 '10 at 16:45
@Nelson: It wouldn't be O(1) even if he instead used a list of elements, since the act of reading the list would also be O(levels). However, as long as traveling up or down the tree by one node is O(1), I think it is reasonable. – Brian Nov 02 '10 at 16:45
@Robert: The only thing adding `ParentNode` accomplishes is that it tells you *where* in the tree the node you found is, because if you travel up the tree you can say ("The node I found is the child of the child of the child of root"). We're assuming that is what you mean by wanting to know where in the tree the node you found is. – Brian Nov 02 '10 at 16:46
@Brian: No, I don't care about that. Because all of the children (and all of their children) are already in the node, that's all I need. What I meant by the Dictionary knowing where the node is, is that it needs to tell me where the node is in my original collection, so I can retrieve it. – Robert Harvey Nov 02 '10 at 16:47
@Robert: If all the nodes were added to dictionary with the `Code` as key. Is query the dictionary by the code and you will get you node within `O(1)` time. Take a look at the `Dictionary.TryGetValue(..)` method. – Itay Karo Nov 02 '10 at 16:48
@Rober: Adding a `ParentNode` doesn't do anything by itself. If you have a dictionary to look up, that will get you the node you want without traversing the whole tree. Once you have the correct node, `ParentNode` will let you know where in the tree you are. – Nelson Rothermel Nov 02 '10 at 16:48
@Itay: You're forgetting that this is a hierarchical data structure. – Robert Harvey Nov 02 '10 at 16:48
@Robert: Then what problem do you have with `Dictionary`? Any `Node` found by reading such a `Dictionary` will still know where its children are. – Brian Nov 02 '10 at 16:49
@Robert: Oh, you don't have to worry about "retrieving" it. You are dealing with reference types, so your `Dictionary` will have the same reference/object that the tree has. That will automatically give you the children you need. – Nelson Rothermel Nov 02 '10 at 16:50
@Robert: So? I guess that `Code` uniquely identify the `Node` so you can build this dictionary recursively and the recursion halt condition will be `if dictionary.ContainsKey(code)` – Itay Karo Nov 02 '10 at 16:51
@Brian: It would be a dictionary of nested dictionaries of nested dictionaries. The top level dictionary would have five nodes in it. I would still have to walk the entire tree to find the right node, wouldn't I? – Robert Harvey Nov 02 '10 at 16:51
@Robert - I will add the Dictionary creation code in my answer. – Itay Karo Nov 02 '10 at 16:52
@Robert: The dictionary wouldn't be nested, it would be a flat index. You build it once when your tree is built. I suppose a question is, do you load the XML tree once then do many searches on it? If so, load the XML tree, build your dictionary, then perform your searches. If you have to load the XML tree for every search (maybe each website hit?), then building the dictionary and searching once won't help speed it up. – Nelson Rothermel Nov 02 '10 at 16:55
@Nelson: I load the XML tree once. I get that the dictionary is a flat index, I just don't understand yet how that index is going to reach into my hierarchical data structure and retrieve the correct node. – Robert Harvey Nov 02 '10 at 16:57
@Robert: I added the code to my answer, GTG, Have a nice day everyone. – Itay Karo Nov 02 '10 at 17:00
@Itay: It works perfectly. Thanks for the help. The `ParentNode` was not needed. – Robert Harvey Nov 02 '10 at 18:21

Nelson Rothermel · Answer 2 · 2010-11-02T18:46:01.897

Here's a full implementation of what I and others were talking about. Note that by having the index dictionary, you will use a bit more memory (only references to the nodes) in exchange for faster searches. I'm using events to dynamically update the index.

Edit: Added comments and fixed up a few things.

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Reflection;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create the root node for the tree
            MyNode rootNode = new MyNode { Code = "abc", Description = "abc node" };

            // Instantiate a new tree with the given root node.  string is the index key type, "Code" is the index property name
            var tree = new IndexedTree<MyNode, string>("Code", rootNode);

            // Add a child to the root node
            tree.Root.AddChild(new MyNode { Code = "def", Description = "def node" });

            MyNode newNode = new MyNode { Code = "foo", Description = "foo node" };

            // Add a child to the first child of root
            tree.Root.Children[0].AddChild(newNode);

            // Add a child to the previously added node
            newNode.AddChild(new MyNode { Code = "bar", Description = "bar node" });


            // Show the full tree
            Console.WriteLine("Root node tree:");
            PrintNodeTree(tree.Root, 0);

            Console.WriteLine();

            // Find the second level node
            MyNode foundNode = tree.FindNode("def");

            if (foundNode != null)
            {
                // Show the tree starting from the found node
                Console.WriteLine("Found node tree:");
                PrintNodeTree(foundNode, 0);
            }

            // Remove the last child
            foundNode = tree.FindNode("foo");
            TreeNodeBase nodeToRemove = foundNode.Children[0];
            foundNode.RemoveChild(nodeToRemove);

            // Add a child to this removed node.  The tree index is not updated.
            nodeToRemove.AddChild(new MyNode { Code = "blah", Description = "blah node" });

            Console.ReadLine();
        }

        static void PrintNodeTree(MyNode node, int level)
        {
            Console.WriteLine(new String(' ', level * 2) + "[" + node.Code + "] " + node.Description);

            foreach (MyNode child in node.Children)
            {
                // Recurse through each child
                PrintNodeTree(child, ++level);
            }
        }
    }

    public class NodeEventArgs : EventArgs
    {
        public TreeNodeBase Node { get; private set; }

        public bool Cancel { get; set; }

        public NodeEventArgs(TreeNodeBase node)
        {
            this.Node = node;
        }
    }

    /// <summary>
    /// The base node class that handles the hierarchy
    /// </summary>
    public abstract class TreeNodeBase
    {
        /// <summary>
        /// A read-only list of children so that you are forced to use the proper methods
        /// </summary>
        public ReadOnlyCollection<TreeNodeBase> Children
        {
            get { return new ReadOnlyCollection<TreeNodeBase>(this.children); }
        }

        private IList<TreeNodeBase> children;

        /// <summary>
        /// Default constructor
        /// </summary>
        public TreeNodeBase()
            : this(new List<TreeNodeBase>())
        {
        }

        /// <summary>
        /// Constructor that populates children
        /// </summary>
        /// <param name="children">A list of children</param>
        public TreeNodeBase(IList<TreeNodeBase> children)
        {
            if (children == null)
            {
                throw new ArgumentNullException("children");
            }

            this.children = children;
        }

        public event EventHandler<NodeEventArgs> ChildAdding;

        public event EventHandler<NodeEventArgs> ChildRemoving;

        protected virtual void OnChildAdding(NodeEventArgs e)
        {
            if (this.ChildAdding != null)
            {
                this.ChildAdding(this, e);
            }
        }

        protected virtual void OnChildRemoving(NodeEventArgs e)
        {
            if (this.ChildRemoving != null)
            {
                this.ChildRemoving(this, e);
            }
        }

        /// <summary>
        /// Adds a child node to the current node
        /// </summary>
        /// <param name="child">The child to add.</param>
        /// <returns>The child node, if it was added.  Useful for chaining methods.</returns>
        public TreeNodeBase AddChild(TreeNodeBase child)
        {
            NodeEventArgs eventArgs = new NodeEventArgs(child);
            this.OnChildAdding(eventArgs);

            if (eventArgs.Cancel)
            {
                return null;
            }

            this.children.Add(child);
            return child;
        }

        /// <summary>
        /// Removes the specified child in the current node
        /// </summary>
        /// <param name="child">The child to remove.</param>
        public void RemoveChild(TreeNodeBase child)
        {
            NodeEventArgs eventArgs = new NodeEventArgs(child);
            this.OnChildRemoving(eventArgs);

            if (eventArgs.Cancel)
            {
                return;
            }

            this.children.Remove(child);
        }
    }

    /// <summary>
    /// Your custom node with custom properties.  The base node class handles the tree structure.
    /// </summary>
    public class MyNode : TreeNodeBase
    {
        public string Code { get; set; }
        public string Description { get; set; }
    }

    /// <summary>
    /// The main tree class
    /// </summary>
    /// <typeparam name="TNode">The node type.</typeparam>
    /// <typeparam name="TIndexKey">The type of the index key.</typeparam>
    public class IndexedTree<TNode, TIndexKey> where TNode : TreeNodeBase, new()
    {
        public TNode Root { get; private set; }

        public Dictionary<TIndexKey, TreeNodeBase> Index { get; private set; }

        public string IndexProperty { get; private set; }

        public IndexedTree(string indexProperty, TNode rootNode)
        {
            // Make sure we have a valid indexProperty
            if (String.IsNullOrEmpty(indexProperty))
            {
                throw new ArgumentNullException("indexProperty");
            }

            Type nodeType = rootNode.GetType();
            PropertyInfo property = nodeType.GetProperty(indexProperty);

            if (property == null)
            {
                throw new ArgumentException("The [" + indexProperty + "] property does not exist in [" + nodeType.FullName + "].", "indexProperty");
            }

            // Make sure we have a valid root node
            if (rootNode == null)
            {
                throw new ArgumentNullException("rootNode");
            }

            // Set the index properties
            this.IndexProperty = indexProperty;
            this.Index = new Dictionary<TIndexKey, TreeNodeBase>();

            // Add the root node
            this.Root = rootNode;

            // Notify that we have added the root node
            this.ChildAdding(this, new NodeEventArgs(Root));
        }

        /// <summary>
        /// Find a node with the specified key value
        /// </summary>
        /// <param name="key">The node key value</param>
        /// <returns>A TNode if found, otherwise null</returns>
        public TNode FindNode(TIndexKey key)
        {
            if (Index.ContainsKey(key))
            {
                return (TNode)Index[key];
            }

            return null;
        }

        private void ChildAdding(object sender, NodeEventArgs e)
        {
            if (e.Node.Children.Count > 0)
            {
                throw new InvalidOperationException("You can only add nodes that don't have children");
                // Alternatively, you could recursively get the children, set up the added/removed events and add to the index
            }

            // Add to the index.  Add events as soon as possible to be notified when children change.
            e.Node.ChildAdding += new EventHandler<NodeEventArgs>(ChildAdding);
            e.Node.ChildRemoving += new EventHandler<NodeEventArgs>(ChildRemoving);
            Index.Add(this.GetNodeIndex(e.Node), e.Node);
        }

        private void ChildRemoving(object sender, NodeEventArgs e)
        {
            if (e.Node.Children.Count > 0)
            {
                throw new InvalidOperationException("You can only remove leaf nodes that don't have children");
                // Alternatively, you could recursively get the children, remove the events and remove from the index
            }

            // Remove from the index.  Remove events in case user code keeps reference to this node and later adds/removes children
            e.Node.ChildAdding -= new EventHandler<NodeEventArgs>(ChildAdding);
            e.Node.ChildRemoving -= new EventHandler<NodeEventArgs>(ChildRemoving);
            Index.Remove(this.GetNodeIndex(e.Node));
        }

        /// <summary>
        /// Gets the index key value for the given node
        /// </summary>
        /// <param name="node">The node</param>
        /// <returns>The index key value</returns>
        private TIndexKey GetNodeIndex(TreeNodeBase node)
        {
            TIndexKey key = (TIndexKey)node.GetType().GetProperty(this.IndexProperty).GetValue(node, null);
            if (key == null)
            {
                throw new ArgumentNullException("The node index property [" + this.IndexProperty + "] cannot be null", this.IndexProperty);
            }

            return key;
        }
    }
}

Thanks for the code, I'll chew on it. +1 for the effort. I feel bad now; the problem turned out to be simpler than I thought. — Robert Harvey, Nov 02 '10 at 18:18
@Robert: No problem, we all have that happen. I think the key was that the tree and dictionary both point to the same object. It doesn't matter how you get to the node, you always have the same node with the children and all. Also, a note on my code: I think instead of a `ReadOnlyCollection` and custom events you could use something like `ObservableCollection`, but I don't know enough about it to be sure. — Nelson Rothermel, Nov 02 '10 at 18:48

score 2 · Answer 3 · answered Nov 02 '10 at 16:21

2

Without any kind of comparison-based organization for Code, there's nothing you can do to prevent an O(n) walkthrough of the tree. However, if you build another data structure (either a Dictionary for O(1) access or List for O(log n) access) at the same time you're reading through your XML file to build the Node tree, you could build the additional structure to access quickly without much more overhead.

answered Nov 02 '10 at 16:21

Jeremy Elbourn

2,630
1
18
15

How does the dictionary know where the node is in the original collection? – Robert Harvey Nov 02 '10 at 16:23
@Robert - can't you keep a backlink to parent in the `Node` to facilitate location information once the `Node` instance is located? – Steve Townsend Nov 02 '10 at 16:26
Or stuff the backlink to the parent in the auxiliary data structure, though I'd rather stick it in the `Node`. – Brian Nov 02 '10 at 16:29

score 2 · Answer 4 · answered Nov 02 '10 at 16:24

Store them in a Dictionary, this affords you O(1) lookup time.

var dict = new Dictionary<string, Node>();
dict.Add(n.Code, n);

Assuming n is a hydrated Node object. Then to get a particular node item you can do

var node = dict["CODEKEY"];

which will return your particular Node according to the provided key. You can even check the node exists using :

if (d.ContainsKey("CODEKEY"))
{
   //Do something
}

In order to know where the node was in the original collection you would need to add some sort of pointer hierarchy to your Node class so that it had knowledge of the previous node

score 1 · Answer 5 · answered Nov 02 '10 at 16:15

1

If you can change the order of the nodes, you can make a Binary Search Tree.

answered Nov 02 '10 at 16:15

SLaks

868,454
176
1,908
1,964

From OP: `without re-implementing some form of binary search myself?` – Abe Miessler Nov 02 '10 at 16:18
Does C# not have some sort of binary tree in its libraries? Bit unimpressive if not – The Archetypal Paul Nov 02 '10 at 16:20
@Paul: No. @Abe: he didn't say tree. – SLaks Nov 02 '10 at 16:20
So he can search for items in a binary tree without doing a binary search? (i'm not being snarky, just curious) – Abe Miessler Nov 02 '10 at 16:27
@Abe - Of course you can search a binary tree without doing a binary search. You can walk through every item--there's nothing stopping you. In fact, if the binary tree is ordered by the Code and you want to search by Description, you can't do a binary search. Of course, not doing a binary search on a binary tree does defeat the purpose :) – Nelson Rothermel Nov 02 '10 at 16:41

score 1 · Answer 6 · edited May 23 '17 at 12:33

1

This SO answer references a library that you should be able to use?

edited May 23 '17 at 12:33

Community

1
1

answered Nov 02 '10 at 16:22

The Archetypal Paul

41,321
20
104
134

score 1 · Answer 7 · edited May 23 '17 at 12:09

I'll be honest; I'm having great difficulty understanding how Itay's suggestion doesn't make perfect sense.

Here is the requirement that you've stated:

I want to write a method that will return a specific node, given the specified Code.

So the Code is unique, I take it? Then there's nothing stopping you from putting all of your Node objects into a Dictionary<string, Node>.

In your comments to Itay's answer you say this:

I get that the dictionary is a flat index, I just don't understand yet how that index is going to reach into my hierarchical data structure and retrieve the correct node.

If you mean you don't understand how the dictionary is going to know where your Node is in the data structure, that's because it isn't. Does this matter? You haven't stated in your requirements that you want to know where the node is in the data structure; you only specified that you want to get the node. In order to do this, the dictionary only needs to know where the node is in memory, not in some completely separate data structure.

To provide a simplified example (and I apologize if I'm insulting your intelligence here, but bear with me as this may at least clarify the point for someone else), suppose you had a plain LinkedList<int> containing all unique integers. You then enumerate over this list and use it to construct a Dictionary<int, LinkedListNode<int>>, the idea being that you want to be able to quickly find a node based on its value.

Does the dictionary need to know where in the linked list each node is? Certainly not—only where it is in memory. Once you've found your node based on its value in O(1) using the dictionary, you can of course easily traverse the linked list forwards or backwards using the node itself, which happens to be aware (by design) of the linked list containing it.

It's the same with your hierarchical structure, only a bit more complex than a linked list. But the same principle applies.

I believe I understand it now that I see the code Itay posted. The dictionary just holds an indexed collection of object references to the original node objects. I got a bit derailed on the suggestions that I implement a `ParentNode`...Sorry about that. I'm implementing Itay's code now...I'll let you know how it goes. — Robert Harvey, Nov 02 '10 at 17:35

score 0 · Answer 8 · answered Nov 02 '10 at 16:23

0

Why not use SortedSet<Node> to build a BST containing all your Node instances? Comparator would be based on Code - container would have to be scoped such that this is unique across all members.

answered Nov 02 '10 at 16:23

Steve Townsend

53,498
9
91
140

How would I retain the hierarchical structure, and still get good search characteristics throught the entire tree? – Robert Harvey Nov 02 '10 at 16:25
@Robert - this is in addition to your parent-child structure as laid out. Can't you backlink from child to parent to facilitate tree walk from a located `Node`? – Steve Townsend Nov 02 '10 at 16:27

How do I efficiently search this hierarchical structure?

8 Answers8

Linked