Searching All Nodes in a Tree Structure efficiently

Question

I am working with a tree type structure that consists of a single 'root'TreeNodeDefinition record which can contain amongst other things a list of other TreeNodeDefinition classes, each of these can then contain a further List and so on and so forth.

I basically want to be able to traverse ALL nodes within the tree structure and check if a condition is met per node then that definition record is added to a list. I have come up with a method to do this but I cant help but think there is a much more efficient way of doing it:

List<ITreeNodeDefinition> treeNodeDefinitions = new List<ITreeNodeDefinition>();

treeNodeDefinitions = treeFactory.SearchNodesByRelationshipId(treeNodeDefinition, relationshipId, new List<ITreeNodeDefinition>());

where the first parameter is my Root node definition record, the second parameter is what I need to compare each node by and lastly I am passing in an Empty list that I want to populate each time a node matches my check. The method is as follows:

    public List<ITreeNodeDefinition> SearchNodesByRelationshipId(ITreeNodeDefinition treeNodeDefinition, int? relationshipId, List<ITreeNodeDefinition> tndList)
    {
        if (treeNodeDefinition.RelationshipId == relationshipId)
        {
            tndList.Add(treeNodeDefinition);
        }

        if (treeNodeDefinition.Nodes.Count != 0)
        {
            foreach (ITreeNodeDefinition nodeDefinition in treeNodeDefinition.Nodes)
            {
                List<ITreeNodeDefinition> tempTable = this.SearchNodesByRelationshipId(nodeDefinition, relationshipId, tndList);
            }
        }
        return tndList;
    }

As you can see, the method is calling itself for each sub-node found in the treeNodeDefinition.Nodes list. This is returning to a tempTable that I never do anything with... This stinks of inefficiency to me. i was wondering if there is a more direct way of navigating this kind of structure... I sure I am just missing a trick.

Just had a thought of using an out parameter for the list I am populating...? will test that now... But any more clever suggestions still welcome! — CJH, Sep 17 '18 at 21:26
How deep is your tree structure? This recursion will blow the stack depending on the depth. — Parrish Husband, Sep 17 '18 at 21:30
dont use out. you will not be able to pass the running list into the nested calls — pm100, Sep 17 '18 at 21:31
you have not shown how your data is stored. You could arrange the data so that it can be efficiently searched this way. (Place all nodes in a list , you can then search easily ) — pm100, Sep 17 '18 at 21:32
The tree view doesn't really tend to get more than a few levels deep, however these tree views are user-created to some extent so it could technically get deep. I will still need to be able to make the above check though. — CJH, Sep 17 '18 at 21:32
Yeah, I started looking at out and decided against it... Could potentially look at the node list idea though...? — CJH, Sep 17 '18 at 21:33

Parrish Husband · Answer 1 · 2018-09-18T10:16:01.157

You could approach this with an explicit stack and avoid recursion altogether:

public static IEnumerable<ITreeNodeDefinition> DepthFirstSearch(ITreeNodeDefinition root, int? relationshipId)
{
    var stack = new Stack<ITreeNodeDefinition>();
    stack.Push(root);
    while(stack.Count > 0)
    {
        var current = stack.Pop();
        if (current.RelationshipId == relationshipId)
            yield return current;

        foreach(var node in current.Nodes)
            stack.Push(node);
    }
}

To add onto this though, it might be simpler and more flexible to not hard-code in the relationship id check at all, and just filter the results down afterward if you find you're commonly trying to traverse the tree structure:

var matches = treeFactory.Traverse(root)
                         .Where(t => t.RelationshipId == 5)
                         .ToList();

Expanding on this, by using a search predicate you could build this search functionality into the method the same way LINQ does, which you could implement like this:

public static IEnumerable<ITreeNodeDefinition> DepthFirstSearch(ITreeNodeDefinition root, Func<ITreeNodeDefinition, bool> predicate)
{
    var stack = new Stack<ITreeNodeDefinition>();
    stack.Push(root);
    while (stack.Count > 0)
    {
        var current = stack.Pop();
        if (predicate(current))
            yield return current;

        foreach (var node in current.Nodes)
            stack.Push(node);
    }
}

The benefit is you aren't hard-coding in one specific search case when traversing. Calling this with the RelationshipId search would be:

var matches = treeFactory.DepthFirstSearch(root, t => t.RelationshipId == 5)
                         .ToList();

For completeness, here's an example of breadth-first searching. Note the difference in Queue<T> vs Stack<T> in regards to how the traversal order changes:

public static IEnumerable<ITreeNodeDefinition> BreadthFirstSearch(ITreeNodeDefinition root, Func<ITreeNodeDefinition, bool> predicate)
{
    var queue = new Queue<ITreeNodeDefinition>();
    queue.Enqueue(root);
    while (queue.Count > 0)
    {
        var current = queue.Dequeue();
        if (predicate(current))
            yield return current;

        foreach (var node in current.Nodes)
            queue.Enqueue(node);
    }
}

For your use case, there is likely no advantage to either one except for breadth-first being ordered in a way that is generally better when listing the nodes.

Maxim Kitsenko · Answer 2 · 2018-09-18T15:01:43.793

Looking to the question/code i think that the problem relates not to one thing (like you said redundant tempTable variable) but it relates to several areas, I'll try highlight these areas/problems.

1) Theory. The first thing you need to know before iterating the tree - is that there are 2 ways of iterating trees 'breadth-first' and 'depth-first'. They can be implemented via 'recursion' and 'loop'. There are a lot of articles about it.

I suggest you to read few articles about these approaches, some of them:

2) Problem that you noticed not significant from the performance pov. Yes, you are right at when you say storing the result of the previous call into 'tempTable' is not very good. Returning the 'tempTable' has no big impact on performance or memory since 'tempTamble' is referenced to the same object as 'tndList'. Returning method parameter does not bring 'inefficiency'. The only one thing it impacts on is - not clean code and few bytes in the stack. Really you don't need to return anything in your method. Why do you return the list?

I suggest you to read about value and reference types. Some materials

I changed your you code a little bit, now it returns void:

public void SearchNodesByRelationshipId(ITreeNodeDefinition treeNodeDefinition, int? relationshipId, List<ITreeNodeDefinition> tndList)
{
    if (treeNodeDefinition.RelationshipId == relationshipId)
    {
        tndList.Add(treeNodeDefinition);
    }

    if (treeNodeDefinition.Nodes.Count != 0)
    {
        foreach (ITreeNodeDefinition nodeDefinition in treeNodeDefinition.Nodes)
        {
            this.SearchNodesByRelationshipId(nodeDefinition, relationshipId, tndList);
        }
    }
}

3) Another problem. Significant. The way you are doing iterating - is 'depth-first' recursion. This approach is not robust, potentially it leads to 'StackOverflowException' . Because of long chain of recursive method calls.

I suggest you to read about iteration and recursion algorithms in the context of trees, and implement iteration approach.

Just for information: There is another way to avoid 'stackOverfrwException' with 'recursion' approach - tail recursion, but afaik, there is no such mechanism in C#, but such a mechanism exists in F# and other functional languages.

Briefly how iterative approach works, in pseudocode:

put the root to the collection X (which is queue for '*breadth-first*' and stack for '*depth-first*')
Do while X is not empty
    var currentNode = get next node from X
    process current root (do checks that you need, aggregate data etc.)
    get child nodes of the currentNode, save them into X

For depth-first without recursion, you can use a `Stack`. For breadth-first without recursion, you can use a `Queue`. — Parrish Husband, Sep 17 '18 at 23:20
@ParrishHusband, yes, it's correct, thank you! i'll include it into the answer. — Maxim Kitsenko, Sep 18 '18 at 08:46
Feels a bit condescending to read "probably, you have no big experience in development", I don't think that adds anything to the answer. — ldam, Sep 18 '18 at 08:54
@Logan, ok thank you. i have removed this text. I just mean that the question doesn't relate to one thing (like the author said -tempTable), but it related to a lof of areas. — Maxim Kitsenko, Sep 18 '18 at 12:29
Yeah I get where you're coming from with that and 100% fine with everything else you said, since that's directly addressing the question, but you added a condescending tone to it which made it less welcoming. My only issue was that part that changed the tone :) — ldam, Sep 18 '18 at 12:34
@Logan many thanks! Sorry, I didn't notice the tone is not very welcoming, I'll try to avoid such phrases in the future. Thanks a lot again! — Maxim Kitsenko, Sep 18 '18 at 12:37
I did originally see the most un-welcoming remark made initially, im glad you have decided to retract. I am not massively experienced but am 'trying' to make myself so. It really isn't helpful or inspiring to read such remarks. For the record, I have been able to easily and efficiently return all nodes in a list using some existing members of the class. — CJH, Sep 18 '18 at 22:57

Searching All Nodes in a Tree Structure efficiently

2 Answers2