How to validate if a parent item has children in recursive LINQ function?

Question

I'm doing a recursive LINQ function as described in tis question: Simulating CTE recursion in C#

My code is as follows:

private static IEnumerable<KeyValuePair<int, int>> getNet(List<DataRow> list, int? leader, int level)
{
    return list
        .Where(x => x.Field<int?>("LeaderID") == leader)
        .SelectMany(x =>
            new[] { 
               new KeyValuePair<int, int>(x.Field<int>("RepID"), level)
                  }.Concat(getNet(list, x.Field<int>("RepID"), level+ 1))
         );
}

I'd like to validate if a parent has children before entering the function again because every child gets evaluated again and that consumes a lot of time.

i.e Parent A has 5000 children, but only 5 of them have children, I need something to validate if A's children have children before executing the function for all of them.

Thanks!

Have you verified that you have a performance problem? Calling a method that does nothing on an item is going to be very cheap for a computer, even if it has to do it 5,000 times. Now if it has to do it 50 billion times you may be able to perceive the time that it takes. — Servy, Dec 16 '13 at 17:56
I did, I have about 3,000 parents with 5,000- children; but each parent gets evaluated about 6 times. That's why I hope the overall time gets reduced with this — Daniel Martinez, Dec 16 '13 at 18:00
If you want to speed things up you might want to convert the ListDataRow to a `Lookup` with `LeaderID` as the key. — Magnus, Dec 16 '13 at 18:08
@DanielMartinez: Looking at the referenced link, it appears you left out the angle brackets. I edited and fixed, but if your code was as you intended, I can revert it back. — , Dec 16 '13 at 18:10

Servy · Accepted Answer · 2013-12-16T18:32:06.530

So testing the children earlier really isn't going to help you. You're still doing the same work, conceptually. If each recursive call handles two depths at once, instead of one, you're greatly complicating the method itself, duplicating the work of the second depth whenever there are children, and gaining very, very little even when there are no children. The expensive part of the method is in the linear search through the list for the children, not in the method call that starts the search.

To improve performance you need to stop doing linear searches through a very large list many, many times. Fortunately, this is easy enough to do. Just create a lookup once, at the start of the method, of all children for each parent.

var lookup = list.ToLookup(x => x.Field<int?>("LeaderID"));

Now, what you're trying to do, conceptually, is traverse a tree. We can start out with a generalized "traverse" method capable of traversing any tree (using a non-recursive implementation, for performance improvements):

public static IEnumerable<T> Traverse<T>(IEnumerable<T> source, 
    Func<T, IEnumerable<T>> childSelector)
{
    var queue = new Queue<T>(source);
    while (queue.Any())
    {
        var item = queue.Dequeue();
        yield return item;
        foreach (var child in childSelector(item))
        {
            queue.Enqueue(child);
        }
    }
}

Now that we have these we can use the lookup to efficiently get all children for each node, greatly improving the implementation. Of course, it'd be prettier if you didn't also need the depth of each node, but it's still not that bad.

var roots = lookup[leader]
    .Select(row => Tuple.Create(row.Field<int>("RepID"), 0));

return Traverse(roots, node => lookup[node.Item1]
    .Select(row => Tuple.Create(row.Field<int>("RepID"), node.Item2 + 1)));

If you don't need to know the depth of each node you can simplify all of that down to this:

var lookup = list.ToLookup(
    row => row.Field<int?>("LeaderID"),
    row => row.Field<int>("RepID"));
return Traverse(lookup[leader], node => lookup[node]);

Thank you Servy, I actually don't need the depth but implemented it that way cause I based on the other method. How'd the code be withoud depth? Thank you very much!! — Daniel Martinez, Dec 16 '13 at 18:22
@Servy Is there a big performance benefit in not using a recursive call? — Magnus, Dec 16 '13 at 18:49
@Magnus That depends on the context, and the data used on it. In my example I had an iterator block. Recursive method calls in iterator blocks or `async` methods are much more expensive simply because the method call becomes a state machine, involves the allocation of a new heap object, etc., unlike normal method calls. You also need to consider whether the tree is deep enough to blow out the stack, which is probably one of the chief concerns with a recursive method; usually you'll blow out the stack before you run into actual performance problems. — Servy, Dec 16 '13 at 18:58

How to validate if a parent item has children in recursive LINQ function?

1 Answers1

Linked

Related