2

I made what I'm calling a TreePruner. Its purpose: given a hierarchy starting at a list of root level nodes, return a new hierarchy where the new root nodes are the highest level nodes that meet a certain condition. Here is my class.

public class BreadthFirstPruner<TResource>
{
    private IEnumerable<TResource> originalList;
    private IEnumerable<TResource> prunedList;
    private Func<TResource, ICollection<TResource>> getChildren;

    public BreadthFirstPruner(IEnumerable<TResource> list, Func<TResource, ICollection<TResource>> getChildren)
    {
        this.originalList = list;
        this.getChildren = getChildren;
    }

    public IEnumerable<TResource> GetPrunedTree(Func<TResource,bool> condition)
    {
        this.prunedList = new List<TResource>();
        this.Prune(this.originalList, condition);
        return this.prunedList;
    }

    private void Prune(IEnumerable<TResource> list, Func<TResource,bool> condition)
    {
        if (list.Count() == 0)
        {
            return;
        }

        var included = list.Where(condition);
        this.prunedList = this.prunedList.Union(included);
        var excluded = list.Except(included);
        this.Prune(excluded.SelectMany(this.getChildren), condition);
    }
}

The class does what it's supposed to, but it does so slowly, and I can't figure out why. I've used this on very small hierarchies where the complete hierarchy is already in memory (so there should be no linq-to-sql surprises). But regardless of how eager or lazy I try to make things, the first line of code to actually evaluate the results of a linq expression winds up taking 3-4 seconds to execute.

Here is the code that's currently consuming the pruner:

Func<BusinessUnitLabel, ICollection<BusinessUnitLabel>> getChildren = l => l.Children;
var hierarchy = scope.ToList();
var pruner = new BreadthFirstPruner<BusinessUnitLabel>(hierarchy, getChildren);
Func<BusinessUnitLabel, bool> hasBusinessUnitsForUser = l =>
    l.BusinessUnits.SelectMany(bu => bu.Users.Select(u => u.IDGUID)).Contains(userId);
var labels = pruner.GetPrunedTree(hasBusinessUnitsForUser).ToList();

As I stated previously, the dataset that I'm working with when this executes is quite small. It's only a few levels deep with only one node on most levels. As it's currently written, the slowness will occur on the first recursive call to Prune when I call list.Count(), because that's when the second level of the hierarchy (excluded.SelectMany(this.getChildren)) is being evaluated.

If, however, I add a .ToList call like so:

var included = list.Where(condition).ToList()

Then the slowness will occur at that point.

What do I need to do to make this thing go fast?

Update

After someone prompted me to reevaluate my condition more carefully, I realized that those associations in hasBusinessUnitsForUser were not being eager loaded. That there was the problem.

Samo
  • 8,202
  • 13
  • 58
  • 95
  • You should choose `Recursion` over `SelectMany` might require to rewrite the strategy but this should definetly give you some improvements. – vendettamit Sep 29 '15 at 20:34
  • @dustmouse That would just recurse forever because `list.Where(condition)` would always be empty in the recursive calls, so `excluded` would wind up being the same list that was passed in. – Samo Sep 29 '15 at 20:43
  • @vendettamit I'm not sure if I understand your comment, but nonetheless, I can cause the slowness to occur before the `SelectMany` is ever called by simply calling `list.Where(condition).ToList()`. It confuses me that this would be a slow operation given that all the objects are already in memory. – Samo Sep 29 '15 at 20:49
  • 1
    Have you tested it with a more simple condition? – lintmouse Sep 29 '15 at 20:51
  • See if treeview on following webpage is better : http://stackoverflow.com/questions/28976601/recursion-parsing-xml-file-with-attributes-into-treeview-c-sharp – jdweng Sep 29 '15 at 20:55
  • @dustmouse Your comment prompted me to re-examine my condition, at which point I realized I was being really dumb and not eager loading those associations! Thanks for pointing me in the right direction! – Samo Sep 29 '15 at 21:13

1 Answers1

1

These calls are all lazily executed and the results are not cached/materialized:

    var included = list.Where(condition);
    this.prunedList = this.prunedList.Union(included);
    var excluded = list.Except(included);

Even in this snippet included runs twice. Since this is a recursive algorithm there might be many more invocations.

Add a ToList call to any sequence that might be executed more than once.

usr
  • 168,620
  • 35
  • 240
  • 369
  • Thanks. Ultimately my problem was failing to eager load some EF associations, but I can see how your suggestion could give a performance boost. – Samo Sep 29 '15 at 21:59