
I was wondering if this code is good enough or if there are glaring newbie no-no's.

Basically I'm populating a TreeView listing all Departments in my database. Here is the Entity Framework model:

[Entity Framework model diagram]

Here is the code in question:

private void button1_Click(object sender, EventArgs e)
{            
    DepartmentRepository repo = new DepartmentRepository();
    var parentDepartments = repo.FindAllDepartments()
                          .Where(d => d.IDParentDepartment == null)
                          .ToList();
    foreach (var parent in parentDepartments)
    {           
        TreeNode node = new TreeNode(parent.Name); 
        treeView1.Nodes.Add(node);

        var children = repo.FindAllDepartments()
                     .Where(x => x.IDParentDepartment == parent.ID)
                     .ToList();
        foreach (var child in children)
        {
            node.Nodes.Add(child.Name);                       
        }                
    }
}

EDIT:

Good suggestions so far. Working with the entire collection makes sense, I guess. But what happens if the collection is huge, say 200,000 entries? Wouldn't this break my software?

DepartmentRepository repo = new DepartmentRepository();
var entries = repo.FindAllDepartments();

var parentDepartments = entries
                      .Where(d => d.IDParentDepartment == null)
                      .ToList();
foreach (var parent in parentDepartments)
{
    TreeNode node = new TreeNode(parent.Name);
    treeView1.Nodes.Add(node);

    var children = entries.Where(x => x.IDParentDepartment == parent.ID)
                          .ToList();
    foreach (var child in children)
    {
        node.Nodes.Add(child.Name);
    }
}
Handcraftsman
  • What is FindAllDepartments ? what does it return ? – Thomas Levesque Nov 01 '10 at 13:20
  • It returns an IQueryable. Just: return db.Departments; –  Nov 01 '10 at 13:20
  • In response to your edit of "But what happens if the collection is huge as in 200,000 entries": you are already loading the entire collection; read the points on YAGNI and premature optimization in my answer. – Bronumski Nov 01 '10 at 13:47

8 Answers


Since you are getting all of the departments anyway, why not fetch them in a single query and then run the child lookups against the in-memory collection instead of the database? That would be much more efficient.

In a more general sense, any database model that is recursive can lead to issues, especially if this could end up being a fairly deep structure. One possible thing to consider would be for each department to store all of its ancestors, so that you would be able to get them all at once instead of having to query for them recursively.
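A minimal sketch of that single-query approach, based on the model in the question (FindAllDepartments and the property names are assumed from the posted code):

var all = repo.FindAllDepartments().ToList(); // one database hit

// Build an in-memory index of children keyed by parent ID.
var byParent = all.ToLookup(d => d.IDParentDepartment);

foreach (var parent in byParent[null]) // roots have no parent
{
    TreeNode node = new TreeNode(parent.Name);
    treeView1.Nodes.Add(node);

    foreach (var child in byParent[parent.ID])
        node.Nodes.Add(child.Name);
}

Lookup misses cost nothing: byParent[parent.ID] simply returns an empty sequence for leaf departments.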

Keith Rousseau
  • This seems to be a very good idea, but if the database is really big it would be overkill to download everything. Besides, who would need to browse the entire structure at once? Please see my answer - asynchronous loading seems to be the key here. – Piotr Justyna Nov 01 '10 at 13:53
  • But you don't know how big the database is yet. We don't have that much information. You should only optimize when you start to feel pain because you don't know what the best optimization is going to be until you do start having issues. – Bronumski Nov 01 '10 at 13:57
  • Well, maybe it's just me - after dozens of tree views implemented, that's just a habit for me, not an extra optimisation. Anyway, I really advise using asynchronous tree view node loading. – Piotr Justyna Nov 01 '10 at 14:01
  • I see your point Piotr, but I agree with Bronumski - it's all about the amount of data. If we're talking about 4 departments and at most 100 items then it could all be loaded at once easily. Your approach is certainly the right one for a large set of data, but overkill and slower for a small set. – Keith Rousseau Nov 01 '10 at 14:04
  • @Piotr This is obviously the way the OP wanted to go, as he has amended his original question to talk about 200,000+ entries and marked your answer as correct, but without that information I still stand by calling YAGNI and premature optimization. – Bronumski Nov 01 '10 at 14:12
  • @Keith, I appreciate your opinion, but could you please clarify why exactly is getting just a piece of the structure whenever it's needed slower than getting the whole structure? – Piotr Justyna Nov 01 '10 at 14:27
  • @Bronumski, YAGNI is a brilliant principle and every coder should obey it. I guess this is the reason why I've advised asynchronous loading. Could you please clarify why you advise obeying YAGNI and at the same time downloading the whole structure at once? – Piotr Justyna Nov 01 '10 at 14:31
  • If you read my answer, I have said that the OP's code has an n+1 issue; they were already loading the whole structure. I have then gone on to say, however, that if his code works and there are no performance issues then leave it. YAGNI says that if I don't know that I am going to have 200,000 items then I ain't going to need asynchronous loading. However, if the spec says that I will have 200,000 plus items then I will have to think about something like asynchronous loading. – Bronumski Nov 01 '10 at 15:01
  • Well, the fact is that I've read it and I still miss your point. Let me put it like this: if you don't know for sure (regardless of the database size) that users will need all records at once, why do you presume that it's true and load all records? Why are you guys so afraid of asynchronous loading? It's neither complicated nor slow. – Piotr Justyna Nov 01 '10 at 15:16
  • Because it requires more code, more code inherently makes it more complicated, more complicated code requires more tests and it is premature optimization. All we know from the OP is that some data needs to be loaded when a button is clicked. There is no scope as to scenario, how many users, is it a web application or a desktop application. Without either more information or feedback from users it is not clear as to how best to optimize. It may be asynchronous loading but if there are only 10 departments why bother? – Bronumski Nov 01 '10 at 15:31
  • Bronumski, I really hate to disappoint you, but I still don't see your point. In fact your code seems to be overly complicated to me - you use 2 foreach loops while you could use one, simple 'Where' method (executed after one of the nodes is expanded - select nodes, where node 'x' is a parent node) and just one loop to populate the parent node. I really can understand that you're reluctant to use my approach, but if you ever feel tempted, please do try to convince yourself that it's not as difficult as you think. – Piotr Justyna Nov 01 '10 at 15:42
  • Excuse me! I have not even posted any of my own code. Again you have not read my post. The only code I have posted is the OP's edited code, showing them how they have still got an n+1 despite their changes. If you read my post you would see that I am only advocating changing their code if there is any need. I am not reluctant to use your approach if there is a requirement, which in this case there is, as the OP has stated that he needed to support 200,000+ records. Don't even start to imply that I don't know how to load something asynchronously. Let's just agree to disagree. – Bronumski Nov 01 '10 at 16:13
  • I have a feeling that you're taking it too seriously. SO is a place to exchange ideas and correcting each other not addressing anyone personally. I see no point in continuing this discussion as this is getting more and more awkward. – Piotr Justyna Nov 01 '10 at 16:19
  • Piotr - I said it's faster to get the whole structure at once because you don't need to make subsequent calls back to the server. I agree that async isn't complicated or slow, but it's a bit more complicated and slower than just fetching it all at once and rendering it for SMALL SETS OF DATA. – Keith Rousseau Nov 01 '10 at 16:44

Assuming that you are getting the data from a database, the first thing that comes to mind is that you are going to hit the database n+1 times: once for the parents, and once more for each parent in the database. You should try to get the whole tree structure out in one hit.

Secondly, you seem to have a grasp of patterns, seeing as you appear to be using the repository pattern, so you might want to look at IoC (inversion of control). It allows you to inject your dependency on a particular object, such as your repository, into the class where it is going to be used, allowing for easier unit testing.

Thirdly, regardless of where you get your data from, move the code that structures the data into a tree into a service which returns an object containing all your departments already organised (this basically becomes a DTO). This will help you reduce code duplication.

With anything, you need to apply the YAGNI principle. This basically says that you should only do something if you are going to need it, so if the code you have provided above is complete, needs no further work and is functional, don't touch it. The same goes for the performance issue of select n+1: if you are not seeing any performance hits, don't do anything, as it may be premature optimization.


In your edit

DepartmentRepository repo = new DepartmentRepository();
var entries = repo.FindAllDepartments();

var parentDepartments = entries.Where(d => d.IDParentDepartment == null).ToList();
foreach (var parent in parentDepartments)
{
    TreeNode node = new TreeNode(parent.Name);
    treeView1.Nodes.Add(node);

    var children = entries.Where(x => x.IDParentDepartment == parent.ID).ToList();
    foreach (var child in children)
    {
        node.Nodes.Add(child.Name);
    }
}

You still have an n+1 issue. This is because the data is only retrieved from the database when you call ToList() or when you iterate over the enumeration. This would be better:

var entries = repo.FindAllDepartments().ToList();

var parentDepartments = entries.Where(d => d.IDParentDepartment == null);
foreach (var parent in parentDepartments)
{
    TreeNode node = new TreeNode(parent.Name);
    treeView1.Nodes.Add(node);

    var children = entries.Where(x => x.IDParentDepartment == parent.ID);
    foreach (var child in children)
    {
        node.Nodes.Add(child.Name);
    }
}
Bronumski

In light of your edit, you might want to consider an alternative database schema that scales to handle very large tree structures.

There's an explanation on the FogBugz blog on how they handle hierarchies. They also link to this article by Joe Celko for more information.

Turns out there's a pretty cool solution for this problem explained by Joe Celko. Instead of attempting to maintain a bunch of parent/child relationships all over your database -- which would necessitate recursive SQL queries to find all the descendents of a node -- we mark each case with a "left" and "right" value calculated by traversing the tree depth-first and counting as we go. A node's "left" value is set whenever it is first seen during traversal, and the "right" value is set when walking back up the tree away from the node. A picture probably makes more sense:

The Nested Set SQL model lets us add case hierarchies without sacrificing performance.

How does this help? Now we just ask for all the cases with a "left" value between 2 and 9 to find all of the descendents of B in one fast, indexed query. Ancestors of G are found by asking for nodes with "left" less than 6 (G's own "left") and "right" greater than 6. Works in all databases. Greatly increases performance -- particularly when querying large hierarchies.
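As a rough illustration, with hypothetical Lft and Rgt columns added to the Department entity, those range queries could look like this in LINQ (a sketch, not tied to any particular schema):

// All descendants of department b: rows whose range falls strictly inside b's.
var descendants = db.Departments
                    .Where(d => d.Lft > b.Lft && d.Rgt < b.Rgt);

// All ancestors of department g: rows whose range encloses g's.
var ancestors = db.Departments
                  .Where(d => d.Lft < g.Lft && d.Rgt > g.Rgt);

Both translate to simple indexed range scans, regardless of tree depth.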

Geoff Appleford

Things that come to mind:

It looks like .ToList() is needless. If you are simply iterating over the returned result, why bother with the extra step?

Move this function into its own thing and out of the event handler.

As others have said, you could get the whole result in one call. Sort by IDParentDepartment so that the null ones come first. That way, you should be able to get the list of departments in one call and iterate over it only once, adding child departments to already existing parent ones.
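A sketch of that single-pass build (assuming, as in the question, a two-level tree, so sorting nulls first guarantees every parent precedes its children):

var departments = repo.FindAllDepartments()
                      .OrderBy(d => d.IDParentDepartment) // nulls (roots) sort first
                      .ToList();

// Remember each department's node so children can attach in the same pass.
var nodesById = new Dictionary<int, TreeNode>();
foreach (var dept in departments)
{
    var node = new TreeNode(dept.Name);
    nodesById[dept.ID] = node;

    if (dept.IDParentDepartment == null)
        treeView1.Nodes.Add(node);
    else
        nodesById[dept.IDParentDepartment.Value].Nodes.Add(node);
}

For deeper hierarchies the sort would need to guarantee parents before children some other way (for example a depth or path column), since sorting by parent ID alone does not ensure the dictionary lookup succeeds.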

Erich Mirabal
  • Oh yeah, I'll move this into where it belongs. Things are in the event handler just for test demo purposes. Don't worry, I never ship production code like this in the event handler! :D –  Nov 01 '10 at 13:32

That looks OK to me, but think about a collection of hundreds of thousands of nodes. The best way to handle that is asynchronous loading - please notice that you don't necessarily have to load all elements at the same time. Your tree view can be collapsed by default and you can load additional levels as the user expands the tree's nodes. Let's consider such a case: you have a root node containing 100 nodes and each of these nodes contains at least 1000 nodes. 100 * 1000 = 100,000 nodes to load - pretty much, isn't it? To reduce the database traffic you can first load your first 100 nodes and then, when the user expands one of those, load its 1000 nodes. That will save a considerable amount of time.
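A rough WinForms sketch of that lazy pattern, using a placeholder child so the expand glyph appears before any children are fetched (the repository calls and null-parent convention follow the question; the "Loading..." marker node is just an assumption for illustration):

// Initial load: only the root departments, each with a dummy child.
foreach (var parent in repo.FindAllDepartments()
                           .Where(d => d.IDParentDepartment == null))
{
    var node = new TreeNode(parent.Name) { Tag = parent.ID };
    node.Nodes.Add("Loading...");
    treeView1.Nodes.Add(node);
}

// Fetch the real children only when the user expands a node.
treeView1.BeforeExpand += (s, e) =>
{
    if (e.Node.Nodes.Count == 1 && e.Node.Nodes[0].Text == "Loading...")
    {
        e.Node.Nodes.Clear();
        var children = repo.FindAllDepartments()
                           .Where(d => d.IDParentDepartment == (int)e.Node.Tag);
        foreach (var child in children)
            e.Node.Nodes.Add(child.Name);
    }
};

Each expansion then costs one small query instead of the whole tree being loaded up front.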

Piotr Justyna

Wrap the TreeView modifications with:

treeView.BeginUpdate();
// modify the tree here
treeView.EndUpdate();

To get better performance.

Pointed out here by jgauffin

fedotoves

In addition to Bronumski's and Keith Rousseau's answers: also store the DepartmentID with the nodes (in Tag), so that you don't have to re-query the database to get the DepartmentID.
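For example (a sketch, following the question's naming):

var node = new TreeNode(parent.Name) { Tag = parent.ID };
treeView1.Nodes.Add(node);

// Later, no extra query is needed to identify the department:
int departmentId = (int)treeView1.SelectedNode.Tag;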

Mohit Vashistha

This should use only one (albeit possibly large) call to the database:

Departments.Join(
        Departments,
        x => x.IDParentDepartment, // child's parent ID...
        x => x.ID,                 // ...matched against the parent's ID
        (o, i) => new { Child = o, Parent = i }
    ).GroupBy(x => x.Parent)
    .Map(x => {
            var node = new TreeNode(x.Key.Name);
            x.Map(y => node.Nodes.Add(y.Child.Name));
            treeView1.Nodes.Add(node);
        }
    );

Where 'Map' is just a 'ForEach' for IEnumerables:

public static void Map<T>(this IEnumerable<T> source, Action<T> func)
{
    foreach (T i in source)
        func(i);
}

Note: This will still not help if the Departments table is huge as 'Map' materializes the result of the sql statement much like 'ToList()' does. You might consider Piotr's answer.

diceguyd30