1

I'm attempting to build a tree, where each node can have an unspecified amount of children nodes. The tree is to have over a million nodes in practice.

I've managed to contruct the tree, however I'm experiencing memory errors due to a full heap when I fill the tree with a few thousand nodes. The reason for this is because I'm attempting to store each node's children in a Dictionary data structure (or any data structure for that matter). Thus, at run-time I've got thousands of such data structures being created since each node can have an unspecified amount of children, and each node's children are to be stored in this data structure.

Is there another way of doing this? I cannot simply use a variable to store a reference of the children, as there can be an unspecified amount of children for each node. THus, it is not like a binary tree where I could have 2 variables keeping track of the left child and right child respectively.

Please no suggestions for another method of doing this. I've got my reasons for needing to create this tree, and unfortunately I cannot do otherwise.

Thanks!

user1261466
  • 331
  • 1
  • 3
  • 5
  • 2
    How about using a database instead? You could use SQLite and its in-memory support, then you would only store a parent-link for each node? You would then have to query the table saying something like `SELECT * FROM nodes WHERE parent_id = 10` if you want to find all children of node 10. Note, I'm not sure this would use less memory, but if you hit the memory roof after just a couple of thousand nodes, I'm guessing you're using an excessive amount of memory for each node. – Lasse V. Karlsen Mar 10 '12 at 20:08
  • Have you looked into optimizing the data the nodes will contain? This can help reduce memory usage – GETah Mar 10 '12 at 20:22
  • The problem is that I've never implemented a tree as a database, and would have no idea where to start from. There are a lot of complex calculations which have to be carried out in terms of insertion, deletion, etc and I can't imagine implementing it with a database. – user1261466 Mar 10 '12 at 20:22
  • I know you said not to suggest any other approach, but I can't resist! Could you just have one dictionary and populate it with all the objects and just store the keys of the children in the nodes (in a list say)? At least just populating that dictionary would show if you need to move to 64bit for memory reasons. – Mike Hanrahan Mar 10 '12 at 20:24
  • @GETah Each node contains a Dictionary, which is why I believe I'm getting this error as I've got thousands of dictionaries running around at run-time. – user1261466 Mar 10 '12 at 20:25
  • @MikeHanrahan How would I be able to cater for the fact that each node can have a number of children though? – user1261466 Mar 10 '12 at 20:27
  • Each node object would have a list of nodes which it can just add to as it needs to add dynamic numbers of children. The node object could also contain your key to look up the data object in your "global" dictionary. – Mike Hanrahan Mar 10 '12 at 20:31
  • @MikeHanrahan Thanks for your suggestion mike, but wont I just have thousands of lists running around then instead? – user1261466 Mar 10 '12 at 20:32
  • 2
    "Please no suggestions for another method of doing this." - that seems very narrow-minded. You should focus on solving the problem in the most efficient way possible, not solving it in the one particular way you've latched on to. – Matt Burland Mar 10 '12 at 20:33
  • Yes...but I don't think that should be a problem, as your nodes will be very light weight. Before you go much further with any implementation you should probably get your hands on a memory profiler so you better understand what exactly is causing you to run out of memory so quickly (e.g. RedGate ANTS profiler). – Mike Hanrahan Mar 10 '12 at 20:34

1 Answers1

2

How many of your nodes will be "leaf" nodes? Perhaps only create the data structure to store children when you first have a child, otherwise keeping a null reference.

Unless you need to look up the children as a map, I'd use a List<T> (initialized with an appropriate capacity) instead of a Dictionary<,> for the children. It sounds like you may have more requirements than you've explained though, which makes it hard to say.

I'm surprised you're failing after only a few thousand nodes though - you should be able to create a pretty large number of objects before having problems.

I'd also suggest that if you think you'll end up using a lot of memory, make sure you're on a 64-bit machine and make sure your application itself is set to be 64-bit. (That may just be a thin wrapper over a class library, which is fine so long as the class library is set to be 64-bit or AnyCPU.)

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • I'm using Dictionary as I need the Key,Value pairs feature. Is it far fetched that the application crashes considering that it's creating thousands of Dictionary objects (one per node)? I would have thought that this is the reason for the out of memory exception. – user1261466 Mar 10 '12 at 20:13
  • @user1261466: You'd really need to show some code and give more figures, to be honest. Just a few thousand? Unlikely. Is the key based on the value of the child? If so, and if you don't have *very* many child nodes, you may want to just scan through the list instead of keeping a `Dictionary<,>`. Again, knowing more about what you're trying to achieve would be useful. – Jon Skeet Mar 10 '12 at 20:16
  • @user1261466: It sounds like you might want to have different object types for "leaf node", "node with one child", "node with a few children" and "node with many children". I'm still suspicious that "thousands" of dictionaries would cause a problem though. *Millions*, yes... – Jon Skeet Mar 10 '12 at 20:29