Why do we represent trees as cells linked by pointers and not store everything in a flat array?

Question

Why do we use pointers, structures and all those things to make a tree and why not just use an array? Is that due to the limitations of array, or do using pointers etc. giving better performance than arrays?

Unless the tree is almost perfectly balanced, an array representation will waste a ton of memory. — user2357112, Aug 05 '17 at 16:51
Sometimes a tree-in-an-array is useful; look at @trincot's answer to this question: https://stackoverflow.com/questions/43504632/take-every-k-th-element-from-the-1-n-natural-numbers-series/ — m69's been on strike for years, Aug 05 '17 at 18:43

user2357112 · Answer 1 · 2017-08-06T18:27:08.030

We do use an array, sometimes. For example, it's easily the most common way of representing the tree underlying a binary heap. However, it's not appropriate for most trees.

If you compute the child indexes from the parent index with a relationship like childindexes(i) = {2*i+1, 2*i+2}, as in the usual heap representation, then the array's size has to be proportional to the depth of the tree instead of the number of elements. This is fine when the tree is extremely well-balanced, as in the usual binary heap representation, but it wastes a ton of space for even something like a red-black tree, which is self-balancing but not that balanced. Also, stuff like tree rotations and other re-linking of the tree structure requires moving entire subtrees instead of changing a few pointers.

On the other hand, if each node has to store the array indexes where its children lie, then you're basically just reimplementing pointers and memory management, and you'd need a very good, specific reason to do that instead of just using what your language provides.

You said that each node has to store the array indexes where its children lie. What is the mean of that? — Adarsh Jain, Aug 06 '17 at 08:33
@AdarshJain: I said *if* each node has to store the array indexes where its children lie. This is one of the possible ways of managing child locations, and the rest of the paragraph describes its consequences. — user2357112, Aug 06 '17 at 18:24

score 1 · Accepted Answer · answered Aug 05 '17 at 17:29

The answer is... it depends!

In your question, I assume that you're talking about using an array to represent some sort of binary tree structure, where instead of having a set of pointers and links you use a flat array and store child indices.

The main reason not to do this is because it complicates inserting and removing nodes from the tree. For example, suppose you're representing a binary search tree as an array and decide to delete an element from the tree. If you're using raw pointers, you can do this by just freeing the memory for the node (and doing some fixups on the links in the tree). If you store everything in an array, you have a bunch of unpleasant options. One would be to simply keep the node in the array but just stop linking to it. This means that your tree structure will, over time, end up having a bunch of cruft build up in it and the overall structure will take up way more memory than usual. (You can think of that approach as essentially being a memory leak).

Another option would be to shift everything down in the array by one position to clear up the gap, or to otherwise find some node to relocate to that position. Both approaches take extra time per operation, which can slow things down.

A final option would be to mark the space unused and then, whenever you need more space for a new node, somehow find a free cell to use. You can make this work, but that's essentially what the memory manager already does! Internally, it treats everything as a giant block of memory and then somehow tracks which locations are free. (In fact, I think a great perspective to take on this whole idea is to compare yourself against the standard memory manager. If you know more than it does, you might actually want to do this on your own and think of the problem less as "fit the tree into an array" and more of "build a custom, small memory manager."

That all being said, there are lots of reasons why you might want to do this. For example, many implementations of linked structures in the C++ standard library will use custom allocators to try to keep cells next to one another, falling back on the general freestore allocator if space can't be found. This often leads to big performance gains due to locality of reference. In fact, creative custom allocators are a major part of program optimization in C++ applications.

You might also want to do something like this if you have a static tree that never changes. There's actually a bunch of research into how to do this as efficiently as possible. The van Emde Boas layout, for example, is one way to arrange tree nodes in a static array that makes for very efficient memory accesses and great locality.

In some cases, you may have a sufficiently strict tree shape that you can get away with this. The most famous example of this is the binary heap, which is logically a binary tree but which is typically stored implicitly in an array structure.

And in some other cases, you may actually want to do this on a dynamic tree structure. The RAVL tree, for example, is a variation on an AVL tree that uses lazy deletions to remove elements from the tree. In that case, throwing everything into a giant array might actually be a better option than using disorganized cell structures.

So ultimately the best answer I can give is "you sometimes do see this, depending on the application and the use case, though in general - and especially if you're taking a first course in data structures - you don't see this done."

Why we need to keep the node in the array or linking? We can just keep the values of nodes in array and can use formulas to access children. This will also take the same time in insertion and deletion of a node from tree. We can replace deleted node with some symbol in corresponding index. — Adarsh Jain, Aug 06 '17 at 08:40
Imagine that you have a binary search tree and that you insert the elements into it in ascending order. What shape will the tree have? Based on that, how big does the array need to be as a function of n, the number of nodes in the tree? — templatetypedef, Aug 06 '17 at 14:55
Any type of AVL tree has rotation, a situation where the height of two sub-tree are swapped. I don't see how that's anything but `O(n)` with any type of implicit representation? — Neil, Feb 02 '20 at 20:28

Why do we represent trees as cells linked by pointers and not store everything in a flat array?

2 Answers2