Is a list implementation of binary tree scalable?

Question

I am writing a simple codec. The tree will be precomputed and there wont be any changes once its build. It will just be searched.

All the leaf nodes of a balanced binary tree are the signal values and internal nodes are the approximated compressed representations.

Is a list implementation using stl vector scalable if I have a large value of leaf nodes? Currently I dont know how large large is.

List implementation e.g. 1,2,3,4,5,6,7 if I have 4 leaf nodes

then children of

root(1)-> 2,3
2->4,5
3->6,7

so I can simply jump to the children using their position in the vector.

Why you have to expose this detail (how the list is implemented) to class' clients? If you'll have any performance issue (now or in the future but a test is somehow mandatory) simply change implementation et-voila (with a mediator, for example). — Adriano Repetti, May 23 '12 at 07:37

score 4 · Accepted Answer · answered May 23 '12 at 07:43

4

Using a "list" or "array" should not pose any issue in this case (since the tree is immutable). The only requirement for O(log n) search is quick random access (ie, O(1) access to a given index).

You can use either a vector or a deque, both are suitable. You might run in a scalability issue with the vector if the system cannot find a large enough memory block to hold all elements, though to begin with it should prove simpler. If you hit this roadblock, switch to deque, and although you might lose some speed, it should allow you to grow further due to its fragmented nature.

answered May 23 '12 at 07:43

Matthieu M.

287,565
48
449
722

deque doesn't give you random access in O(1). – Himadri Choudhury May 23 '12 at 08:26
2

@HimadriChoudhury: Since I cannot exactly link to the Standard, I'll do the next best; SGI [deque](http://www.sgi.com/tech/stl/Deque.html) models a [Random Access Container](http://www.sgi.com/tech/stl/RandomAccessContainer.html) which reads in complexity guarantees "The run-time complexity of element access is amortized constant time." – Matthieu M. May 23 '12 at 08:38
@Matthiew. Thanks for the link. I didn't realize that was how random access on deque was speced. In my experience though, random access is considerably slower on deque than vector. For example, on my Linux box with gcc 3.4.6, randomly summing 10M ints from a 10M int vector ran in about .4sec, while on deque it was .9sec. – Himadri Choudhury May 23 '12 at 19:23
@HimadriChoudhury: This looks about right to me. A `vector` induces a single dereference while a `deque` induces 2 (get bucket + get element in bucket), so twice as much time is the right factor. – Matthieu M. May 24 '12 at 06:39

score 0 · Answer 2 · answered May 23 '12 at 07:53

I would prefer a preallocated array of your tree nodes (mostly malloced at the initialization time ) with MAX_ELEMENTS as opposed to a stl vector for simple reason of having all tree nodes continuously located on the heap for quicker access at run time.

I said quicker access mainly because the pointer jumps may not be uniform with a distributed stl vector elements for a really huge number of elements in the list (say overflowing a 32 bit number )

You can look at it from the cache line misses from the L1 caches access Point of view of you processor.

Is a list implementation of binary tree scalable?

2 Answers2

Linked