
If you have an AVL tree, what's the best way to get the median from it? The median would be defined as the element with index ceil(n/2) (index starts with 1) in the sorted list.

So if the list was

1 3 5 7 8

the median is 5. If the list was

1 3 5 7 8 10

the median is 5.
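Here is a tiny Python check of that definition. It is a sketch: `median_by_definition` is a hypothetical helper, not from any library. It picks the element of 1-based rank ceil(n/2) from a list that is already sorted:

```python
def median_by_definition(sorted_items):
    """Return the element of 1-based rank ceil(n/2) from a sorted, non-empty list."""
    n = len(sorted_items)
    k = (n + 1) // 2            # integer form of ceil(n/2)
    return sorted_items[k - 1]  # shift the 1-based rank to a 0-based index


print(median_by_definition([1, 3, 5, 7, 8]))      # 5
print(median_by_definition([1, 3, 5, 7, 8, 10]))  # 5
```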

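If you can augment the tree, I think it's best to let each node store the size (number of nodes) of its subtree, i.e. 1 + left.size + right.size. Using this, the best approach I can think of finds the median in O(lg n) time: starting at the root, compare the target index with the size of the left subtree to decide whether to stop, go left, or go right (subtracting left.size + 1 from the index when going right).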
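A minimal Python sketch of this size augmentation follows. It assumes distinct keys and shows insertion only; deletion would rebalance the same way. The names (`Node`, `rebalance`, `select`, `median`) are illustrative and not from any library. The rotations must recompute `size` as well as `height`, otherwise the counts go stale after rebalancing:

```python
class Node:
    __slots__ = ("key", "left", "right", "height", "size")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1  # height of the subtree rooted here
        self.size = 1    # number of nodes in the subtree rooted here


def height(node):
    return node.height if node else 0


def size(node):
    return node.size if node else 0


def update(node):
    # Recompute cached fields from the children (children must be up to date).
    node.height = 1 + max(height(node.left), height(node.right))
    node.size = 1 + size(node.left) + size(node.right)


def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)  # y is now a child of x, so fix y before x
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)  # x is now a child of y, so fix x before y
    update(y)
    return y


def rebalance(node):
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)     # left-right case
        return rotate_right(node)
    if balance < -1:
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)  # right-left case
        return rotate_left(node)
    return node


def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return rebalance(node)


def select(node, k):
    """Return the key with 1-based rank k by walking one root-to-leaf path."""
    while node:
        left_size = size(node.left)
        if k <= left_size:
            node = node.left
        elif k == left_size + 1:
            return node.key
        else:
            k -= left_size + 1
            node = node.right
    raise IndexError("rank out of range")


def median(root):
    return select(root, (size(root) + 1) // 2)  # rank ceil(n/2)


root = None
for key in [1, 3, 5, 7, 8]:
    root = insert(root, key)
print(median(root))  # 5
root = insert(root, 10)
print(median(root))  # 5
```

Because the AVL height is O(log n), `select` visits O(log n) nodes.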

Is there a better way?

templatetypedef
omega
  • FWIW, the technique of putting counts in the nodes makes your AVL trees also into Order Statistic Trees: http://en.wikipedia.org/wiki/Order_statistic_tree . Of course you must also modify the rotations in the balancing algorithm to adjust the node counts. @templatetypedef's idea of threading the tree and using a single median pointer is optimal. It doesn't need node counts and is constant time per op. He did not mention that you also need to know the total node count in the tree so that you can determine if the median is a single element (odd node count) or average of two (even node count). – Gene May 22 '14 at 18:36
  • @Gene Thanks for filling in those details! – templatetypedef May 22 '14 at 19:03

1 Answer


Augmenting the AVL tree to store subtree sizes is generally the best approach here if you need to answer median queries efficiently. A query walks a single root-to-leaf path, and an AVL tree's height is O(log n), so it takes O(log n) time, which is pretty fast.

If you'll be computing the median a huge number of times, you could use an augmented tree and also cache the median value so that you can read it in O(1) time. Each time you do an insertion or deletion, you would recompute the median in O(log n) time. That slows updates down by a constant factor but doesn't change their asymptotic cost, since an AVL insertion or deletion already takes O(log n).

Another option would be to thread a doubly-linked list through the nodes in the tree so that you can navigate from a node to its successor or predecessor in constant time. If you do that, then you can store a pointer to the median element and, on an insertion or a deletion, move the pointer at most one step left or right. Which way to move depends on whether the changed element lies before or after the median and on whether the element count n is odd or even, so you also need to keep track of n. If you delete the median itself, move the pointer to its predecessor when n was odd and to its successor when n was even. This doesn't require subtree sizes and might be a bit faster, but it adds two extra pointers to each node.

Hope this helps!

templatetypedef