6

Is it possible to get a uniformly distributed random value (calling the function means it's equally likely to get any value in the tree) from a balanced binary search tree in O(log n) time?

My initial idea was to generate a random number 0, 1, or 2. If 0, take the left path from the current node, if 1, take the right path, otherwise the value of the node is the random value. If you hit a leaf node, take the value of that node. I don't think this would be randomly distributed though.

This is out of curiosity, not for a specific application.

Let me know if you need any clarifications.

An example is if you have the tree

     1
    / \
   2   5
       /
      3

The numbers 1, 2, 3, and 5 would be returned uniformly when calling int get_random_number()

Clarification: All other normal BST operations should remain O(log n), like insert(), delete(), etc.

gsgx
  • 12,020
  • 25
  • 98
  • 149
  • Which part? Maybe I should provide an example? – gsgx Nov 09 '12 at 00:57
  • Uniform? Just pick a random integer <= number of tree nodes, then traverse to that node and return it. – Aziz Nov 09 '12 at 00:58
  • @Aziz. If you pick the last node, that's O(n) time. – gsgx Nov 09 '12 at 00:59
  • But in fact, you only want to pick a (uniform) random *node* from the tree? (which happens to have a number on it) – wildplasser Nov 09 '12 at 01:00
  • @wildplasser Yea, essentially randomly get a node and return that node's value. O(n) time is easy, I'm wondering if it's possible in O(log n) time. – gsgx Nov 09 '12 at 01:01
  • The basic answer is that it is not possible to do uniform randomness over any set that does not contain some power of two elements if you can only generate random bits. Essentially, if you take the prime factorization of the number of elements, you need to be able to generate a random number up to each one of its factors to be able to select an element on a uniformly random way. – AJMansfield Nov 09 '12 at 01:04
  • Than: why don't you *say* so? My gut feeling is that *without instrumentation* it is O(N). With instrumentation, it could be O log(N). – wildplasser Nov 09 '12 at 01:04
  • Correction: when instrumented, it *could* be O(1). But your inserts/updates/deletes would be much more costly. – wildplasser Nov 09 '12 at 01:12
  • @wildplasser Right, I forgot to mention that all other normal operations (insert, delete, etc.) should remain O(log n). – gsgx Nov 09 '12 at 01:15
  • They will. Only the big O will get bigger. Imagine the tree as an array: once a node is deleted, the (logical) array size will get smaller. That's a swap operation (swap deleted node with arraytop). Sampling from the tree will still be sampling from the (variable sized) are. (but the size is known) – wildplasser Nov 09 '12 at 01:20
  • 1
    @wildplasser If you keep the tree as an array, finding the element to delete would be O(n) unless you do binary search, but then you'd have to keep it sorted, so it wouldn't remain O(log n). I also don't get what you mean by "only the big O will get bigger". That's what I'm concerned about... – gsgx Nov 09 '12 at 01:22
  • No. Pointers and indexes are equivalent (how do you think databases work?). Only the O will get bigger. – wildplasser Nov 09 '12 at 01:25

1 Answers1

12

Your idea will not create a random distribution. The root has a 1/3 chance of being picked no matter the size of the tree.

If you know the number of elements in the tree, generate a number k between 1 and N, and get the kth largest element of the tree. See here for a way to do that in O(logn) for balanced trees.

Community
  • 1
  • 1
Eric W.
  • 814
  • 1
  • 8
  • 18
  • 1
    This is clever. I'll read that link and possibly select this if there's no better answer. – gsgx Nov 09 '12 at 01:06