10

Give you:

  1. A stream (end of the stream is EOF)
  2. A function next() to get the next element in the stream and advance the pointer in the stream
  3. A random generator generating floats between 0 and 1 (inclusively) uniformly

Output:

  • An element that is proven to be randomly (uniformly distributed) chosen

You can one or two variables.

You are not allowed to use array / list, and you cannot tell the way that trying to get all elements out and store them all and then pick.


This is an interview question.

My thinking is:

  1. I use a var cur to store most recent kept element
  2. So, if i get a new element, I generate a random 0 or 1 using generator, if it is 0 then cur = new element; otherwise, continue;
  3. If I get EOF, then return cur

Is my thinking correct? How to prove?


Here is a same question

How would you pick a uniform random element in linked list with unknown length?

Community
  • 1
  • 1
Jackson Tale
  • 25,428
  • 34
  • 149
  • 271
  • Draw a probability tree. – Oliver Charlesworth Apr 28 '14 at 22:02
  • @OliCharlesworth how? you mean draw a probability tree to prove? – Jackson Tale Apr 28 '14 at 22:04
  • For the first few stages, yes, to identify the mathematical relationship. From that you should be able to identify a proof. – Oliver Charlesworth Apr 28 '14 at 22:04
  • 2
    Your algorithm basically returns the element that corresponds to the last time the generator returned 0, which corresponds to the geometric distribution of the number of consecutive 1's at the end (when seen from EOF in direction beginning of the stream). That is not an uniform distribution. – Hristo Iliev Apr 28 '14 at 22:09
  • You say the generator yields floats that are either 0 or 1 uniformly. Are you sure that wasn't supposed to be floats that are **between** 0 and 1 uniformly? – pjs Apr 28 '14 at 22:37
  • Is the uniformity of the output supposed to be over the set of possible values for the outcomes, or is it that the first value, last value, or any value in between should all have equal likelihood of being the one picked? I don't see how to do the former without keeping track of all the values that have been observed. In the latter case, you don't care what the values are, the uniformity is over the set of indices if you **were** allowed to use an array. – pjs Apr 28 '14 at 22:44
  • @pjs Have you read amit's answer? – Niklas B. Apr 28 '14 at 22:45
  • @NiklasB. yes, but I think the question itself is very ambiguous. I agree with Hristo Iliev's observation that if the RNG returns only zero or one, specifying that the result is a float seems inconsistent. – pjs Apr 28 '14 at 22:51

4 Answers4

17

Let the current element's index be i.

Choose to 'remember' the current element at probability 1/i. When EOF is reached, produced the element you remember.

At the end, for each element with index i there is a probability to be chosen:

enter image description here

A formal prove can be done using induction, following these guidelines.

Jackson Tale
  • 25,428
  • 34
  • 149
  • 271
amit
  • 175,853
  • 27
  • 231
  • 333
  • I think this has the same problem as my answer: How to decide whether to take the element (with probability 1/i) if all you have is a random number generator that produces only 0 or 1? It's at least not O(1) anymore. Very cool trick though – Niklas B. Apr 28 '14 at 22:16
  • @NiklasB. You only need O(log(i+1)) = O(log(i)) draws (average) to produce the probability 1/i, and you don't need more than one variable for it (iterator, basically). – amit Apr 28 '14 at 22:19
  • Makes sense, this looks very good then. This also doesn't make any assumptions about floating point precision :) – Niklas B. Apr 28 '14 at 22:19
  • could you please write `1/i * 1-1/(i+1)*1-1/(i+2) + ... + 1/n` more clearly? is it minus between them all like `(1/i)*1 - (1/(i+1))*1 - (1/(i+2)) - ... - (1/n)`? or just first two are minus? – Jackson Tale Apr 29 '14 at 08:52
  • is it `(1-(1/(i+1))*(1-1/(i+2))` – Jackson Tale Apr 29 '14 at 09:08
  • 1
    @JacksonTale Look at editted answer, was from android when writing the comment. – amit Apr 29 '14 at 10:17
  • I am trying to understand it without using induction. For index i, do we need to consider the cases before i? – Jackson Tale Apr 30 '14 at 10:46
  • @JacksonTale The probability to select the i'th element is by default 1/i, this is how we defined it - regardless of the previous elements. Now, for each j>i, the probability that we won't rewrite this value is `1-1/j`. So, the probability for choosing i, and then not rewriting it is `1/i*(1-(1/(i+1)))*...*(1-(1/n))` – amit Apr 30 '14 at 12:25
  • Is it possible to adapt this for selecting fixed numbers of records with equal probability from the stream? i.e. select 2 or 3 or ... instead of 1? – Asad Saeeduddin Sep 09 '15 at 06:47
  • 2
    @Asad Yes, this is called [Reservoir Sampling](https://en.wikipedia.org/wiki/Reservoir_sampling). The idea is to hold an array of size `k` (where `k` is the number of elements you chose), with probability `k/i` (where `i` is the index you are iterating now) you replace one of the elements in this array, and if you draw to replace an element, you do it with probability `1/k` for each element. – amit Sep 09 '15 at 08:11
5

This algorithm chooses the last element in the stream with a probability of 1/2, so unless the stream has size = 2, this is not a valid solution.

A valid way would be to assign a random float value drawn from a uniform distribution between [0..1] to every element and return the one with the largest (or smallest) value at the end. This can be done in O(1) auxiliary space, you just need to remember the largest value and the associated element.

Niklas B.
  • 92,950
  • 18
  • 194
  • 224
  • I was thinking of the same solution, but this sentence reads as if the generator only returns 0 or 1: _"A random generator generating floats either 0 or 1 uniformly"_. It doesn't say "floats between 0 and 1". But then, why floats? – Hristo Iliev Apr 28 '14 at 22:14
  • I found this interview question online, maybe the description was wrong. – Jackson Tale Apr 28 '14 at 22:30
  • 1
    @JacksonTale Imagine the sequence of random values drawn after you have processed all elements. They are all distinct. Assign them integers according to the rank the values have in the set of all values. The resulting sequence is a uniformly drawn random permutation (because all permutations are possible and none is more likely than any other). The position of the maximum in a random permutation is uniformly distributed. Not sure about a rigorous proof though – Niklas B. Apr 28 '14 at 22:36
  • If no random generator is given, is it possible to achieve? – Jackson Tale Apr 29 '14 at 10:00
  • @Jackson you mean if you just have a bit generator? amit proposed a solution that only requires random integers, which you can implement using random bits – Niklas B. Apr 29 '14 at 14:16
  • @JacksonTale Then you can't make a random decision. Isn't that obvious? I mean you can implement your PRNG, if that's what you're asking – Niklas B. Apr 29 '14 at 14:57
  • @NiklasB. `Imagine the sequence of random values drawn after you have processed all elements. They are all distinct.` Why are they all distinct? What if I get .5 and .5 twice? Similarly, does this work with a bounded data type, e.g. if a double is limited to 64-bit, then what if two numbers receive the same maximum/minimum? Is there a fair way to distinguish between the two? – ChaimKut May 18 '14 at 15:47
  • @ChaimKut Well let's say the probability that two are equal is zero. It's not quite the same thing as "can't happen", in particular with real-world RNGs. But as long as you don't exceed the period of the generator, I guess you should be fine (as in, the probability of this happening is low enough to not influence the expected outcome in a significant way). If you want a solution that does not require random floats, see amit's answer – Niklas B. May 18 '14 at 15:58
0

This solution doesn't quite fit all of the parameters in the question. However, this solution is based on a real-world need and code.

private static final SecureRandom s_random = new SecureRandom();  // Use SecureRandom for truly random selection without a pattern

public static <V> V randomValue(Iterator<V> values)
{
   V result, item;
   int count;

   result = null;

   for (count = 1, values.hasNext(); count++)
   {
      item = values.next();

      if (count == 1)  // Always select the first element
         result = item;
      else if (s_random.nextnextInt(count) == 0)  // Replace the previous random item with a new item with 1/count probability
         result = item;
   }

   if (result == null)
      throw new IllegalArgumentException("No value found");

   return(result);
}

The above algorithm is taken from here

Nathan
  • 8,093
  • 8
  • 50
  • 76
0

Here's my python code for this problem (also available with tests on my github):

import random as rnd

def sample(iterator):
    """ Uniform sampling of an element from an stream.
    The i-th element (indexed at 1) is sampled with probability 1/i.
    This can be shown by recurrence to lead to uniform sampling. 
    Indeed, at step n we impose p(val==n)=1/n, and all elements i before n 
    were sampled with equal probability, which is also equal to 1/n since
    p(val<n) = (n-1)*p(val==i) = 1-1/n = (n-1)/n => p(val==i) = 1/n """
    i = 1
    out = None
    for elm in iterator:
        if rnd.uniform(0,1) <= 1/i:
            out = elm
        i += 1
    return out
James Baye
  • 43
  • 5