1

I want to understand the reservoir sampling algorithm where we select k elements out of the given set of S elements such that k <= S.

In the algorithm given on wiki:

array R[k];    // result
integer i, j;

 // fill the reservoir array

for each i in 1 to k do
    R[i] := S[i]
done;

// replace elements with gradually decreasing probability

for each i in k+1 to length(S) do
    j := random(1, i);   // important: inclusive range
    if j <= k then
        R[j] := S[i]
    fi
done

If I understand this correctly, we first select k elements from the set and then continuously parse i elements of S, generate the random no j in the range 1 to i and replace the element j with S[i].

It looks fine if the set K to be sampled is very large, but if I want to pick just 1 element from a linked list of infinite size(at least unknown size) at random, how will I do it with this algorithm...?

hippietrail
  • 15,848
  • 18
  • 99
  • 158
ASingh
  • 475
  • 1
  • 13
  • 24

2 Answers2

3

The reservoir sampling algorithm works on any sized linked list, even one whose length is unknown in advance. In fact, one of the main selling points of reservoir sampling is that it works on data streams whose size is not known in advance.

If you set k = 1 and then run the normal reservoir sampling algorithm, then you should correctly get a uniformly random element from the list.

Hope this helps!

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • Thanks for the comment but it doesnt helps...Can u explain how..? – ASingh Jan 20 '13 at 21:47
  • @ASingh- Just run the normal reservoir sampling algorithm on your linked list with k set to 1. Nothing in the reservoir sampling algorithm requires you to know the length of the list in advance (the for loop you have written that loops up to `length(S)` can easily be replaced by a while loop that loops while there are unvisited elements), so you can just run the algorithm over the linked list of unknown length. Does that make sense? – templatetypedef Jan 20 '13 at 21:48
  • Yes, it makes sense now...Thanks – ASingh Jan 20 '13 at 21:51
-1

I have implemented A different algorithm to solve this problem, here is my code

static char[] solution2(String stream, int K) {
    HashSet<Integer> set = new HashSet();
    char[] list = new char[K];
    stream = stream.concat(stream2);
    Random ran = new Random();
    for (int i = 0; i < K; i++) {
        int y = ran.nextInt(stream.length());
        if (set.add(y)) {
            list[i] = stream.charAt(y);
        } else {
            i--; //skip this iteration since its duplicate number
        }
    }
    return list;
}

Instead of iterating over all the stream values, just pick a random values J and get N[J] from the stream.

m4c_4rthur
  • 53
  • 7