I am having trouble understanding the probabilities involved in reservoir sampling. Below is the sample code which I have seen used in almost everywhere:
1/*
2 S has items to sample, R will contain the result, K number of items to select
3*/
4ReservoirSample(S[1..n], R[1..k])
5 // fill the reservoir array
6 for i = 1 to k
7 R[i] := S[i]
8
9 // replace elements with gradually decreasing probability
10 for i = k+1 to n
11 j := random(1, i) // important: inclusive range
12 if j <= k
13 R[j] := S[i]
Is my understanding right(?): Suppose we have k=3 and input = [100, 200, 300, 400, 500] and i is currently at 500 index. Probability of 500 replacing 300 in the reservoir(which is 3 in size) = probability of 300 being selected in the reservoir * probability of 500 being selected which is only possible if the index returned by random function is less than or equal to 3 out of 5 choices = 1/3 * 3/5 = 1/5