Questions tagged [reservoir-sampling]

17 questions
38
votes
4 answers

Reservoir sampling

To retrieve k random numbers from an array of undetermined size we use a technique called reservoir sampling. Can anybody briefly highlight how it happens with a sample code?
Jony
  • 6,694
  • 20
  • 61
  • 71
8
votes
3 answers

re-using random number in reservoir sampling

It was asked in relation to another question recently: Given an unknown length list, return a random item in it by scanning it only 1 time I know you shouldn't, I just can't put my finger on a canonical explanation of why not. Look at the example…
Will
  • 73,905
  • 40
  • 169
  • 246
8
votes
2 answers

Iterative or Lazy Reservoir Sampling

I'm fairly well acquainted with using Reservoir Sampling to sample from a set of undetermined length in a single pass over the data. One limitation of this approach, in my mind, is that it still requires a pass over the entire data set before any…
Stankalank
  • 1,507
  • 1
  • 12
  • 12
4
votes
2 answers

Initialization of Weighted Reservoir Sampling (A-Chao implementation)

I am trying to implement A-Chao version of weighted reservoir sampling as shown in https://en.wikipedia.org/wiki/Reservoir_sampling#Algorithm_A-Chao But I found that the pseudo-code described in wiki seems to be wrong, especially on the…
zzz
  • 975
  • 8
  • 14
2
votes
1 answer

Infinite/Lazy Reservoir Sampling in Haskell

I tried to implement a simple reservoir sampling in haskell following http://jeremykun.com/2013/07/05/reservoir-sampling/ (note that the algorithm shown is possibly semantically incorrect) According to this: Iterative or Lazy Reservoir Sampling lazy…
CMCDragonkai
  • 6,222
  • 12
  • 56
  • 98
1
vote
1 answer

constant memory reservoir sampling, O(k) possible?

I have an input stream, of size n, and I want to produce an output stream of size k that contains distinct random elements of the input stream, without requiring any additional memory for elements selected by the sample. The algorithm I was going to…
markt1964
  • 2,638
  • 2
  • 22
  • 54
1
vote
1 answer

reservoir sampling understanding probaility

I am having trouble understanding the probabilities involved in reservoir sampling. Below is the sample code which I have seen used in almost everywhere: 1/* 2 S has items to sample, R will contain the result, K number of items to select …
1
vote
2 answers

Reservoir Sampling unable to understand probability

To make clear following is the question: Given an input stream of an indeterminate length, how do you return a random member of that stream (with equal probability for each), given that you're not allowed to store more than a constant number of…
1
vote
1 answer

Reservoir Sampling on large Streams

I am trying to implement a reservoir sampling algorithm using java. I have N streams of data ( readings from sensors arriving at a sink node ) of unknown size. For the sake of simplicity lets assume i have one stream of unknown size. So what one of…
Silas
  • 125
  • 2
  • 11
1
vote
1 answer

Test Case for Weighted Reservoir Sampling

I need to implement Weighted Reservoir Sampling. I have referred to the paper mentioned in this blog. I want to write test-cases for unit testing my implementation and am confused as to how to calculate expected-probability of different element to…
1
vote
3 answers

reservoir sampling problem

This MSDN article proves the correctness of Reservoir Sampling algorithm as follows: Base case is trivial. For the k+1st case, the probability a given element i with position <= k is in R is s/k. The probability i is replaced is the probability…
Lazer
  • 90,700
  • 113
  • 281
  • 364
1
vote
2 answers

Reservoir Sampling Algorithm

I want to understand the reservoir sampling algorithm where we select k elements out of the given set of S elements such that k <= S. In the algorithm given on wiki: array R[k]; // result integer i, j; // fill the reservoir array for each i in…
ASingh
  • 475
  • 1
  • 13
  • 24
0
votes
0 answers

How to implement reservoir sampling in postgresql c language by uers-defined aggregate

I'd like to create a user-defined aggregate to implement reservoir sampling. Here are my codes. I want to use the bytea to store the address of the struct pointer which stores the state of the reservoir sampling. However, there are some bugs. For…
Leo
  • 1
  • 1
0
votes
0 answers

Reservoir Sampling, except I can't hold the sample in memory

Is there an algorithm such as Reservoir Sampling (algorithm that randomly chooses an item from a stream such that each item is equally likely to be selected), but once an item is selected it is yielded (and therefore cannot be overridden)? i.e, not…
0
votes
1 answer

Reservoir Sampling algorithm not wroking

I have a project for my data mining class, in which I have to code the reservoir sampling algorithm for files. The program takes as input a number k, the name of the input file, and the name of the output file to be created. The output file must…
GGre
  • 3
  • 1
1
2