Is there an algorithm such as Reservoir Sampling (algorithm that randomly chooses an item from a stream such that each item is equally likely to be selected), but once an item is selected it is yielded (and therefore cannot be overridden)? i.e, not only can't I hold the stream in memory, I can't hold the sample in memory either. I need a Python solution, but an algorithm name is enough.
The code I currently have (taken from https://towardsdatascience.com/the-5-sampling-algorithms-every-data-scientist-need-to-know-43c7bc11d17c):
k=5
reservoir = []
for i, element in enumerate(stream):
if i+1<= k:
reservoir.append(element)
else:
probability = k/(i+1)
if random.random() < probability:
# Select item in stream and remove one of the k items already selected
reservoir[random.choice(range(0,k))] = element # This is why I can't currently do it - elements in the list are overridden