I need a function to do the following in Python.
I need to iterate through an input list and choose each element with probability p (which is small).
Here is a naive implementation to make clear what I'm doing.
def f(inputList,p):
for element in inputList:
if random.random()<p:
yield element
Each random number generation is expensive. We can do fewer random number generations by calculating initially how long it will take before the first success and then jumping to that entry. I have a method, but wonder if there is something already in existence or a better way to code this. In principle I just need it for lists, but would like something that works on a general iterable.
def jump_calculation(p):
if p == 1:
return 0
r = random.random()
k = int(scipy.log(1-r)/scipy.log(1-p))
return k
def binomial_choice(L,p):
jump = jump_calculation(p)
for element in L:
jump -= 1
if jump<0:
yield element
jump = jump_calculation(p)
Am I reinventing the wheel? If not, are the obvious improvements to make the code easier to read?