I have the following python method, which selects a weighted random element from the sequence "seq" randomly weighted by other sequence, which contains the weights for each element in seq:
def weighted_choice(seq, weights):
assert len(seq) == len(weights)
total = sum(weights)
r = random.uniform(0, total)
upto = 0
for i in range(len(seq)):
if upto + weights[i] >= r:
return seq[i]
upto += weights[i]
assert False, "Shouldn't get here"
If I call the above a million times with a 1000 element sequence, like this:
seq = range(1000)
weights = []
for i in range(1000):
weights.append(random.randint(1,100))
st=time.time()
for i in range(1000000):
r=weighted_choice(seq, weights)
print (time.time()-st)
it runs for approximately 45 seconds in cpython 2.7 and for 70 seconds in cpython 3.6. It finishes in around 2.3 seconds in pypy 5.10, which would be fine for me, sadly I can't use pypy for some reasons.
Any ideas on how to speed up this function on cpython? I'm interested in other implementations (algorithmically, or via external libraries, like numpy) as well if they perform better.
ps: python3 has random.choices with weights, it runs for around 23 seconds, which is better than the above function, but still exactly ten times slower than pypy can run.
I've tried it with numpy this way:
weights=[1./1000]*1000
st=time.time()
for i in range(1000000):
#r=weighted_choice(seq, weights)
#r=random.choices(seq, weights)
r=numpy.random.choice(seq, p=weights)
print (time.time()-st)
It ran for 70 seconds.