I have a dictionary of about 20,000 objects. The key is a string representation of the object, and the value is the object itself. Each object has attributes self.length and self.rate, where self.rate is calculated as 1.15E-8*self.length.
I need to select, with replacement, a predetermined number of items (say 500 for this example) from this dict based on their rate: objects with a lower rate should be less likely to be chosen, and objects with a higher rate more likely.
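To pin down the selection semantics I'm after, here is a minimal sketch using the standard library's random.choices, which samples with replacement in proportion to given weights. The lengths below are made up for illustration, not my real data:

```python
import random
from collections import Counter

# Toy lengths standing in for the real objects' self.length values.
lengths = [51, 822, 1311, 2112, 103000]
rates = [1.15e-8 * n for n in lengths]  # rate is proportional to length

# Draw 500 indices with replacement, weighted by rate.
# random.choices (Python 3.6+) normalizes the weights internally.
picks = random.choices(range(len(lengths)), weights=rates, k=500)

# The longest object carries ~96% of the total weight here,
# so it should dominate the sample.
most_picked = Counter(picks).most_common(1)[0][0]
print(most_picked)
```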
The way I thought I could do this is very slow.
In a while loop, while the number of selected objects is less than the number required, I generate a random integer between 0 and the length of the dict and pick that element. Then I generate a random number between 0 and 1, and if it is less than the rate of the chosen object, that object is added to the selected list. This seemed fine at first, but I now realise it's much too slow. Does anyone have suggestions on how to do this faster?
Some code. The class definition for the object:
from numpy import random

class object():  # note: this shadows the built-in object
    def __init__(self, length):
        self.length = length
        # rate is proportional to length
        self.rate = 1.15E-8 * self.length

    def select(self):
        # accept with probability equal to the rate
        x = random.uniform(0, 1)
        return x < self.rate
And the function (in another module) that does the rest:
from numpy import random

def select_random(object_dict, maxselect):
    nselect = 0
    # list() so the keys can be indexed (dict.keys() is a view in Python 3)
    object_names = list(object_dict.keys())
    selected_objects = []
    while nselect < maxselect:
        # numpy's randint excludes the upper bound, so this indexes safely
        x = random.randint(0, len(object_names))
        if object_dict[object_names[x]].select():
            nselect += 1
            selected_objects.append(object_names[x])
    return selected_objects
I think what makes it really slow is that the probability of each object being chosen is so small that many iterations are needed before even one object is accepted, let alone 500 or possibly more.
Distribution of lengths:

  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    51     822    1311    1770    2112  103000
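To put a number on the slowdown: with the mean length of 1770 from the summary above, the mean rate is about 1.15E-8 * 1770 ≈ 2.0E-5, so each pass through the loop accepts an object with probability of roughly 2.0E-5, and collecting 500 acceptances takes on the order of 25 million iterations. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope: expected iterations of the accept/reject loop,
# assuming the mean rate is representative of a uniformly chosen object.
mean_length = 1770
rate_constant = 1.15e-8          # same constant as in the class definition
mean_rate = rate_constant * mean_length

needed = 500                     # number of accepted selections
expected_iterations = needed / mean_rate
print(expected_iterations)       # on the order of 2.5e7 iterations
```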