I would like to discuss how to optimally implement the following:
Given a set of elements,
all of them are distinct. Call this input set N
, it has size n
. These element each have a weight. Now, you need to create a subset (R
) of this N
of size r
, where r
is smaller then or equal to n
.
This subset, again, cannot contain repetitions. Every element of R
needs to be randomly chosen from N
, if an element has a higher weight, the chance of it being chosen needs to be higher. If the total weight of all element in N
is W
, then the chance that element i
is chosen needs to be w_i/W
.
One last detail is that the weights are prone to change, but only to increase, they cannot decrease.
I want a neat implementation for this. I am working in Java, but I am hoping to find some language agnostic properties or details (or even pseudo code) that are interesting.
Now, for my own solution: I create an Array List
of the original set of elements. I make sure that the weights are natural numbers and I add them n
times if the weight of the element is n
. Then I shuffle the arraylist (collections.shuffle
) and I keep on taking element from the shuffled list and adding them to a Java Set
until the size of the set is r
. If the weight of an element increases, it gets added some more times to the original array list. Shuffle again, make a new subset.
My implementation needs loads of memory and the shuffle thingy is also slower if the set becomes bigger. Is there a better idea out there?