My code spends 40% of its time searching through unsorted vectors. More specifically, the searching function `my_search` repeatedly receives a single unsorted vector of length N, where N can take any value between 10 and 100,000. The weights associated with each element have relatively little variance (e.g. [0.8, 0.81, 0.85, 0.78, 0.8, 0.7, 0.84, 0.82, ...]).

The algorithm `my_search` starts by summing the weights of all elements and then samples an average of N elements (as many as the length of the vector) with replacement. The algorithm is quite similar to
```cpp
// From the linked post; random(x) is assumed to return a uniform integer in [0, x).
int sum_of_weight = 0;
for (int i = 0; i < num_choices; i++) {
    sum_of_weight += choice_weight[i];
}
int rnd = random(sum_of_weight);
for (int i = 0; i < num_choices; i++) {
    if (rnd < choice_weight[i])
        return i;
    rnd -= choice_weight[i];
}
```
from this post.
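Since my weights are doubles rather than ints (an `int` accumulator would truncate 0.8 to 0), the version I am actually working from looks roughly like this sketch, using `<random>` in place of the post's `random()` (the signature and the round-off guard are my own choices, not from the post):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Same linear scan as above, adapted to double weights.
// gen is assumed to be a std::mt19937 seeded once elsewhere.
int my_search(const std::vector<double>& choice_weight, std::mt19937& gen) {
    double sum_of_weight = 0.0;
    for (double w : choice_weight) sum_of_weight += w;

    // Draw rnd uniformly in [0, sum_of_weight) and walk the weights.
    std::uniform_real_distribution<double> dist(0.0, sum_of_weight);
    double rnd = dist(gen);
    for (std::size_t i = 0; i < choice_weight.size(); ++i) {
        if (rnd < choice_weight[i]) return static_cast<int>(i);
        rnd -= choice_weight[i];
    }
    return static_cast<int>(choice_weight.size()) - 1;  // floating-point round-off guard
}
```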
I could sort the vector before searching, but sorting takes O(N log N) time (depending on the sort algorithm used), and I doubt (though I might be wrong, as I haven't tried) that I would gain much time, especially as the weights have little variance.
Another solution would be to store the cumulative weight ahead of a series of checkpoints. For example, while summing the vector, every N/10 elements I could record how much weight has been summed so far. Then I could first compare `rnd` against these 10 breakpoints and search in only a tenth of the total length of the vector, as in the sketch below.
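Here is a minimal sketch of what I have in mind (the function name `my_search_blocked`, the choice of ~10 breakpoints, and the use of `std::mt19937` are just illustrative assumptions):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Breakpoint idea: store the running total every `stride` elements,
// locate the right block with a short scan over the breakpoints,
// then do the linear scan inside that block only.
int my_search_blocked(const std::vector<double>& choice_weight, std::mt19937& gen) {
    const std::size_t n = choice_weight.size();
    const std::size_t stride = n > 10 ? n / 10 : 1;  // ~10 breakpoints, as described

    // breakpoints[k] = total weight of the first k*stride elements.
    std::vector<double> breakpoints;
    double sum_of_weight = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i % stride == 0) breakpoints.push_back(sum_of_weight);
        sum_of_weight += choice_weight[i];
    }

    std::uniform_real_distribution<double> dist(0.0, sum_of_weight);
    double rnd = dist(gen);

    // Coarse step: find the last breakpoint not exceeding rnd.
    std::size_t block = breakpoints.size() - 1;
    for (std::size_t k = 1; k < breakpoints.size(); ++k) {
        if (rnd < breakpoints[k]) { block = k - 1; break; }
    }

    // Fine step: linear scan starting at that block's first element.
    rnd -= breakpoints[block];
    for (std::size_t i = block * stride; i < n; ++i) {
        if (rnd < choice_weight[i]) return static_cast<int>(i);
        rnd -= choice_weight[i];
    }
    return static_cast<int>(n - 1);  // floating-point round-off guard
}
```

Since `my_search` draws about N samples from the same vector, the breakpoints would in practice be built once per vector and reused for every draw; I folded both steps into one function only to keep the sketch short. Intuitively, with k breakpoints each draw costs about k + N/k comparisons, which is what makes me unsure how to pick k.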
- Would this be a good solution?
- Is there a name for the process I described?
- How can I estimate the right number of breakpoints to store as a function of N?
- Is there a better solution?