Here is an efficient method of achieving this. It uses binary search (slightly modified to suit the problem). A summary of how it works:
- You represent the probabilities of the elements getting picked in an array. So if you have the probabilities 50% for "A", 20% for "B", 10% for "C", 5% for "D", 5% for "E", 0.1% for "F" and 9.9% for "G", the array would be [.5, .2, .1, .05, .05, .001, .099]. However, this is no good for binary searching because it is not sorted, and if we sort it the probabilities will no longer correspond to our letter array ([A, B, C, D, E, F, G]). So instead we take a running sum of the probabilities until we end up with 1. Now the probability array looks like [.5, .7, .8, .85, .9, .901, 1]: it is sorted, and still corresponds to the array of letters above.
- Now we generate a random fraction between 0 and the highest value in the probability array. Math.random(), which returns a number in [0, 1), is perfect for that.
- Then we look for the value in the probability array closest to this fraction, with one catch: the "closest" value must not be smaller than the fraction.
- Once we have the index of this "closest" value, we can use the same index to pick a value from the letter array. Here is an example in JavaScript:
```javascript
// Upper-bound binary search: returns the index of the first element in
// the sorted array arr that is strictly greater than x.
function find(arr, x, start = 0, end = arr.length) {
  if (end < start) return -1;
  if (end === start) return end;
  const mid = Math.floor((start + end) / 2);
  if (arr[mid] === x) return mid + 1;
  if (arr[mid] < x) return find(arr, x, mid + 1, end);
  return find(arr, x, start, mid);
}

const table_of_corresponding_probabilities = [.5, .7, .8, .85, .9, .901, 1];
const values_to_pick_from = ["A", "B", "C", "D", "E", "F", "G"];

function weighted_random_pick(items, weights) {
  return items[find(weights, Math.random())];
}

console.log(weighted_random_pick(values_to_pick_from, table_of_corresponding_probabilities));
```
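The snippet above assumes the cumulative table is precomputed. If you start from the raw probabilities instead, a small helper (the name `cumulative` is my own, not part of the snippet above) can build it with a running sum:

```javascript
// Build the cumulative probability table from raw probabilities.
// The last entry should be 1 (up to floating-point rounding).
function cumulative(probabilities) {
  let sum = 0;
  return probabilities.map(p => (sum += p));
}

const raw = [.5, .2, .1, .05, .05, .001, .099];
console.log(cumulative(raw)); // ≈ [.5, .7, .8, .85, .9, .901, 1]
```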
So, with these probabilities we should get "A" about 50% of the time, and the other letters in proportion to their weights. Here is a test of the randomness of the algorithm above:
```javascript
// Upper-bound binary search (same as above).
function find(arr, x, start = 0, end = arr.length) {
  if (end < start) return -1;
  if (end === start) return end;
  const mid = Math.floor((start + end) / 2);
  if (arr[mid] === x) return mid + 1;
  if (arr[mid] < x) return find(arr, x, mid + 1, end);
  return find(arr, x, start, mid);
}

const prob = [.5, .7, .8, .85, .9, .901, 1];
const vals = ["A", "B", "C", "D", "E", "F", "G"];
const results = { A: 0, B: 0, C: 0, D: 0, E: 0, F: 0, G: 0 };
const times_it_ran = 160000;

for (let i = 0; i < times_it_ran; i++) {
  results[vals[find(prob, Math.random())]]++;
}

for (const letter in results) {
  console.log(letter + ":", (results[letter] / (times_it_ran / 100)).toFixed(3), "%");
}
```
When you run the snippet above, you should find that the percentage of times each letter was picked is close to that letter's intended probability. Of course it will never be exactly equal, because after all it is random (or at least pseudo-random).
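How close should "close" be? A rough sanity check (my own addition, not part of the original test): assuming the picks are independent, the observed percentage for a letter with probability p over n trials has a binomial standard error of sqrt(p(1 − p)/n).

```javascript
// Expected standard deviation (in percentage points) of the observed
// share of a letter picked with probability p over n independent trials.
const stderrPercent = (p, n) => Math.sqrt(p * (1 - p) / n) * 100;

console.log(stderrPercent(0.5, 160000).toFixed(3));   // "A": about 0.125
console.log(stderrPercent(0.001, 160000).toFixed(3)); // "F": about 0.008
```

So over 160,000 runs, "A" should typically land within a couple of tenths of a percentage point of 50%.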
Ok, what about speed and efficiency? Let's test that too:
```javascript
// Upper-bound binary search (same as above).
function find(arr, x, start = 0, end = arr.length) {
  if (end < start) return -1;
  if (end === start) return end;
  const mid = Math.floor((start + end) / 2);
  if (arr[mid] === x) return mid + 1;
  if (arr[mid] < x) return find(arr, x, mid + 1, end);
  return find(arr, x, start, mid);
}

const array_length = 330000;
// Evenly spaced cumulative probabilities: each value has an equal chance
// of getting picked, but the array is still very long.
const probs = Array.apply(null, { length: array_length }).map((x, i) => i / (array_length - 1));
const vals = Array.apply(null, { length: array_length }).map(Function.call, String);

const time = func => {
  console.time("timer");
  func();
  console.timeEnd("timer");
};

// Now time how long it takes to search within this LONG array:
function button_click() {
  time(() => {
    vals[find(probs, Math.random())];
  });
}
```
```html
<button onclick="button_click();">Run test</button>
```
As you can see, the tests are pretty fast: mine averaged about 2 ms. However, this only searches an array of length 3.3e5. I chose this value because anything much larger gives me a range error (a limitation of the built-in Array.apply). So here is the same test, but using a different method of generating massive arrays: a for loop. I know it's probably the worst method, but it does the job.
```javascript
// Upper-bound binary search (same as above).
function find(arr, x, start = 0, end = arr.length) {
  if (end < start) return -1;
  if (end === start) return end;
  const mid = Math.floor((start + end) / 2);
  if (arr[mid] === x) return mid + 1;
  if (arr[mid] < x) return find(arr, x, mid + 1, end);
  return find(arr, x, start, mid);
}

const len = 75e6; // 75 million elements in this array!
let probs = [];
for (let i = 0; i < 1; i += 1 / len) {
  probs.push(i);
}

const time = func => {
  console.time("timer");
  func();
  console.timeEnd("timer");
};

// Now time how long it takes to search within this LONG array:
function button_click() {
  time(() => {
    find(probs, Math.random());
  });
}
```
```html
<button onclick="button_click();">Run test</button>
```
So after running this test with 75 million elements, what do we find? The first run is marginally slower than the tests we previously ran (with 3.3e5 elements), and the rest average at about 2 ms to 2.25 ms. Taking the midpoint, (2 + 2.25) / 2 = 2.125 ms, and subtracting the average from the last tests gives 2.125 − 2 = 0.125 ms. That is: only 0.125 ms slower than searching an array with roughly 227 times fewer elements (75e6 / 3.3e5 ≈ 227). That is the extent to which binary search is efficient. And I'd suggest part of that 0.125 ms delay could be down to the CPU still being busy after that bad method of building the array; yes, I'm talking about the 75 million iterations we had to complete just to create it!
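The flat timings make sense when you count the probes: binary search does about log2(n) comparisons in the worst case, so growing the array 227-fold only adds a handful of steps. A quick back-of-the-envelope check:

```javascript
// Worst-case number of binary-search probes for an array of n elements.
const probes = n => Math.ceil(Math.log2(n));

console.log(probes(3.3e5)); // 19
console.log(probes(75e6));  // 27
```

So 75 million elements need only about 8 more comparisons than 330 thousand, which is why the timings barely move.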
Hope you found this helpful! If you would like to use this algorithm, just use the first snippet I gave you; everything is much more readable there than in the later test snippets.