1. Repeted values
Let's suppose we have an array in which every value corresponds to the relative probability of its index. For example, given a coin, the possible outcomes of a toss are 50% tails and 50% heads. We can represent those probability with an array, like (I'll use PHP as this seems the language used by OP):
$coin = array(
'head' => 1,
'tails' => 1
);
While the results of rolling two dice can be represented as:
$dice = array( '2' => 1, '3' => 2, '4' => 3, '5' => 4, '6' => 5, '7' => 6,
'8' => 5, '9' => 4, '10' => 3, '11' => 2, '12' => 1
);
An easy way to pick a random key (index) with a probability proportional to the values of those arrays (and therefore consistent to the underlying model) is to create another array whose elements are the keys of the original one repeated as many times as indicated by the values and then return a random value. For example for the dice
array:
$arr = array( 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, ...
Doing so, we are confident that each key will be picked up with the right relative probability. We can encapsulate all the logic in a class with a constructer which builds the helper array an a function that returns a random index using mt_rand():
class RandomKeyMultiple {
private $pool = array();
private $max_range;
function __construct( $source ) {
// build the look-up array
foreach ( $source as $key => $value ) {
for ( $i = 0; $i < $value; $i++ ) {
$this->pool[] = $key;
}
}
$this->max_range = count($this->pool) - 1;
}
function get_random_key() {
$x = mt_rand(0, $this->max_range);
return $this->pool[$x];
}
}
The usage is simple, just create an object of the class passing the source array and then each call of the function will return a random key:
$test = new RandomKeyMultiple($dice);
echo $test->get_random_key();
The problem is that OP's array contains big values and this results in a very big (but still manageable, even without dividing all the values by 100) array.
2. Steps
In general, discrete probability distribution may be more complicated, with float values that cannot be easily translated in number of repetitions.
Another way to solve the problem is to consider the values in the array as the misures of intervals that divide the global range of all possible values:
+---------------------------+-----------------+-------+----+
| | | | |
|<--- 265000 --->|<-- 190000 -->|<30000>|1300|
|<------- 455000 ------>| |
|<---------- 485000 --------->| |
|<---------------- 486300 -------------->|
Then we can choose a random number between 0 and 486300 (the global range) and look up the right index (the odds of which would be proportional to the lenght of its segment, giving the correct probability distribution). Something like:
$x = mt_rand(0, 486300);
if ( $x < 265000 )
return 0;
elseif ( $x < 455000 )
return 1;
elseif ( $x < 485000 )
return 2;
else
return 3;
We can generalize the algorithm and encapsulate all the logic in a class (using an helper array to store the partial sums):
class RandomKey {
private $steps = array();
private $last_key;
private $max_range;
function __construct( $source ) {
// sort in ascending order to partially avoid numerical issues
asort($source);
// calculate the partial sums. Considering OP's array:
//
// 1300 ----> 0
// 30000 ----> 1300
// 190000 ----> 31300
// 265000 ----> 221300 endind with $partial = 486300
//
$partial = 0;
$temp = 0;
foreach ( $source as $k => &$v ) {
$temp = $v;
$v = $partial;
$partial += $temp;
}
// scale the steps to cover the entire mt_rand() range
$factor = mt_getrandmax() / $partial;
foreach ( $source as $k => &$v ) {
$v *= $factor;
}
// Having the most probably outcomes first, minimizes the look-up of
// the correct index
$this->steps = array_reverse($source);
// remove last element (don't needed during checks) but save the key
end($this->steps);
$this->last_key = key($this->steps);
array_pop($this->steps);
}
function get_random_key() {
$x = mt_rand();
foreach ( $this->steps as $key => $value ) {
if ( $x > $value ) {
return $key;
}
}
return $this->last_key;
}
}
Here or here there are live demos with some examples and helper functions to check the probability distribution of the keys.
For bigger arrays, a binary search to look-up the index may also be considered.