67

I am wondering what would be the best way (e.g. in Java) to generate random numbers within a particular range where each number has a certain probability to occur or not?

e.g.

Generate random integers from within [1;3] with the following probabilities:

P(1) = 0.2
P(2) = 0.3
P(3) = 0.5


Right now I am considering the approach to generate a random integer within [0;100] and do the following:

If it is within [0;20] --> I got my random number 1.
If it is within [21;50] --> I got my random number 2.
If it is within [51;100] --> I got my random number 3.

What would you say?

Daniel Smith
marc wellman
  • 9
    I think it's a clever way to do it like that, but I don't know if there is anything "better". Just make sure you go from 0 to 99, as otherwise you will end up with 101 numbers and not exactly the percentage you want. – Blub Dec 02 '13 at 12:10
  • 3
    yes, this seems reasonable, otherwise you could use [EnumeratedIntegerDistribution](https://commons.apache.org/proper/commons-math/javadocs/api-3.2/org/apache/commons/math3/distribution/EnumeratedIntegerDistribution.html), example shown [here](http://stackoverflow.com/a/16436249/2358786) – kiruwka Dec 02 '13 at 12:32
  • 1
    Granted, I could not find a relevant implementation for your problem in [SSJ](http://www.iro.umontreal.ca/~simardr/ssj-2/indexe.html), but you should give it a more thorough look than I... – Yaneeve Dec 02 '13 at 12:40

12 Answers

44

Yours is a pretty good way already and works well with any range.

Just thinking: another possibility is to get rid of the fractions by multiplying with a constant multiplier, and then build an array with the size of this multiplier. Multiplying by 10 you get

P(1) = 2
P(2) = 3
P(3) = 5

Then you create an array with the inverse values -- '1' goes into elements 1 and 2, '2' into 3 to 5, and so on:

P = (1,1, 2,2,2, 3,3,3,3,3);

and then you can pick a random element from this array instead.
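A minimal Java sketch of this lookup-array idea, assuming the probabilities have already been scaled to whole-number counts (2, 3, 5 for the question's example). The class and method names (`LookupTableSampler`, `buildTable`, `sample`) are made up for illustration:

```java
import java.util.Random;

final class LookupTableSampler {

    // Expands integer counts into a flat table,
    // e.g. values {1,2,3} with counts {2,3,5} -> {1,1,2,2,2,3,3,3,3,3}.
    static int[] buildTable(int[] values, int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        int[] table = new int[total];
        int pos = 0;
        for (int i = 0; i < values.length; i++) {
            for (int k = 0; k < counts[i]; k++) {
                table[pos++] = values[i];
            }
        }
        return table;
    }

    // Picks one table slot uniformly at random; every slot is equally likely,
    // so a value that fills more slots is proportionally more likely.
    static int sample(int[] table, Random random) {
        return table[random.nextInt(table.length)];
    }
}
```

Building the table once and calling `sample` repeatedly keeps each draw at a single array access, at the cost of memory proportional to the multiplier.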


(Add.) Using the probabilities from the example in kiruwka's comment:

int[] numsToGenerate           = new int[]    { 1,   2,    3,   4,    5   };
double[] discreteProbabilities = new double[] { 0.1, 0.25, 0.3, 0.25, 0.1 };

the smallest multiplier that leads to all-integers is 20, which gives you

2, 5, 6, 5, 2

and so the expanded lookup array would have length 20, with the following values:

1 1
2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4
5 5

The distribution is exactly the same: the chance of '1', for example, is now 2 out of 20 -- still 0.1.

This is based on your original probabilities all adding up to 1. If they do not, multiply the total by this same factor (which is then going to be your array length as well).

Jongware
41

Some time ago I wrote a helper class to solve this issue. The source code should show the concept clearly enough:

import java.util.HashMap;
import java.util.Map;

public class DistributedRandomNumberGenerator {

    // Maps each value to its weight; the weights need not sum to 1.
    private Map<Integer, Double> distribution;
    private double distSum; // running sum of all weights

    public DistributedRandomNumberGenerator() {
        distribution = new HashMap<>();
    }

    public void addNumber(int value, double distribution) {
        // If the value was added before, drop its old weight from the sum.
        if (this.distribution.get(value) != null) {
            distSum -= this.distribution.get(value);
        }
        this.distribution.put(value, distribution);
        distSum += distribution;
    }

    public int getDistributedRandomNumber() {
        double rand = Math.random();
        double ratio = 1.0 / distSum;
        double tempDist = 0;
        // Walk the cumulative weights until they reach the scaled random number.
        for (Integer i : distribution.keySet()) {
            tempDist += distribution.get(i);
            if (rand / ratio <= tempDist) {
                return i;
            }
        }
        return 0;
    }

}

The usage of the class is as follows:

DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.3d); // Adds the numerical value 1 with a probability of 0.3 (30%)
// [...] Add more values

int random = drng.getDistributedRandomNumber(); // Generate a random number

Test driver to verify functionality:

    public static void main(String[] args) {
        DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
        drng.addNumber(1, 0.2d);
        drng.addNumber(2, 0.3d);
        drng.addNumber(3, 0.5d);

        int testCount = 1000000;

        HashMap<Integer, Double> test = new HashMap<>();

        for (int i = 0; i < testCount; i++) {
            int random = drng.getDistributedRandomNumber();
            test.put(random, (test.get(random) == null) ? (1d / testCount) : test.get(random) + 1d / testCount);
        }

        System.out.println(test.toString());
    }

Sample output for this test driver:

{1=0.20019100000017953, 2=0.2999349999988933, 3=0.4998739999935438}
trylimits
  • I like that! If you want to use it on a large scale, though, the HashMap should use `Float` instead of `Double` to reduce unnecessary overhead – xeruf Oct 28 '17 at 13:36
  • Could you kindly explain the for-loop in the `main()`? I don't understand what it is doing. Also, why are you not checking the `distSum` to be `1` before doing the calculation? – user366312 Apr 01 '20 at 13:04
  • What are you doing with this: `if (this.distribution.get(value) != null) { distSum -= this.distribution.get(value); }` ? – user366312 Apr 01 '20 at 13:17
  • @user366312 If `addNumber(int value, ...)` is called multiple times with the same `value` this line ensures that the sum `distSum` holds the correct value. – trylimits Apr 01 '20 at 14:19
  • Why is `test.put(random, (test.get(random) == null) ? (1d / testCount) : test.get(random) + 1d / testCount);` needed? What does `1d / testCount` achieve? And can you please explain the logic behind this code, and what it is called if I want to search for it (inverse cumulative distribution, etc.)? I couldn't work out how it does its job. – noobie Mar 21 '22 at 20:53
  • 1
    @noobie The term `(1d / testCount)` is used for calculating the average of the test driver. A different, but probably more understandable way of doing this, would be to count each random number and divide it by `testcount`. I don't know if this algorithm has a dedicated name. I implemented this class to use it as [Roulette Wheel Selection](https://en.wikipedia.org/wiki/Fitness_proportionate_selection) - probably that's the name you are looking for. – trylimits Mar 22 '22 at 13:06
10

You already wrote the implementation in your question. ;)

final int ran = myRandom.nextInt(100); // uniform in 0..99
if (ran >= 50) { return 3; }      // 50..99: 50 values
else if (ran >= 20) { return 2; } // 20..49: 30 values
else { return 1; }                // 0..19: 20 values

You can speed this up for more complex implementations by pre-calculating the result in a lookup table like this:

t[0] = 1; t[1] = 1; // ... one for each possible result
return t[ran];

But this should only be used if this is a performance bottleneck and called several hundred times per second.

TwoThe
5

If searching through all n values, which costs O(n), becomes a performance issue, you could perform a binary search instead, which costs O(log n):

Random r = new Random();
// cumulative sums of the probabilities 0.2, 0.3, 0.5 (the last entry must be 1)
double[] weights = new double[]{0.2, 0.2 + 0.3, 0.2 + 0.3 + 0.5};
// end of init
double random = r.nextDouble();
// next perform the binary search in weights array

With a large number of weights you then only need to inspect about log2(weights.length) elements per draw.
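The binary search step could be sketched in Java as follows. This is only one possible way to do it: it leans on the standard `Arrays.binarySearch`, and the names `CumulativeSearch`, `cumulative` and `indexFor` are invented for this example. It assumes the cumulative array is strictly increasing (no zero-probability entries):

```java
import java.util.Arrays;

final class CumulativeSearch {

    // Turns probabilities such as {0.2, 0.3, 0.5} into running totals {0.2, 0.5, 1.0}.
    static double[] cumulative(double[] probabilities) {
        double[] result = new double[probabilities.length];
        double sum = 0;
        for (int i = 0; i < probabilities.length; i++) {
            sum += probabilities[i];
            result[i] = sum;
        }
        return result;
    }

    // Returns the index of the first cumulative value strictly greater than u,
    // where u is expected to come from Random.nextDouble(), i.e. lie in [0, 1).
    static int indexFor(double[] cumulative, double u) {
        int pos = Arrays.binarySearch(cumulative, u);
        // Exact hit: u sits on a boundary and belongs to the next interval.
        // Miss: binarySearch returns -(insertionPoint) - 1.
        int index = pos >= 0 ? pos + 1 : -pos - 1;
        // Guard against a last total that is slightly below 1 due to rounding.
        return Math.min(index, cumulative.length - 1);
    }
}
```

A draw is then `values[CumulativeSearch.indexFor(cum, r.nextDouble())]`, where `values` holds the outcomes in the same order as the probabilities.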

Sergei Chicherin
4

Your approach is fine for the specific numbers you picked, although you could reduce storage by using an array of 10 instead of an array of 100. However, this approach doesn't generalize well to large numbers of outcomes or outcomes with probabilities such as 1/e or 1/PI.

A potentially better solution is to use an alias table. The alias method takes O(n) work to set up the table for n outcomes, but then is constant time to generate regardless of how many outcomes there are.
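For readers who want to see the shape of it, here is a rough Java sketch of one common way to build an alias table (often attributed to Vose). It is not taken from any particular library; the class name `AliasSampler` and its methods are assumptions made for this example:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

final class AliasSampler {

    private final double[] prob; // chance of keeping column i itself
    private final int[] alias;   // outcome used when column i is not kept
    private final Random random;

    AliasSampler(double[] weights, Random random) {
        int n = weights.length;
        this.prob = new double[n];
        this.alias = new int[n];
        this.random = random;

        double sum = 0;
        for (double w : weights) {
            sum += w;
        }

        // Scale so that the average column height is exactly 1.
        double[] scaled = new double[n];
        Deque<Integer> small = new ArrayDeque<>();
        Deque<Integer> large = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            scaled[i] = weights[i] * n / sum;
            if (scaled[i] < 1.0) small.push(i); else large.push(i);
        }

        // Top up each short column with mass taken from a tall one.
        while (!small.isEmpty() && !large.isEmpty()) {
            int s = small.pop();
            int l = large.pop();
            prob[s] = scaled[s];
            alias[s] = l;
            scaled[l] = scaled[l] + scaled[s] - 1.0;
            if (scaled[l] < 1.0) small.push(l); else large.push(l);
        }
        // Leftovers are full columns (up to rounding error).
        while (!large.isEmpty()) prob[large.pop()] = 1.0;
        while (!small.isEmpty()) prob[small.pop()] = 1.0;
    }

    // Constant-time draw: pick a column, then keep it or take its alias.
    int next() {
        int column = random.nextInt(prob.length);
        return random.nextDouble() < prob[column] ? column : alias[column];
    }
}
```

`next()` returns an index into the weights array, so for the question's example you would map index 0, 1, 2 to the numbers 1, 2, 3.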

Community
pjs
1

Try this. In this example I use an array of chars, but you can substitute your integer array.

The weight list contains the weight associated with each char. It represents the probability distribution of my charset.

In the weightsum list I store, for each char, its own weight plus the sum of all preceding weights.

For example, the third element of weightsum, corresponding to 'C', is 65:
P('A') + P('B') + P('C') = P(X <= 'C')
10 + 30 + 25 = 65

So weightsum represents the cumulative distribution of my charset. weightsum contains the following values:

10, 40, 65, 125, 145, 215, 225, 305, 325, 355

It's easy to see that the 8th element, corresponding to 'H', has the largest gap to its predecessor (80, which is its weight), so it is the most likely to be drawn.

        List<Character> charset =   Arrays.asList('A','B','C','D','E','F','G','H','I','J');
        List<Integer> weight = Arrays.asList(10,30,25,60,20,70,10,80,20,30);
        List<Integer>  weightsum = new ArrayList<>();

        int i=0,j=0,k=0;
        Random Rnd = new Random();

        weightsum.add(weight.get(0));

        for (i = 1; i < 10; i++)
            weightsum.add(weightsum.get(i-1) + weight.get(i));

Then I use a loop to draw 30 random chars from charset, each drawn according to the cumulative distribution.

In k I store a random number from 0 up to (but not including) the largest value in weightsum. Then I look in weightsum for the first number greater than k; its position in weightsum corresponds to the position of the char in charset.

   for (j = 0; j < 30; j++)
   {
   k = Rnd.nextInt(weightsum.get(weightsum.size()-1));

   // advance to the first cumulative weight strictly greater than k
   for (i = 0; k >= weightsum.get(i); i++) ;
   System.out.print(charset.get(i));
   }

One run of the code printed this sequence of chars:

HHFAIIDFBDDDHFICJHACCDFJBGBHHB

Let's do the math!

A = 2
B = 4
C = 3
D = 5
E = 0
F = 4
G = 1
H = 6
I = 3
J = 2

Total: 30
As expected, D and H occur most often (weights 60 and 80, i.e. about 17% and 23% of the total of 355).
E, on the other hand, did not come up at all (weight 20, about 6%).

1

There is a more efficient way that avoids fractions, big arrays, and hard-coding the range to 100.

In your case the weights array becomes int[]{2,3,5}, with sum = 10. Take the sum of all the weights and run the random number generator on it: result = new Random().nextInt(10)

Iterate over the array elements from index 0, accumulating a running sum, and return the index of the first element at which the running sum becomes greater than result.

E.g. if result is 6, it will return index 2, the element with weight 5 (the running sums are 2, 5, 10).

This solution scales regardless of how big the numbers are or how large the range is.
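The steps above can be sketched in Java like this. The names `IntegerWeightPicker`, `indexFor` and `pick` are made up for the example, and the weights are assumed to be positive integers:

```java
import java.util.Random;

final class IntegerWeightPicker {

    // Returns the first index whose running sum of weights exceeds r.
    // r must lie in [0, sum of weights).
    static int indexFor(int[] weights, int r) {
        int runningSum = 0;
        for (int i = 0; i < weights.length; i++) {
            runningSum += weights[i];
            if (r < runningSum) {
                return i;
            }
        }
        throw new IllegalArgumentException("r must be in [0, sum of weights)");
    }

    // Draws r uniformly from [0, sum of weights) and maps it to an index.
    static int pick(int[] weights, Random random) {
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        return indexFor(weights, random.nextInt(total));
    }
}
```

With weights {2,3,5}, values of r from 0 to 1 map to index 0, 2 to 4 map to index 1, and 5 to 9 map to index 2, which matches the 0.2 / 0.3 / 0.5 split from the question.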

Parth
0

Referencing the paper pointed to by pjs in another post, the population of the base64 table can be further optimized. The result is amazingly fast; initialization is slightly expensive, but if the probabilities do not change often, this is a good approach.

*For a duplicate key, the last probability is taken instead of being combined (slightly different from EnumeratedIntegerDistribution behaviour)

public class RandomGen5 extends BaseRandomGen {

    private int[] t_array = new int[4];
    private int sumOfNumerator;
    private final static int DENOM = (int) Math.pow(2, 24);
    private static final int[] bitCount = new int[] {18, 12, 6, 0};
    private static final int[] cumPow64 = new int[] {
            (int) ( Math.pow( 64, 3 ) + Math.pow( 64, 2 ) + Math.pow( 64, 1 ) + Math.pow( 64, 0 ) ),
            (int) ( Math.pow( 64, 2 ) + Math.pow( 64, 1 ) + Math.pow( 64, 0 ) ),
            (int) ( Math.pow( 64, 1 ) + Math.pow( 64, 0 ) ),
            (int) ( Math.pow( 64, 0 ) )
    };


    ArrayList[] base64Table = {new ArrayList<Integer>()
            , new ArrayList<Integer>()
            , new ArrayList<Integer>()
            , new ArrayList<Integer>()};

    public int nextNum() {
        int rand = (int) (randGen.nextFloat() * sumOfNumerator);

        for ( int x = 0 ; x < 4 ; x ++ ) {
                if (rand < t_array[x])
                    return x == 0 ? (int) base64Table[x].get(rand >> bitCount[x])
                            : (int) base64Table[x].get( ( rand - t_array[x-1] ) >> bitCount[x]) ;
        }
        return 0;
    }

    public void setIntProbList( int[] intList, float[] probList ) {
        Map<Integer, Float> map = normalizeMap( intList, probList );
        populateBase64Table( map );
    }

    private void clearBase64Table() {
        for ( int x = 0 ; x < 4 ; x++ ) {
            base64Table[x].clear();
        }
    }

    private void populateBase64Table( Map<Integer, Float> intProbMap ) {
        int startPow, decodedFreq, table_index;
        float rem;

        clearBase64Table();

        for ( Map.Entry<Integer, Float> numObj : intProbMap.entrySet() ) {
            rem = numObj.getValue();
            table_index = 3;
            for ( int x = 0 ; x < 4 ; x++ ) {
                decodedFreq = (int) (rem % 64);
                rem /= 64;
                for ( int y = 0 ; y < decodedFreq ; y ++ ) {
                    base64Table[table_index].add( numObj.getKey() );
                }
                table_index--;
            }
        }

        startPow = 3;
        for ( int x = 0 ; x < 4 ; x++ ) {
            t_array[x] = x == 0 ? (int) ( Math.pow( 64, startPow-- ) * base64Table[x].size() )
                    : ( (int) ( ( Math.pow( 64, startPow-- ) * base64Table[x].size() ) + t_array[x-1] ) );
        }

    }

    private Map<Integer, Float> normalizeMap( int[] intList, float[] probList ) {
        Map<Integer, Float> tmpMap = new HashMap<>();
        Float mappedFloat;
        int numerator;
        float normalizedProb, distSum = 0;

        //Remove duplicates, and calculate the sum of non-repeated keys
        for ( int x = 0 ; x < probList.length ; x++ ) {
            mappedFloat = tmpMap.get( intList[x] );
            if ( mappedFloat != null ) {
                //Duplicate key: drop the old probability, the new one replaces it
                distSum -= mappedFloat;
            }
            distSum += probList[x];
            tmpMap.put( intList[x], probList[x] );
        }

        //Normalise the map to key -> corresponding numerator by multiplying with 2^24
        sumOfNumerator = 0;
        for ( Map.Entry<Integer, Float> intProb : tmpMap.entrySet() ) {
            normalizedProb = intProb.getValue() / distSum;
            numerator = (int) ( normalizedProb * DENOM );
            intProb.setValue( (float) numerator );
            sumOfNumerator += numerator;
        }

        return tmpMap;
    }
}
E.R.Tan
0

If you are not against adding a new library in your code, this feature is already implemented in MockNeat, check the probabilities() method.

Some examples directly from the wiki:

String s = mockNeat.probabilites(String.class)
                .add(0.1, "A") // 10% chance
                .add(0.2, "B") // 20% chance
                .add(0.5, "C") // 50% chance
                .add(0.2, "D") // 20% chance
                .val();

Or if you want to generate numbers within given ranges with a given probability you can do something like:

Integer x = m.probabilites(Integer.class)
             .add(0.2, m.ints().range(0, 100))
             .add(0.5, m.ints().range(100, 200))
             .add(0.3, m.ints().range(200, 300))
             .val();

Disclaimer: I am the author of the library, so I might be biased when I am recommending it.

Andrei Ciobanu
0

Here is Python code, even though you asked for Java; the approach is very similar.

# weighted probability
import numpy as np

theta = np.array([0.1,0.25,0.6,0.05])
print(theta)

sample_axis = np.hstack((np.zeros(1), np.cumsum(theta)))
print(sample_axis)

This prints [0. 0.1 0.35 0.95 1. ], which represents the cumulative distribution.

You can use a uniform distribution to draw an index in this unit range.

def binary_search(axis, q, s, e):
    # returns the index s of the interval [axis[s], axis[s+1]) that contains q
    if e-s <= 1:
        print(s)
        return s
    else:
        m = (s+e) // 2
        if q < axis[m]:
            return binary_search(axis, q, s, m)
        else:
            return binary_search(axis, q, m, e)



range_index = np.random.rand(1)
print(range_index)
q = range_index
s = 0
e = sample_axis.shape[0]-1
binary_search(sample_axis, q, 0, e)
Albert Chen
0

Also responded here: find random country but probability of picking higher population country should be higher. Using TreeMap:

TreeMap<Integer, Integer> map = new TreeMap<>();
map.put(percent1, 1);
map.put(percent1 + percent2, 2);
// ...

int random = (new Random()).nextInt(100); // 0..99
// first key strictly greater than random; the last key in the map must be 100
int result = map.higherEntry(random).getValue();
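A fuller sketch of the TreeMap idea, filled in with the question's percentages (20, 30, 50). The helper names `TreeMapPicker`, `build` and `pick` are invented here, and the percentages are assumed to add up to 100:

```java
import java.util.Random;
import java.util.TreeMap;

final class TreeMapPicker {

    // Keys are cumulative percentages (20, 50, 100); each key maps to its outcome.
    static TreeMap<Integer, Integer> build(int[] values, int[] percents) {
        TreeMap<Integer, Integer> map = new TreeMap<>();
        int cumulative = 0;
        for (int i = 0; i < values.length; i++) {
            cumulative += percents[i];
            map.put(cumulative, values[i]);
        }
        return map;
    }

    // r must be in 0..99. higherEntry finds the first key strictly greater
    // than r, so 0..19 -> first value, 20..49 -> second, 50..99 -> third.
    static int pick(TreeMap<Integer, Integer> map, int r) {
        return map.higherEntry(r).getValue();
    }

    // Convenience overload that draws r itself.
    static int pick(TreeMap<Integer, Integer> map, Random random) {
        return pick(map, random.nextInt(100));
    }
}
```

The lookup is O(log n) in the number of outcomes, the same as a binary search over a cumulative array, but the map is easier to update when entries change.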
RoberMP
0

This may be useful to someone: a simple one I did in Python. You just have to change the way p and r are written. This one, for instance, maps random values between 0 and 0.1 to the range 1e-20 to 1e-12.

import random

def generate_distributed_random():
    p = [1e-20, 1e-12, 1e-10, 1e-08, 1e-04, 1e-02, 1]
    r = [0, 0.1, 0.3, 0.5, 0.7, 0.9, 1]
    val = random.random()
    for i in range(1, len(r)):
        if val <= r[i] and val >= r[i - 1]:
            slope = (p[i] - p[i - 1])/(r[i] - r[i - 1])
            return p[i - 1] + (val - r[i - 1])*slope


print(generate_distributed_random())
besabestin