18

Given an array of size n I want to generate random probabilities for each index such that Sigma(a[0]..a[n-1])=1

One possible result might be:

0     1     2     3     4
0.15  0.2   0.18  0.22  0.25

Another perfectly legal result can be:

0     1     2     3     4
0.01  0.01  0.96  0.01  0.01

How can I generate these easily and quickly? Answers in any language are fine, Java preferred.

Peter O.
  • 32,158
  • 14
  • 82
  • 96
Yuval Adam
  • 161,610
  • 92
  • 305
  • 395
  • By sigma you mean standard deviation? I hope you realize that as soon as you say standard deviation you automatically imply you are drawing your random numbers from the normal distribution. – ldog Jan 31 '10 at 10:17
  • Most computer RNG's draw numbers from the uniform distribution. – ldog Jan 31 '10 at 10:18
  • You can side step this issue by realizing that the Central Limit theorem can help: http://en.wikipedia.org/wiki/Central_limit_theorem – ldog Jan 31 '10 at 10:18
  • i.e. what you want to do to draw 1 number from your normal distribution is to sample N numbers from the RNG uniform distribution and take their average to produce one random "normal" value, where N is large enough to suite your needs (preferably N > 1 which is what you are doing now.) – ldog Jan 31 '10 at 10:20
  • for doing some more serious statistical tasks, you might want to look into the R language: http://www.r-project.org/ – Carl Jan 31 '10 at 19:13
  • 1
    @gmatt: No, he means "sum," look at his examples – BlueRaja - Danny Pflughoeft Feb 01 '10 at 18:48
  • Since n is an array the for loop should be: for (int i = 0; i < n.length; i++) –  Feb 29 '12 at 17:39

6 Answers6

22

Get n random numbers, calculate their sum and normalize the sum to 1 by dividing each number with the sum.

Kobi
  • 135,331
  • 41
  • 252
  • 292
  • Nice :) didn't think of that... תודה! – Yuval Adam Jan 31 '10 at 09:04
  • 13
    This introduces bias. You cannot sample uniformly from a simplex this way. – dreeves Jun 09 '10 at 18:54
  • @dreeves - can you elaborate? – Yuval Adam Jun 09 '10 at 19:36
  • 4
    Figure 2 of this paper -- http://www.cs.cmu.edu/~nasmith/papers/smith+tromble.tr04.pdf -- gives a visual depiction of the bias in the n=3 case of the normalization method you propose. I believe that kohomologie's answer is correct (though I didn't check the code they included). – dreeves Jun 09 '10 at 23:30
18

The task you are trying to accomplish is tantamount to drawing a random point from the N-dimensional unit simplex.

http://en.wikipedia.org/wiki/Simplex#Random_sampling might help you.

A naive solution might go as following:

public static double[] getArray(int n)
    {
        double a[] = new double[n];
        double s = 0.0d;
        Random random = new Random();
        for (int i = 0; i < n; i++)
        {
           a [i] = 1.0d - random.nextDouble();
           a [i] = -1 * Math.log(a[i]);
           s += a[i];
        }
        for (int i = 0; i < n; i++)
        {
           a [i] /= s;
        }
        return a;
    }

To draw a point uniformly from the N-dimensional unit simplex, we must take a vector of exponentially distributed random variables, then normalize it by the sum of those variables. To get an exponentially distributed value, we take a negative log of uniformly distributed value.

Marat Salikhov
  • 6,367
  • 4
  • 32
  • 35
  • 3
    In most languages, you should create `Random` only once, or you will get not random results (and in many cases - the same number over and over). I'm also concerned about the use of `log` - can you please explain why it's there? – Kobi Jan 31 '10 at 09:41
  • +1 for good reference, but I think `nextDouble()` already adjusts for uniform distribution: http://java.sun.com/javase/6/docs/api/java/util/Random.html#nextDouble() – trashgod Jan 31 '10 at 19:29
  • Kobi, thanks for pointing out the `new Random()` thing. As for `log` -- I have edited my post to include more thorough explanation. – Marat Salikhov Jan 31 '10 at 19:50
  • 1
    trashgod, nextDouble() returns an uniformly distributed value from [0,1). So when we take n such values, we get a random point in a parallelotope. If we divide this point's coordinates by Manhattan distance between this point and the origin, I'm quite unsure that we'll get an uniformly distributed point in the unit simplex. – Marat Salikhov Jan 31 '10 at 19:51
  • 1
    thank you for clarifying; the algorithm presumes uniform distribution and uses -ln to get the required exponential distribution. – trashgod Feb 01 '10 at 03:56
  • the link to the wikipedia article does not exist anymore – Bogdan Lataianu Feb 21 '13 at 17:24
  • Also, you obviously mean taking a negative log – Bogdan Lataianu Feb 21 '13 at 17:53
2

This is relatively late, but to show the ammendment to @Kobi's simple and straightforward answer given in this paper pointed to by @dreeves which makes the sampling uniform. The method (if I understand it clearly) is to

  1. Generate n-1 distinct values from the range [1, 2, ... , M-1].
  2. Sort the resulting vector
  3. Add 0 and M as the first and last elements of the resulting vector.
  4. Generate a new vector by computing xi - xi-1 where i = 1,2, ... n. That is, the new vector is made up of the differences between consecutive elements of the old vector.
  5. Divide each element of the new vector by M. You have your uniform distribution!

I am curious to know if generating distinct random values and normalizing them to 1 by dividing by their sum will also produce a uniform distribution.

ayePete
  • 411
  • 1
  • 7
  • 23
1

Get n random numbers, calculate their sum and normalize the sum to 1 by dividing each number with the sum.

Expanding on Kobi's answer, here's a Java function that does exactly that.

public static double[] getRandDistArray(int n)  {
    double randArray[] = new double[n];
    double sum = 0;

    // Generate n random numbers
    for (int i = 0; i < randArray.length; i++) {
        randArray[i] = Math.random();
        sum += randArray[i];
    }

    // Normalize sum to 1
    for (int i = 0; i < randArray.length; i++) {
        randArray[i] /= sum;
    }
    return randArray;
}

In a test run, getRandDistArray(5) returned the following

[0.1796505603694718, 0.31518724882558813, 0.15226147256596428, 0.30954417535503603, 0.043356542883939767]
Community
  • 1
  • 1
Stevoisiak
  • 23,794
  • 27
  • 122
  • 225
0

If you want to generate values from a normal distribution efficiently, try the Box Muller Transformation.

James McLeod
  • 2,381
  • 1
  • 17
  • 19
0
public static double[] array(int n){

    double[] a = new double[n];
    double flag = 0;

    for(int i=0;i<n;i++){
        a[i] = Math.random();
        flag += a[i];
    }

    for(int i=0;i<n;i++) a[i] /= flag;

    return a;
}

Here, at first a stores random numbers. And the flag will keep the sum all the numbers generated so that at the next for loop the numbers generated will be divided by the flag, which at the end the array will have random numbers in probability distribution.