50

[I'm splitting a population number into different matrices and want to test my code using random numbers for now.]

Quick question guys and thanks for your help in advance -

If I use;

 100*rand(9,1)

What is the best way to make these 9 numbers add to 100?

I'd like 9 random numbers between 0 and 100 that add up to 100.

Is there an inbuilt command that does this because I can't seem to find it.

Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
Tetra
  • 747
  • 2
  • 9
  • 15
  • possible duplicate of [Non biased return a list of n random positive numbers (>=0) so that their sum == total_sum](http://stackoverflow.com/questions/3959021/non-biased-return-a-list-of-n-random-positive-numbers-0-so-that-their-sum) – Jason S Nov 09 '11 at 21:30
  • 1
    Good question, but a duplicate: the language-independent aspect of this is far more significant than the language-dependent part. – Jason S Nov 09 '11 at 21:30
  • Does this answer your question? [Getting N random numbers whose sum is M](https://stackoverflow.com/questions/2640053/getting-n-random-numbers-whose-sum-is-m) – Mahozad Aug 24 '21 at 16:42

4 Answers4

96

I see the mistake so often, the suggestion that to generate random numbers with a given sum, one just uses a uniform random set, and just scale them. But is the result truly uniformly random if you do it that way?

Try this simple test in two dimensions. Generate a huge random sample, then scale them to sum to 1. I'll use bsxfun to do the scaling.

xy = rand(10000000,2);
xy = bsxfun(@times,xy,1./sum(xy,2));
hist(xy(:,1),100)

If they were truly uniformly random, then the x coordinate would be uniform, as would the y coordinate. Any value would be equally likely to happen. In effect, for two points to sum to 1 they must lie along the line that connects the two points (0,1), (1,0) in the (x,y) plane. For the points to be uniform, any point along that line must be equally likely.

xy histogram

Clearly uniformity fails when I use the scaling solution. Any point on that line is NOT equally likely. We can see the same thing happening in 3-dimensions. See that in the 3-d figure here, the points in the center of the triangular region are more densely packed. This is a reflection of non-uniformity.

xyz = rand(10000,3);
xyz = bsxfun(@times,xyz,1./sum(xyz,2));
plot3(xyz(:,1),xyz(:,2),xyz(:,3),'.')
view(70,35)
box on
grid on

xyzplot

Again, the simple scaling solution fails. It simply does NOT produce truly uniform results over the domain of interest.

Can we do better? Well, yes. A simple solution in 2-d is to generate a single random number that designates the distance along the line connecting the points (0,1) and 1,0).

t = rand(10000000,1);
xy = t*[0 1] + (1-t)*[1 0];
hist(xy(:,1),100)

Uniform x+y = 1

It can be shown that ANY point along the line defined by the equation x+y = 1, in the unit square, is now equally likely to have been chosen. This is reflected by the nice, flat histogram.

Does the sort trick suggested by David Schwartz work in n-dimensions? Clearly it does so in 2-d, and the figure below suggests that it does so in 3-dimensions. Without deep thought on the matter, I believe that it will work for this basic case in question, in n-dimensions.

n = 10000;
uv = [zeros(n,1),sort(rand(n,2),2),ones(n,1)];
xyz = diff(uv,[],2);

plot3(xyz(:,1),xyz(:,2),xyz(:,3),'.')
box on
grid on
view(70,35)

Sort trick

One can also download the function randfixedsum from the file exchange, Roger Stafford's contribution. This is a more general solution to generate truly uniform random sets in the unit hyper-cube, with any given fixed sum. Thus, to generate random sets of points that lie in the unit 3-cube, subject to the constraint they sum to 1.25...

xyz = randfixedsum(3,10000,1.25,0,1)';
plot3(xyz(:,1),xyz(:,2),xyz(:,3),'.')
view(70,35)
box on
grid on

randfixedsum

  • 1
    I know this is an old question, but can anyone explain this answer to me? How do you "generate a single random number that designates the distance along the line connecting the points (0,1) and 1,0). It can be shown that ANY point along the line defined by the equation x+y = 1, in the unit square, is now equally likely to have been chosen."? Is this even accurate? Part of my confusion is that the Matlab code he posted for proof just showed that the random numbers were uniform, however they weren't scaled to 1...which is the whole point. – jdfinch3 Nov 02 '16 at 23:10
  • 1
    @jdfinch3: the problem being solved is "select a pair of numbers x, y such that x+y = 1." The solution is: "select x in the range [0, 1] and then compute y = 1 - x". This ensures that the pair (x, y) is uniformly selected from the universe of all possible pairs. The histogram demonstrates that it works. David Schwartz's answer provides a generalisation for more than two dimensions (i.e. tuples with more than two elements). – rici May 19 '17 at 14:47
  • @rici - Thank you for your response! Your re-wording of the solution made it click for me. :) – jdfinch3 May 20 '17 at 15:18
  • It's fairly easy to prove for 3 dimensions: let the two random numbers be p and q, then if p < q then (p, q) lies within a triangle (0,0)-(0,1)-(1,1) and if p > q then (q, p) lies within that triangle. Subtracting the larger number from 1 maps the triangle to (0,1)-(0,0)-(1,0) which is the projection of the desired area on to the xy plane, and z is then simply the difference between p and q as desired. For higher dimensions I believe the fact that you can pack N N-D hyperpyramids in an N-D hypercube means that sorting the Nth random number is equivalent to choosing a suitable hyperpyramid. – Neil Jul 22 '18 at 10:46
75

One simple way is to pick 8 random numbers between 0 and 100. Add 0 and 100 to the list to give 10 numbers. Sort them. Then output the difference between each successive pair of numbers. For example, here's 8 random numbers between 0 and 100:

96, 38, 95, 5, 13, 57, 13, 20

So add 0 and 100 and sort.

0, 5, 13, 13, 20, 38, 57, 95, 96, 100

Now subtract:

5-0 = 5
13-5 = 8
13-13 = 0
20-13 = 7
38-20 = 18
57-38 = 19
95-57 = 38
96-95 = 1
100-96 = 4

And there you have it, nine numbers that sum to 100: 0, 1, 4, 5, 7, 8, 18, 19, 38. That I got a zero and a one was just a strange bit of luck.

rici
  • 234,347
  • 28
  • 237
  • 341
David Schwartz
  • 179,497
  • 17
  • 214
  • 278
  • 4
    Awesome! I love it. I have adapted your method to create a series of random numbers that sum up to 0. This allows me to create random alterations with no *net* effect. (I.e. a moving game sprite appears to move randomly more or less, but never goes off-screen. ^^) – Timo Mar 19 '13 at 14:23
  • 8
    This only works when you are allowed to use any random number between 0 and your sum! For example this won't work when the range of random number is smaller than the desired sum. – Behrooz Mar 21 '15 at 18:33
  • 3
    The method is sound but this answer could use another edit ... Here's a list of 8 numbers ... add 0 and 100 (and silently drop 20 from the list to make a total of nine numbers). The answer concludes with "nine numbers that sum up 100:" and proceeds to list eight numbers. The general trend is take a list of N numbers, create a list of N+2 numbers by inserting 0 and 100, diff them down to a total of N+1 numbers and sort the result. – surgical_tubing Sep 14 '15 at 22:24
  • 1
    @surgical: I fixed the example, which I think was just a simple mistake. I think it is misleading to sort the final tuple, but I'll leave that for the original author to decide. – rici May 19 '17 at 14:50
  • python example: >>> a = [0] + sorted([random.random()*100 for _ in range(9-1)]) + [100] >>> b = [a[i+1] - a[i] for i in range(len(a)-1)] >>> sum(b) == 100 – Doody P Sep 10 '17 at 21:57
  • How would you handle this if some of the constraints are not constant but are a function of the other random variables, say the limiter of a2 is a1/2? (i.e. a2 must be at most half of a1 and assuming there is no loop in these constraints) – Veltzer Doron Dec 10 '20 at 13:17
6

It is not too late to give the right answer

Let's talk about sampling X1...XN in the range [0...1] such that Sum(X1, ..., XN) is equal to 1. Then you could rescale it to 100

This is called Dirichlet distribution, and below is the code to sample from it. Simplest case is when all parameters are equal to 1, then all marginal distributions for X1, ..., XN would be U(0,1). In general case, with parameters different from 1s, marginal distributions might have peaks.

----------------- taken from here ---------------------

The Dirichlet is a vector of unit-scale gamma random variables, normalized by their sum. So, with no error checking, this will get you that:

a = [1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0]; // 9 numbers to sample
n = 10000;
r = drchrnd(a,n)

function r = drchrnd(a,n)
  p = length(a);
  r = gamrnd(repmat(a,n,1),1,n,p);
  r = r ./ repmat(sum(r,2),1,p);
Severin Pappadeux
  • 18,636
  • 3
  • 38
  • 64
1

Take a list of N - 1 numbers, create a list of N + 1 numbers by inserting 0 and 100, sort the list, and diff them down to a total of N numbers.