122
  • This question is not a duplicate of Getting N random numbers whose sum is M because:
    1. Most answers there are about theory, not a specific Python coding solution that answers this question
    2. The accepted answer here is 5 years older than the only answer in the duplicate that answers this question
    3. The duplicate's accepted answer does not answer this question

How would I make a list of N (say 100) random numbers, so that their sum is 1?

I can make a list of random numbers with

import random as ran
r = [ran.random() for i in range(100)]

How would I modify this so that the list sums to 1? (This is for a probability simulation.)

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
Tom Kealy
  • 2,537
  • 2
  • 27
  • 45

12 Answers

222

The simplest solution is indeed to take N random values and divide by the sum.

A more generic solution is to use the Dirichlet distribution which is available in numpy.

By changing the parameters of the distribution you can change the "randomness" of individual numbers.

>>> import numpy as np
>>> print(np.random.dirichlet(np.ones(10), size=1))
[[ 0.01779975  0.14165316  0.01029262  0.168136    0.03061161  0.09046587
   0.19987289  0.13398581  0.03119906  0.17598322]]

>>> print(np.random.dirichlet(np.ones(10)/1000., size=1))
[[  2.63435230e-115   4.31961290e-209   1.41369771e-212   1.42417285e-188
    0.00000000e+000   5.79841280e-143   0.00000000e+000   9.85329725e-005
    9.99901467e-001   8.37460207e-246]]

>>> print(np.random.dirichlet(np.ones(10)*1000., size=1))
[[ 0.09967689  0.10151585  0.10077575  0.09875282  0.09935606  0.10093678
   0.09517132  0.09891358  0.10206595  0.10283501]]

Depending on the concentration parameter, the Dirichlet distribution will give either vectors where all of the values are close to 1/N (where N is the length of the vector), vectors where most of the values are ~0 and a single value is close to 1, or something in between those two extremes.

EDIT (5 years after the original answer): Another useful fact about the Dirichlet distribution is that you get it naturally if you generate a set of Gamma-distributed random variables and then divide them by their sum.
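For illustration, here is a minimal sketch of that Gamma-to-Dirichlet equivalence (the alpha values are arbitrary):

import numpy as np

rng = np.random.default_rng()
alpha = np.ones(10) * 2.0  # arbitrary concentration parameters

# Gamma(alpha_i) variates divided by their sum are Dirichlet(alpha) distributed
g = rng.gamma(shape=alpha, scale=1.0)
sample = g / g.sum()
print(sample, sample.sum())  # components sum to 1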

wjandrea
  • 28,235
  • 9
  • 60
  • 81
sega_sai
  • 8,328
  • 1
  • 29
  • 38
  • 14
    +1 for being the only one to mention the Dirichlet distribution. This should be the answer. – Timothy Shields Sep 06 '13 at 16:41
  • 4
    I've changed my accepted answer to this one, as scaling doesn't necessarily give a uniform distribution. – Tom Kealy Sep 08 '13 at 12:58
  • 2
    @Tom, I don't begrudge your choice, and this answer is nice, but I want to make something clear: Scaling _does_ necessarily give a uniform distribution (over `[0,1/s)`). It will be exactly as uniform as the unscaled distribution you started with, because scaling doesn't change the distribution, but just compresses it. This answer gives a variety of distributions, only one of which is uniform. If this doesn't make sense to you, run the examples and look at some histograms to make it clear. Also try the same thing with a gaussian distribution (`np.random.normal`). – askewchan Sep 09 '13 at 18:08
  • @askewchan, you are not correct here. Taking random numbers and dividing by the sum will NOT give the uniform distribution (it will be close to uniform for very large N, but never strictly uniform, and not uniform at all for smaller N). The Dirichlet distribution will not give uniform distributions either (because it is impossible to have uniform distributions and a sum of 1). – sega_sai Sep 09 '13 at 19:26
  • @sega_sai In that vein, there is no strictly uniform distribution that can be pseudo-randomly generated. What I mean is that renormalizing a 'uniform' distribution doesn't make it any less uniform. I was responding to Tom's comment that implied this answer was selected because he wanted a uniform distribution. Unless I'm more fundamentally mistaken? – askewchan Sep 09 '13 at 19:27
  • @askewchan Renormalizing uniform distribution by the sum of numbers which include the original number makes the distribution non-uniform. (it is easy to see even with two numbers) – sega_sai Sep 09 '13 at 19:57
  • @sega_sai It's not easy to see for me. It changes the interval of the distribution but I don't see how it redistributes it to be non-uniform. – askewchan Sep 09 '13 at 20:02
  • @askewchan The simplest way to see it is to just plot the histogram from your example for N=2 -- you'll see that the distribution looks triangular-like (a sketch follows this comment thread). – sega_sai Sep 09 '13 at 20:23
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/37061/discussion-between-askewchan-and-sega-sai) – askewchan Sep 09 '13 at 20:24
  • @askewchan If you guys came to any conclusion I'd be really interested to know. – Tom Kealy Sep 10 '13 at 11:34
  • @Tom, sorry, I'm still confused, and sega_sai never came to my chat :( – askewchan Sep 10 '13 at 13:30
  • @askewchan Sorry, didn't have time for that. – sega_sai Sep 10 '13 at 16:07
  • I think "...to take N random values..." should be "...to take N exponentially distributed random values...", for example, with `random.expovariate`. – Peter O. Jul 28 '17 at 21:33
  • Thanks, this seems to be what I was looking for. But maybe not quite... Is there a way to get a pdf that produces samples that tend towards a certain mean distribution (where each element is not 1/N) and then adjust the amount of variation from that? (while still summing to 1). – Bill Apr 04 '20 at 17:46
  • I think the non-scientific answer to my question above is to scale the alpha values. I.e. `v*alpha` where high `v` results in low variance to the mean sample distribution (`alpha/sum(alpha)`) and low `v` means high variance. – Bill Apr 04 '20 at 18:23
  • @sega_sai Hey, I'm just learning, and want to thank you for doing this. I have a question: if I use `a = np.array([400, 400, 400, 400])` and `d = np.random.dirichlet(a, 1000)` and then plot it as a histogram with `plt.hist(d)`, it really fits a normal distribution. Is that because of how the generator works, or how this distribution is set up? Thanks – avnav99 Dec 05 '21 at 22:23
  • @sega_sai It looks very similar to this: `norm = randoms.normal(0.25,0.01,(400*4,4))` – avnav99 Dec 05 '21 at 22:55
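Following up on the histogram suggestion in the comments above, a minimal sketch (assuming matplotlib is installed) showing that sum-normalized uniforms are not uniform for N=2:

import numpy as np
import matplotlib.pyplot as plt

# normalize pairs of uniforms by their sum and look at the first coordinate
xy = np.random.random((100_000, 2))
first = xy[:, 0] / xy.sum(axis=1)

plt.hist(first, bins=100)  # peaked around 0.5 rather than flat
plt.show()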
44

The best way to do this is to simply make a list of as many numbers as you wish, then divide them all by the sum. They are totally random this way.

import random as ran

r = [ran.random() for i in range(100)]
s = sum(r)
r = [i/s for i in r]

or, as suggested by @TomKealy, keep the sum and creation in one loop:

rs = []
s = 0
for i in range(100):
    r = ran.random()
    s += r
    rs.append(r)
rs = [r/s for r in rs]  # one final pass to normalize by the accumulated sum

For the fastest performance, use numpy:

import numpy as np
a = np.random.random(100)
a /= a.sum()

And you can give the random numbers any distribution you want; note, though, that for a valid probability vector the entries must be non-negative, which a normal sample does not guarantee:

a = np.random.normal(size=100)
a /= a.sum()
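One hedge, sketched here, is to fold the negatives over before normalizing, so a normal-like spread still yields a valid probability vector:

import numpy as np

a = np.abs(np.random.normal(size=100))  # fold negatives over so all entries are non-negative
a /= a.sum()
print(a.min(), a.sum())  # all entries >= 0, total 1.0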

---- Timing ----

In [52]: %%timeit
    ...: r = [ran.random() for i in range(1,100)]
    ...: s = sum(r)
    ...: r = [ i/s for i in r ]
   ....: 
1000 loops, best of 3: 231 µs per loop

In [53]: %%timeit
   ....: rs = []
   ....: s = 0
   ....: for i in range(100):
   ....:     r = ran.random()
   ....:     s += r
   ....:     rs.append(r)
   ....: 
10000 loops, best of 3: 39.9 µs per loop

In [54]: %%timeit
   ....: a = np.random.random(100)
   ....: a /= a.sum()
   ....: 
10000 loops, best of 3: 21.8 µs per loop
askewchan
  • 45,161
  • 17
  • 118
  • 134
  • 2
    @Tom No worries, it's easy to get stuck trying to make these things a lot harder than they are :) Now it's here for the next person. – askewchan Sep 06 '13 at 14:22
  • 3
    I think it's time for beer. – Tom Kealy Sep 06 '13 at 14:23
  • 1
    This is a good solution, but it seems like there should be a way to do this in a single pass that gets a good distribution across the range. Create, sum, modify is a 3 pass operation. You could at least optimize away one pass by summing as you generate though. – Silas Ray Sep 06 '13 at 14:25
  • @Silas True, you could do one loop and keep a running sum. I think the numpy solution would be faster though, and if one is doing probability distribution work, it's likely that the dependency would be useful elsewhere. – askewchan Sep 06 '13 at 14:26
  • @Silas I'm no expert, but probably not. Mainly it's faster since the array operations are faster than python loops. – askewchan Sep 06 '13 at 14:28
  • 2
    Scaling is not necessarily good. See my answer for more. There are many possible mappings from [0,1)^n onto the target space (sum of x_i = 1) and they can't all be uniform! – Mike Housky Sep 06 '13 at 14:45
  • @Mike, agreed. The outcome of my answer has a uniform distribution on `[0,1/N)`. But at first glance, I don't believe this is an issue for more natural distributions without a hard upper limit, for example the normal distribution, which can be renormalized without hard boundaries. – askewchan Sep 06 '13 at 14:49
  • 1
    **This is wrong**, at least if you care about actual uniform distributions https://stackoverflow.com/a/8068956/2075003 – n1000 Jul 20 '18 at 16:57
  • Thank you for that link, @n1000! That answer is the first explanation that clarified this issue to me! – askewchan Jul 23 '18 at 19:21
8

Dividing each number by the total may not give you the distribution you want. For example, with two numbers, the pair x,y = random.random(), random.random() picks a point uniformly on the square 0<=x<1, 0<=y<1. Dividing by the sum "projects" that point (x,y) onto the line x+y=1 along the line from (x,y) to the origin. Points near (0.5,0.5) will be much more likely than points near (0.1,0.9).

For two variables, then, x = random.random(), y=1-x gives a uniform distribution along the geometrical line segment.

With 3 variables, you are picking a random point in a cube and projecting (radially, through the origin), but points near the center of the triangle will be more likely than points near the vertices. The resulting points lie on the triangle cut from the plane x+y+z=1. If you need an unbiased choice of points in that triangle, scaling is no good.

The problem gets complicated in n-dimensions, but you can get a low-precision (but high accuracy, for all you laboratory science fans!) estimate by picking uniformly from the set of all n-tuples of non-negative integers adding up to N, and then dividing each of them by N.

I recently came up with an algorithm to do that for modest-sized n, N. It should work for n=100 and N = 1,000,000 to give you 6-digit randoms. See my answer at:

Create constrained random numbers?
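One standard way to sample such integer tuples uniformly is the stars-and-bars construction; a sketch (not necessarily the exact algorithm from the linked answer):

import random

def random_composition(n, N):
    # choose n-1 "bar" positions among N+n-1 slots; the gaps between
    # consecutive bars, minus 1, give n non-negative integers summing to N
    bars = [0] + sorted(random.sample(range(1, N + n), n - 1)) + [N + n]
    return [bars[i] - bars[i - 1] - 1 for i in range(1, n + 1)]

parts = random_composition(100, 10**6)
print(sum(parts))                      # 1000000
print(sum(p / 10**6 for p in parts))   # ~1.0 after dividing by N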

Mike Housky
  • 3,959
  • 1
  • 17
  • 31
  • You should check out the [Dirichlet distribution](https://en.wikipedia.org/wiki/Dirichlet_distribution#Probability_density_function). – Jonathan H Sep 12 '18 at 10:06
6

Create a list consisting of 0 and 1, then add 99 random numbers. Sort the list. Successive differences will be the lengths of intervals that add up to 1.

I'm not fluent in Python, so forgive me if there's a more Pythonic way of doing this. I hope the intent is clear though:

import random

values = [0.0, 1.0]
for i in range(99):
    values.append(random.random())
values.sort()
results = []
for i in range(1,101):
    results.append(values[i] - values[i-1])
print results

Here's an updated implementation in Python 3:

import random

def sum_to_one(n):
    values = [0.0, 1.0] + [random.random() for _ in range(n - 1)]
    values.sort()
    return [values[i+1] - values[i] for i in range(n)]

print(sum_to_one(100))
pjs
  • 18,696
  • 4
  • 27
  • 56
5

In addition to @pjs's solution, we can define a function with two parameters: the number of values and the target sum.

import numpy as np

def sum_to_x(n, x):
    values = [0.0, x] + list(np.random.uniform(low=0.0,high=x,size=n-1))
    values.sort()
    return [values[i+1] - values[i] for i in range(n)]

sum_to_x(10, 0.6)
Out: 
[0.079058655684546,
 0.04168649034779022,
 0.09897491411670578,
 0.065152293196646,
 0.000544800901222664,
 0.12329662037166766,
 0.09562168167787738,
 0.01641359261155284,
 0.058273232428072474,
 0.020977718663918954]  
Caner Erden
  • 556
  • 7
  • 7
2

In case you want a minimum threshold for the randomly chosen numbers (i.e., each generated number should be at least min_thresh):

import numpy as np

rand_prop = 1 - num_of_values * min_thresh
random_numbers = (np.random.dirichlet(np.ones(num_of_values), size=1)[0] * rand_prop) + min_thresh

Just make sure that num_of_values (the number of values to be generated) is such that generating the required numbers is possible (num_of_values <= 1/min_thresh).

So basically, we reserve some portion of 1 for the minimum threshold, then we create random numbers in the remaining portion. We add min_thresh to all the numbers to get a sum of 1. For example, let's say you want to generate 3 numbers with min_thresh=0.2. We create a portion to fill with random numbers [1 - (0.2*3) = 0.4]. We fill that portion and add 0.2 to each value, so the reserved 0.6 gets filled too.
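A quick check of that worked example (output varies run to run):

import numpy as np

num_of_values = 3
min_thresh = 0.2
rand_prop = 1 - num_of_values * min_thresh  # 0.4 left to distribute randomly
nums = np.random.dirichlet(np.ones(num_of_values), size=1)[0] * rand_prop + min_thresh
print(nums, nums.sum())  # each entry >= 0.2, total 1.0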

This is the standard scaling and shifting used in random number generation theory. Credit goes to my friend Jeel Vaishnav (I am not sure if he has an SO profile) and @sega_sai.

Parthesh Soni
  • 133
  • 1
  • 6
2

Inspired by @sega_sai's answer, with an up-to-date and recommended numpy implementation [March 2022]:

import numpy as np
from numpy.random import default_rng

rng = default_rng()
rng.dirichlet(np.ones(10), size=1)
>>> array([[0.01279836, 0.16891858, 0.01136867, 0.17577222, 0.27944229,
        0.06244618, 0.19878224, 0.02481954, 0.01478089, 0.05087103]])


Antiez
  • 679
  • 7
  • 11
1

Generate 100 random numbers (it doesn't matter in what range), sum the numbers generated, then divide each individual number by the total.
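A minimal sketch of that procedure (the range here is arbitrary):

import random

nums = [random.randint(1, 1000) for _ in range(100)]  # any range works
total = sum(nums)
weights = [x / total for x in nums]
print(sum(weights))  # 1.0, up to float rounding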

guessing
  • 11
  • 1
1

An alternative solution would be using random.choice and divide by sum:

import random

n = 5
rand_num = [random.choice(range(0, 100)) for _ in range(n)]  # create random integers
total = sum(rand_num)
rand_num = [i / total for i in rand_num]  # normalize them
Sam S.
  • 627
  • 1
  • 7
  • 23
  • Unfortunately this makes the resulting random numbers linearly dependent: if you know $n-1$ of them then you know the $n$th as well. So the sequence isn't truly random. It's why I accepted the Dirichlet distribution answer. – Tom Kealy Oct 15 '21 at 07:57
  • Thanks Tom Kealy for your comments on my answer. I will definitely check its linear dependencies, but I also believe it should be an alternative solution for some applications; in particular, it might be useful when you want to have control over your numbers by simply adjusting the range(0,100). I agree with your point on Dirichlet random numbers, and it is definitely one of the best solutions. – Sam S. Oct 17 '21 at 09:43
  • 1
    You can control the resulting distribution by adjusting the $\alpha$ parameter of the Dirichlet distribution. I really can't emphasise enough that it is the preferred solution - that's why I made it the accepted answer. – Tom Kealy Oct 18 '21 at 12:41
0

You could easily do it with:

r.append(1 - sum(r))
Paul Evans
  • 27,315
  • 3
  • 37
  • 54
0

In the spirit of "divide each element in the list by the sum of the list", this function will create a list of random numbers of length = PARTS, sum = TOTAL, with each element rounded to PLACES (or None):

import random
import time

log = print  # stand-in: the original code ran in a framework that provided log()

PARTS       = 5
TOTAL       = 10
PLACES      = 3

def random_sum_split(parts, total, places):

    a = []
    for n in range(parts):
        a.append(random.random())
    b = sum(a)
    c = [x/b for x in a]    
    d = sum(c)
    e = c
    if places != None:
        e = [round(x*total, places) for x in c]
    f = e[-(parts-1):]
    g = total - sum(f)
    if places != None:
        g = round(g, places)
    f.insert(0, g)

    log(a)
    log(b)
    log(c)
    log(d)
    log(e)
    log(f)
    log(g)

    return f   

def tick():  # entry point called by the author's framework, which provides `info`

    if info.tick == 1:

        start = time.time()

        alpha = random_sum_split(PARTS, TOTAL, PLACES)

        log('********************')
        log('***** RESULTS ******')
        log('alpha: %s' % alpha)
        log('total: %.7f' % sum(alpha))
        log('parts: %s' % PARTS)
        log('places: %s' % PLACES)

        end = time.time()  

        log('elapsed: %.7f' % (end-start))

result:

[2014-06-13 00:01:00] [0.33561018369775897, 0.4904215932650632, 0.20264927800402832, 0.118862130636748, 0.03107818050878819]
[2014-06-13 00:01:00] 1.17862136611
[2014-06-13 00:01:00] [0.28474809073311597, 0.41609766067850096, 0.17193755673414868, 0.10084844382959707, 0.02636824802463724]
[2014-06-13 00:01:00] 1.0
[2014-06-13 00:01:00] [2.847, 4.161, 1.719, 1.008, 0.264]
[2014-06-13 00:01:00] [2.848, 4.161, 1.719, 1.008, 0.264]
[2014-06-13 00:01:00] 2.848
[2014-06-13 00:01:00] ********************
[2014-06-13 00:01:00] ***** RESULTS ******
[2014-06-13 00:01:00] alpha: [2.848, 4.161, 1.719, 1.008, 0.264]
[2014-06-13 00:01:00] total: 10.0000000
[2014-06-13 00:01:00] parts: 5
[2014-06-13 00:01:00] places: 3
[2014-06-13 00:01:00] elapsed: 0.0054131
litepresence
  • 3,109
  • 1
  • 27
  • 35
0

In the spirit of pjs's method:

import random

parts, total = 5, 10  # example values; the original fragment assumes these exist
a = [0, total] + [random.random()*total for i in range(parts-1)]
a.sort()
b = [(a[i] - a[i-1]) for i in range(1, parts+1)]

If you want them rounded to decimal places (this fragment assumes it runs inside a function that receives places):

if places is None:
    return b
else:
    b.pop()
    c = [round(x, places) for x in b]
    c.append(round(total - sum(c), places))
    return c
litepresence
  • 3,109
  • 1
  • 27
  • 35