I want to generate a list of random distribution of numbers so their sum would be equal to a randomly chosen number. For example, if randomly chosen number is 5, the distribution would be [1 2 2] or [2 3] or [1 1 1 2] and so on. Any suggestions are welcome!
-
1Sounds like the subset sum problem https://en.wikipedia.org/wiki/Subset_sum_problem if you want all, if you want one then it is pretty trivial – Padraic Cunningham Aug 04 '15 at 14:00
4 Answers
Let n
be the number you want values to add up to. Generate a random sample
of random size (less than n
), consisting of values in the range 1 to n
exclusive of n
. Now add the endpoints 0 and n
, and sort. Successive differences of the sorted values will sum to n
.
import random as r
def random_sum_to(n):
a = r.sample(range(1, n), r.randint(1, n-1)) + [0, n]
list.sort(a)
return [a[i+1] - a[i] for i in range(len(a) - 1)]
print(random_sum_to(20)) # yields, e.g., [4, 1, 1, 2, 4, 2, 2, 4]
If you'd like to be able to specify the number of terms in the sum explicitly, or have it be random if unspecified, add an optional argument:
import random as r
def random_sum_to(n, num_terms = None):
num_terms = (num_terms or r.randint(2, n)) - 1
a = r.sample(range(1, n), num_terms) + [0, n]
list.sort(a)
return [a[i+1] - a[i] for i in range(len(a) - 1)]
print(random_sum_to(20, 3)) # [9, 7, 4] for example
print(random_sum_to(5)) # [1, 1, 2, 1] for example

- 11,217
- 6
- 43
- 49

- 18,696
- 4
- 27
- 56
-
This produces a more even distribution of list lengths than CoryKramer's answer does. – krupan Sep 14 '17 at 16:44
In a loop, you could keep drawing a random number between 1 and the remaining sum until you've reached your total
from random import randint
def generate_values(n):
values = []
while n > 0:
value = randint(1, n)
values.append(value)
n -= value
return values
A few samples of such a function
>>> generate_values(20)
[17, 1, 1, 1]
>>> generate_values(20)
[10, 4, 4, 1, 1]
>>> generate_values(20)
[14, 4, 1, 1]
>>> generate_values(20)
[5, 2, 4, 1, 5, 1, 1, 1]
>>> generate_values(20)
[2, 13, 5]
>>> generate_values(20)
[14, 3, 2, 1]

- 114,268
- 16
- 167
- 218
-
1since the sampling is done on different uniform distribution each time, the total distribution of sample is not uniform anymore right ? – toine Aug 04 '15 at 14:05
-
Correct, this would be biased towards smaller numbers as they would be included in more of the distributions. The larger numbers (closest to the original value) would be excluded soon after the first few numbers. This was just something I came up with off the top of my head. – Cory Kramer Aug 04 '15 at 15:31
Consider doing it continuously first. And for a moment we do not care about final number, so let's sample uniformly X_i in the interval [0...1] so that their sum is equal to 1
X_1 + X_2 + ... X_n = 1
This is well-known distribution called Dirichlet Distribution, or gamma variate, or simplex sampling. See details and discussion at Generating N uniform random numbers that sum to M. One can use random.gammavariate(a,1)
or for correct handling of corners gamma variate with parameter 1 is equivalent exponential distribution, with direct sampling code below
def simplex_sampling(n):
r = []
sum = 0.0
for k in range(0,n):
x = random.random()
if x == 0.0:
return (1.0, make_corner_sample(n, k))
t = -math.log(x)
r.append(t)
sum += t
return (sum, r)
def make_corner_sample(n, k):
r = []
for i in range(0, n):
if i == k:
r.append(1.0)
else:
r.append(0.0)
return r
So from simplex_sampling
you have vector and the sum to be used as normalization.
Thus, to use it for, say, N=5
N = 5
sum, r = simplex_sampling(N)
norm = float(N)/sum
# normalization together with matching back to integers
result = []
for k in range(N):
# t is now float uniformly distributed in [0.0...N], with sum equal to N
t = r[k] * norm
# not sure if you could have zeros,
# and check for boundaries might be useful, but
# conversion to integers is trivial anyway:
# values in [0...1) shall be converted to 0,
# values in [1...2) shall be converted to 1, etc
result.append( int(t) )

- 1
- 1

- 18,636
- 3
- 38
- 64
here's a simple way to have, a random list of probabilities, where there sum is equal to one;
a = np.random.random(2)
a /=sum(a)
a

- 1,329
- 2
- 18
- 33