Allocate an integer randomly across k bins

Question

I'm looking for an efficient Python function that randomly allocates an integer across k bins. That is, some function allocate(n, k) will produce a k-sized array of integers summing to n.

For example, allocate(4, 3) could produce [4, 0, 0], [0, 2, 2], [1, 2, 1], etc.

It should be randomly distributed per item, assigning each of the n items randomly to each of the k bins.

@MichaelSzczesny thanks, this is the best solution! I'll accept it if you want to enter it as an answer. — Max Ghenis, Apr 15 '22 at 22:13
@MichaelSzczesny I entered this as a solution here: https://stackoverflow.com/a/71891419/1840471 — Max Ghenis, Apr 16 '22 at 05:43

score 2 · Answer 1 · answered Apr 15 '22 at 20:37

This should be faster than your brute-force version when n >> k:

def allocate(n, k):
    result = np.zeros(k)
    sum_so_far = 0
    for ind in range(k-1):
        draw = np.random.randint(n - sum_so_far + 1)
        sum_so_far += draw
        result[ind] = draw
    result[k-1] = n - sum_so_far

    return result

The idea is to draw a random number up to some maximum m (which starts out equal to n), and then we subtract that number from the maximum for the next draw, and so on, thus guaranteeing that we will never exceed n. This way we fill up the first k-1 entries; the final one is filled with whatever is missing to get a sum of exactly n.

Note: I am not sure whether this results in a "fair" random distribution of values or if it is somehow biased towards putting larger values into earlier indices or something like that.

Thanks, interesting approach. I'd like to see its fairness before using though. — Max Ghenis, Apr 15 '22 at 21:27
`Counter(tuple(allocate(4, 3)) for _ in range(1000))` to see that `(4,0,0)` or `(3,1,0)` occurs far too often. — Michael Szczesny, Apr 15 '22 at 21:53

score 2 · Answer 2 · answered Apr 15 '22 at 22:50

If you are looking for a uniform distribution across all possible allocations (which is different from randomly distributing each item individually):

Using the "stars and bars" approach, we can transform this into a question of picking k-1 positions for possible dividers from a list of n+k-1 possible positions. (Wikipedia proof)

from random import sample

def allocate(n,k):
    dividers = sample(range(1, n+k), k-1)
    dividers = sorted(dividers)
    dividers.insert(0, 0)
    dividers.append(n+k)
    return [dividers[i+1]-dividers[i]-1 for i in range(k)]
    
print(allocate(4,3))

There are ((n+k-1) choose (k-1)) possible distributions, and this is equally likely to result in each one of them.

(This is a modification of Wave Man's solution: that one is not uniform across all possible solutions: note that the only way to get [0,0,4] is to roll (0,0), but there are two ways to get [1,2,1]; rolling (1,3) or (3,1). Choosing from n+k-1 slots and counting dividers as taking a slot corrects for this. In this solution, the random sample (1,2) corresponds to [0,0,4], and the equally likely random sample (2,5) corresponds to [1,2,1])

Thanks, I added a new question for this variant, hope you'll submit this there! https://stackoverflow.com/q/71890816/1840471 — Max Ghenis, Apr 16 '22 at 03:18

score 1 · Answer 3 · answered Apr 15 '22 at 20:24

Here's a brute-force approach:

import numpy as np

def allocate(n, k):
    res = np.zeros(k)
    for i in range(n):
        res[np.random.randint(k)] += 1
    return res

Example:

for i in range(3):
    print(allocate(4, 3))

[0. 3. 1.]
[2. 1. 1.]
[2. 0. 2.]

score 1 · Accepted Answer · answered Apr 16 '22 at 05:42

1

Adapting Michael Szczesny's comment based on numpy's new paradigm:

def allocate(n, k):
    return np.random.default_rng().multinomial(n, [1 / k] * k)

This notebook verifies that it returns the same distribution as my brute-force approach.

answered Apr 16 '22 at 05:42

Max Ghenis

14,783
16
84
132

score 0 · Answer 5 · answered Apr 15 '22 at 22:01

Here's my solution. I think it will make all possible allocations equally likely, but I don't have a proof of that.

from random import randint

def allocate(n,k):
    dividers = [randint(0,n) for i in range(k+1)]
    dividers[0] = 0
    dividers[k] = n
    dividers = sorted(dividers)    
    return [dividers[i+1]-dividers[i] for i in range(k)]
    
print(allocate(10000,100))

Allocate an integer randomly across k bins

5 Answers5

Linked