4

I'm looking for an efficient Python function that randomly allocates an integer across k bins. That is, some function allocate(n, k) will produce a k-sized array of integers summing to n.

For example, allocate(4, 3) could produce [4, 0, 0], [0, 2, 2], [1, 2, 1], etc.

It should be randomly distributed per item, assigning each of the n items randomly to each of the k bins.

Max Ghenis
  • 14,783
  • 16
  • 84
  • 132

5 Answers5

2

This should be faster than your brute-force version when n >> k:

def allocate(n, k):
    result = np.zeros(k)
    sum_so_far = 0
    for ind in range(k-1):
        draw = np.random.randint(n - sum_so_far + 1)
        sum_so_far += draw
        result[ind] = draw
    result[k-1] = n - sum_so_far

    return result

The idea is to draw a random number up to some maximum m (which starts out equal to n), and then we subtract that number from the maximum for the next draw, and so on, thus guaranteeing that we will never exceed n. This way we fill up the first k-1 entries; the final one is filled with whatever is missing to get a sum of exactly n.

Note: I am not sure whether this results in a "fair" random distribution of values or if it is somehow biased towards putting larger values into earlier indices or something like that.

xdurch0
  • 9,905
  • 4
  • 32
  • 38
2

If you are looking for a uniform distribution across all possible allocations (which is different from randomly distributing each item individually):

Using the "stars and bars" approach, we can transform this into a question of picking k-1 positions for possible dividers from a list of n+k-1 possible positions. (Wikipedia proof)

from random import sample

def allocate(n,k):
    dividers = sample(range(1, n+k), k-1)
    dividers = sorted(dividers)
    dividers.insert(0, 0)
    dividers.append(n+k)
    return [dividers[i+1]-dividers[i]-1 for i in range(k)]
    
print(allocate(4,3))

There are ((n+k-1) choose (k-1)) possible distributions, and this is equally likely to result in each one of them.

(This is a modification of Wave Man's solution: that one is not uniform across all possible solutions: note that the only way to get [0,0,4] is to roll (0,0), but there are two ways to get [1,2,1]; rolling (1,3) or (3,1). Choosing from n+k-1 slots and counting dividers as taking a slot corrects for this. In this solution, the random sample (1,2) corresponds to [0,0,4], and the equally likely random sample (2,5) corresponds to [1,2,1])

  • Thanks, I added a new question for this variant, hope you'll submit this there! https://stackoverflow.com/q/71890816/1840471 – Max Ghenis Apr 16 '22 at 03:18
1

Here's a brute-force approach:

import numpy as np

def allocate(n, k):
    res = np.zeros(k)
    for i in range(n):
        res[np.random.randint(k)] += 1
    return res

Example:

for i in range(3):
    print(allocate(4, 3))

[0. 3. 1.]
[2. 1. 1.]
[2. 0. 2.]

Max Ghenis
  • 14,783
  • 16
  • 84
  • 132
1

Adapting Michael Szczesny's comment based on numpy's new paradigm:

def allocate(n, k):
    return np.random.default_rng().multinomial(n, [1 / k] * k)

This notebook verifies that it returns the same distribution as my brute-force approach.

Max Ghenis
  • 14,783
  • 16
  • 84
  • 132
0

Here's my solution. I think it will make all possible allocations equally likely, but I don't have a proof of that.

from random import randint

def allocate(n,k):
    dividers = [randint(0,n) for i in range(k+1)]
    dividers[0] = 0
    dividers[k] = n
    dividers = sorted(dividers)    
    return [dividers[i+1]-dividers[i] for i in range(k)]
    
print(allocate(10000,100))
Wave Man
  • 33
  • 3