
I'm looking for an efficient Python function that randomly allocates an integer across k bins. That is, some function allocate(n, k) will produce a k-sized array of non-negative integers summing to n.

For example, allocate(3, 2) would produce [3, 0], [2, 1], [1, 2], or [0, 3] with equal probability (unlike in "Allocate an integer randomly across k bins", the allocations, not the items, should be uniformly distributed).

Max Ghenis
  • @PeterO. Thanks for the pointer, but no, I'm looking for integers, and Python code. – Max Ghenis Apr 16 '22 at 03:31
  • Nice answer. I still think this question is different since it specifically asks about integers and Python. A Python version of your answer would of course be welcome. – Max Ghenis Apr 16 '22 at 03:46
  • Such fancy math in that link. If it were me, I'd do a dumber approach: I'd generate a list of all possible answers that match the conditions, then randomly select a sequence of indices into that list. Done. (It's easy to see that the first bin gets n+1 possible values. All possible answers can be generated recursively, if more than 2 bins. I'll leave it to someone else to write the actual code.) – ToolmakerSteve Apr 16 '22 at 03:48

2 Answers


Using the "stars and bars" approach, we can transform this into a question of picking k-1 positions for possible dividers from a list of n+k-1 possible positions. (Wikipedia proof)

from random import sample

def allocate(n, k):
    # pick k-1 divider positions (without replacement) out of the n+k-1 slots
    dividers = sorted(sample(range(1, n + k), k - 1))
    dividers.insert(0, 0)
    dividers.append(n + k)
    # the gap between consecutive dividers, minus the divider itself, is a bin's size
    return [dividers[i + 1] - dividers[i] - 1 for i in range(k)]

print(allocate(4, 3))

There are ((n+k-1) choose (k-1)) possible allocations that fit your criteria, each corresponding to a specific way of choosing k-1 places to put the dividers among n+k-1 spots for objects, and the sampling above is equally likely to produce each one of them.
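
As a quick sanity check of that count (a minimal sketch, assuming the allocate defined above is in scope), the number of distinct outcomes seen over many draws should match math.comb(n+k-1, k-1):

from math import comb

n, k = 4, 3
outcomes = {tuple(allocate(n, k)) for _ in range(20000)}
print(len(outcomes), comb(n + k - 1, k - 1))  # both should be 15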

(Note the subtle difference from the existing answer to a similar question proposed in the comments: this question asks for an ordered series of non-negative integers, while that answer gives an ordered series of positive integers. The naive modification of selecting the spots with replacement instead of without replacement does allow the full set of non-negative integer distributions, but it does not leave each distribution equally likely. Consider allocate(4,3): the only way to get [0, 0, 4] is to roll (0, 0), but you can get [1, 2, 1] by rolling (1, 3) or (3, 1).)
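
To see that bias concretely, here is a small simulation sketch of the with-replacement variant (the helper name allocate_with_replacement and the choice to draw cut points from the n+1 gaps are illustrative assumptions, not part of the answer above): [0, 0, 4] should show up roughly half as often as [1, 2, 1].

from random import choices
from collections import Counter

def allocate_with_replacement(n, k):
    # naive variant: k-1 cut points drawn WITH replacement from the n+1 gaps
    cuts = [0] + sorted(choices(range(n + 1), k=k - 1)) + [n]
    return [cuts[i + 1] - cuts[i] for i in range(k)]

counts = Counter(tuple(allocate_with_replacement(4, 3)) for _ in range(30000))
print(counts[(0, 0, 4)], counts[(1, 2, 1)])  # roughly 1200 vs 2400, clearly not uniform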

  • FWIW, the (performance) downside of an approach like this is that the full cost (of computing an allocation) happens on *every* call. There's a memory-vs-performance alternative, where you generate a list of all the possible outcomes *once*, then each allocate is as cheap as possible: it's simply a random generation of an index into the list. – ToolmakerSteve Apr 16 '22 at 03:58
  • I verified that this gives a uniform distribution of allocations: https://colab.research.google.com/drive/1kjL26nsK4n3XZitRw-0lcnlgxptCcLEx – Max Ghenis Apr 16 '22 at 04:04
  • @ToolmakerSteve The preferred solution depends on how many possible solutions there are. For allocate(4,3), if you need to do this over and over, then generating a list of outcomes is going to be most efficient. (n+k-1) choose (k-1) grows fast, though. For an extreme example, for something like allocate(1000000, 100), you've got about 10^438 possible allocations. – Richard Yannow Apr 16 '22 at 04:10
  • yep. It's a tradeoff. Note that it doesn't have to be a binary choice between "do all the work each time" vs "cache all the outcomes". For extreme cases, one can hold partial solutions to lower the time cost, yet stay within acceptable memory usage. Though that is more complex to code. Given that OP asked for an *efficient* solution, I wanted readers to consider the possibility (see the sketch after these comments). Regardless, the algorithm you show looks like a good place to start from. – ToolmakerSteve Apr 16 '22 at 06:19
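
A sketch of the precompute-everything idea discussed in these comments (the helper name all_allocations is mine, and this only pays off when C(n+k-1, k-1) is small enough to hold in memory): enumerate every divider placement once, then each call is just a random index into the table.

from itertools import combinations
from random import randrange

def all_allocations(n, k):
    # enumerate every placement of k-1 dividers among n+k-1 slots (stars and bars)
    out = []
    for dividers in combinations(range(1, n + k), k - 1):
        d = (0,) + dividers + (n + k,)
        out.append([d[i + 1] - d[i] - 1 for i in range(k)])
    return out

table = all_allocations(4, 3)          # built once: 15 allocations
draw = table[randrange(len(table))]    # each subsequent draw is O(1)
print(len(table), draw)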

I couldn't find a fast numpy solution without a massive (n x size x int32) space overhead. So here is a numba implementation of @RichardYannow's solution. Please remember to exclude the first call (which triggers JIT compilation) from benchmarks.

The sampling algorithm is a simplified adaptation of NumPy's Generator.choice; size=1 samples should be passed through np.squeeze to drop the unused leading dimension.

import numpy as np
import numba as nb # tested with numba 0.55.1

@nb.njit
def allocate(n, k, size):
    idx = np.zeros((size, k+1), np.int32)
    s = np.empty(n+k, np.uint8)  # "already drawn" flags for divider positions
    for j in range(size):
        s.fill(0)
        # Floyd's algorithm: draw k-1 distinct dividers from 1..n+k-1
        for i in range(n+1, n + k):
            x = np.random.randint(i) + 1
            if s[x]: x = i
            s[x] = 1
            idx[j, i - n] = x

        idx[j, 1:k].sort()
        idx[j, k] = n+k  # sentinel divider after the last slot; idx[j, 0] stays 0
        # gaps between consecutive dividers (minus the divider itself) are the bin sizes
        for i in range(k):
            idx[j, i] = idx[j, i+1] - idx[j, i] - 1
    return idx[:, :k]

allocate(4, 3, size=5)

Output

array([[1, 1, 2],
       [0, 3, 1],
       [1, 1, 2],
       [1, 1, 2],
       [1, 2, 1]], dtype=int32)
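
And, as noted above, a single draw from this function keeps a leading size-1 axis that can be dropped with np.squeeze:

np.squeeze(allocate(4, 3, size=1))  # 1-D array of length k summing to n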

Checking uniform distribution across all possible allocations

from collections import Counter
Counter(tuple(x) for x in allocate(4, 3, size=10000))

Output

Counter({(0, 0, 4): 685,
         (0, 1, 3): 680,
         (0, 2, 2): 646,
         (0, 3, 1): 666,
         (0, 4, 0): 662,
         (1, 0, 3): 660,
         (1, 1, 2): 670,
         (1, 2, 1): 661,
         (1, 3, 0): 647,
         (2, 0, 2): 650,
         (2, 1, 1): 699,
         (2, 2, 0): 682,
         (3, 0, 1): 667,
         (3, 1, 0): 669,
         (4, 0, 0): 656})
Michael Szczesny