
Apologies if the answer to this is readily found elsewhere. My math and stats are weak and thus I don't even know the search terms for what I'm trying to do . . .

I have b anonymous indistinguishable buckets into which I'm putting i identical items. I want to know all possible distributions and their probabilities. For example, if I have 3 buckets and 3 items, the answer I want is:

  • [3,0,0] -> 1/9
  • [2,1,0] -> 6/9
  • [1,1,1] -> 2/9

Notice that the buckets are anonymous and thus I want identical distributions to be combined as I have done above. For example, the [2,1,0] case is actually the sum of the [2,1,0], [0,2,1], etc. cases.
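Since the numbers involved are small, this example can be checked by brute-force enumeration of every insertion sequence; a minimal sketch in Python (the function name `exact_distribution` is just for illustration):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def exact_distribution(items, buckets):
    """Enumerate every way of dropping each item into a bucket, and
    group the outcomes by their sorted (anonymous-bucket) shape."""
    counts = Counter()
    for choice in product(range(buckets), repeat=items):
        shape = [0] * buckets
        for b in choice:
            shape[b] += 1
        counts[tuple(sorted(shape, reverse=True))] += 1
    total = buckets ** items
    return {shape: Fraction(n, total) for shape, n in counts.items()}

print(exact_distribution(3, 3))
```

This confirms the 1/9, 6/9, 2/9 split above, but only scales to small inputs since it visits all buckets^items sequences.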

In addition, I have the constraint of a max bucket size. For example, 3 balls, 3 buckets, bucketsize=2 should return:

  • [2,1,0] prob=7/9
  • [1,1,1] prob=2/9

This can be seen looking at a probability tree:

Insert item 1 into [0,0,0] -> [1,0,0] p=1  
Insert item 2 into [1,0,0] -> [2,0,0] p=1/3 OR [1,1,0] 2/3  
Insert item 3 into [2,0,0] -> [2,1,0] p=1.0   
Insert item 3 into [1,1,0] -> [2,1,0] p=2/3 OR [1,1,1] p=1/3  

So state [2,1,0] has two paths to it: 1/3×1 + 2/3×2/3 = 7/9  
So state [1,1,1] has one path  to it: 2/3 * 1/3 = 2/9
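This tree can also be computed exactly rather than by hand; a sketch, assuming each item goes into a bucket chosen uniformly among those that still have room (which is equivalent to the retry loop in the simulator below):

```python
from collections import defaultdict
from fractions import Fraction

def capped_distribution(items, buckets, cap):
    """Walk the probability tree level by level; each item lands in a
    uniformly chosen bucket that still has fewer than `cap` items."""
    states = {(0,) * buckets: Fraction(1)}
    for _ in range(items):
        nxt = defaultdict(Fraction)
        for state, prob in states.items():
            open_slots = [i for i, n in enumerate(state) if n < cap]
            for i in open_slots:
                child = list(state)
                child[i] += 1
                child.sort(reverse=True)  # anonymous buckets: merge shapes
                nxt[tuple(child)] += prob / len(open_slots)
        states = dict(nxt)
    return states

print(capped_distribution(3, 3, 2))
```

For 3 items, 3 buckets, cap 2 this reproduces the 7/9 and 2/9 above exactly.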

Here's another view of the probabilities for each: http://bents.us/Pictures/bins_balls.png

Thanks very much for reading my question and for any help you can provide. I have not cross-posted this on https://stats.stackexchange.com/ but if people think it is better there, then I'll delete this one and repost it there.

UPDATE

There has been some discussion in the comments about the correctness of some of the proposed algorithms. To help validate them, I wrote the following simulator:

#! /usr/bin/env python

from __future__ import division
import random

def simulate(num_bucks,items,bsize,iterations=50000):
  perms = dict()
  for n in range(iterations):
    buckets = [0] * num_bucks 
    for i in range(items):
      while True:
        b = random.randint(0,num_bucks-1)
        if buckets[b] < bsize:
          break  # kludge, loop until we find an unfilled bucket 
      buckets[b] +=1
    buckets.sort()
    buckets = tuple(reversed(buckets))
    try:
      perms[buckets]['count'] += 1
    except KeyError:
      perms[buckets] = {'perm' : buckets, 'count' : 1}

  for perm in perms.values(): 
    perm['prob'] = perm['count'] / iterations
  return perms


def main():
  perms =    simulate(num_bucks=3,items=3,bsize=2)
  for perm in perms.values():
    print perm['perm'], perm['prob']

if __name__ == "__main__":
    main()

Its output for 3 buckets, 3 balls, and a bucket size of 2 looks like:

(1, 1, 1) 0.22394
(2, 1, 0) 0.77606
John Bent

2 Answers


Overview

When distributing e.g. 3 items over 4 buckets, you can put the first item into any of the 4 buckets, then for each of these 4 possibilities you can put the second item into any of the 4 buckets, and then for each of these 16 possibilities you can put the third item into any of the 4 buckets, to make a total of 64 possibilities:

(3,0,0,0) (2,1,0,0) (2,0,1,0) (2,0,0,1)
(2,1,0,0) (1,2,0,0) (1,1,1,0) (1,1,0,1)
(2,0,1,0) (1,1,1,0) (1,0,2,0) (1,0,1,1)
(2,0,0,1) (1,1,0,1) (1,0,1,1) (1,0,0,2)
(2,1,0,0) (1,2,0,0) (1,1,1,0) (1,1,0,1)
(1,2,0,0) (0,3,0,0) (0,2,1,0) (0,2,0,1)
(1,1,1,0) (0,2,1,0) (0,1,2,0) (0,1,1,1)
(1,1,0,1) (0,2,0,1) (0,1,1,1) (0,1,0,2)
(2,0,1,0) (1,1,1,0) (1,0,2,0) (1,0,1,1)
(1,1,1,0) (0,2,1,0) (0,1,2,0) (0,1,1,1)
(1,0,2,0) (0,1,2,0) (0,0,3,0) (0,0,2,1)
(1,0,1,1) (0,1,1,1) (0,0,2,1) (0,0,1,2)
(2,0,0,1) (1,1,0,1) (1,0,1,1) (1,0,0,2)
(1,1,0,1) (0,2,0,1) (0,1,1,1) (0,1,0,2)
(1,0,1,1) (0,1,1,1) (0,0,2,1) (0,0,1,2)
(1,0,0,2) (0,1,0,2) (0,0,1,2) (0,0,0,3)

You will notice that there are many duplicates in this list, e.g. (0,1,0,2) appears 3 times. That is because the 3 items are anonymous and identical, so there are several ways in which you can distribute the 3 items over the 4 buckets and arrive at the same result, e.g.:

(0,0,0,0) → (0,1,0,0) → (0,1,0,1) → (0,1,0,2)
(0,0,0,0) → (0,0,0,1) → (0,1,0,1) → (0,1,0,2)
(0,0,0,0) → (0,0,0,1) → (0,0,0,2) → (0,1,0,2)

Also, because the buckets are anonymous and identical, permutations like (2,0,1,0) and (0,1,2,0) are considered equal. Therefore, the 64 possibilities are grouped together to form these 3 classes:

(3,0,0,0) x4
(2,1,0,0) x36
(1,1,1,0) x24

The probability for each of these classes is then the number of possibilities that are part of the class, divided by the total number of possibilities:

(3,0,0,0) → 4 / 64 = 1 / 16
(2,1,0,0) → 36 / 64 = 9 / 16
(1,1,1,0) → 24 / 64 = 6 / 16

An algorithm which generates all the possibilities and then groups them together into classes is not very efficient, because the number of possibilities quickly becomes huge for higher numbers of items and buckets, as you have experienced.
It is better to generate the classes (step 1 below), calculate the number of permutations per class (step 2 below), calculate the number of duplicate distributions per permutation (step 3 below), and then combine the results to get the probability for each class (step 4 below).

Step 1: generating the partitions

First, generate the partitions (or more precisely: partitions with restricted number of parts) of i with b parts:

(3,0,0,0) (2,1,0,0) (1,1,1,0)

The basic way to generate partitions is to use a recursive algorithm, where you choose all possible values for the first part, and then recurse with the rest, e.g.:

partition(4) :
• 4
• 3 + partition(1)
• 2 + partition(2)
• 1 + partition(3)
→ (4) (3,1) (2,2) (2,1,1) (1,3) (1,2,1) (1,1,2) (1,1,1,1)

In order to avoid duplicates, only generate non-increasing sequences of values. To achieve this, you have to tell every recursion what the maximum value of its parts is, e.g.:

partition(4, max=4) :
• 4
• 3 + partition(1, max=3)
• 2 + partition(2, max=2)
• 1 + partition(3, max=1)
→ (4) (3,1) (2,2) (2,1,1) (1,1,1,1)

When partitioning with restricted number of parts, the first part should not be less than the total value divided by the number of parts, e.g. when partitioning 4 into 3 parts, there's no partition with first part 1, because 1 < 4/3:

partition(4, parts=3, max=4) :
• 4,0,0
• 3 + partition(1, parts=2, max=3)
• 2 + partition(2, parts=2, max=2)
→ (4,0,0) (3,1,0) (2,2,0) (2,1,1)

When the number of parts is 2, this is the lowest recursion level. The partitions are then generated by iterating over the values for the first part, from the maximum down to half of the value, and putting the rest in the second part, e.g.:

partition(4, parts=2, max=3) :
• 3,1
• 2,2
→ (3,1) (2,2)

Since a recursive partitioning algorithm reduces each task to a number of tasks with smaller values and fewer parts, the same partitions with small values are generated many times. That's why techniques like memoization and tabulation can speed up the algorithm significantly.
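The recursion described in this step might be sketched in Python as follows (a sketch only; `max_part` plays the role of `max` above, and `functools.lru_cache` supplies the memoization just mentioned):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def partitions(val, parts, max_part):
    """All non-increasing ways to write `val` as exactly `parts`
    integers, each at most `max_part` (zeros allowed)."""
    if parts == 1:
        return ((val,),) if val <= max_part else ()
    result = []
    # The first part runs from the maximum down to the average,
    # so that the remaining parts can stay non-increasing.
    first = min(val, max_part)
    while first >= 0 and first * parts >= val:
        for rest in partitions(val - first, parts - 1, first):
            result.append((first,) + rest)
        first -= 1
    return tuple(result)

print(partitions(3, 4, 3))
```

For 3 items and 4 buckets this yields exactly the three classes (3,0,0,0), (2,1,0,0), (1,1,1,0) from the overview.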

Step 2: calculating the number of permutations

Then, find the number of permutations of each partition:

(3,0,0,0) (0,3,0,0) (0,0,3,0) (0,0,0,3) → 4
(2,1,0,0) (2,0,1,0) (2,0,0,1) (1,2,0,0) (0,2,1,0) (0,2,0,1) (1,0,2,0) (0,1,2,0) (0,0,2,1) (1,0,0,2) (0,1,0,2) (0,0,1,2) → 12
(1,1,1,0) (1,1,0,1) (1,0,1,1) (0,1,1,1) → 4

You don't need to generate all these permutations to calculate how many there are; you only have to check how many duplicate values there are in every partition generated in step 1:

(3,0,0,0) (2,1,0,0) (1,1,1,0)

Take the factorial of the number of buckets b (4!) and divide it by the product of the factorials of the multiplicities of the values:

(3,0,0,0) → 4! / (1! × 3!) = 4
(2,1,0,0) → 4! / (1! × 1! × 2!) = 12
(1,1,1,0) → 4! / (3! × 1!) = 4
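This step might be sketched as (a hypothetical helper; `Counter` collects the multiplicity of each distinct value):

```python
from collections import Counter
from math import factorial

def bucket_permutations(partition):
    """b! divided by the factorial of each distinct value's multiplicity."""
    perms = factorial(len(partition))
    for multiplicity in Counter(partition).values():
        perms //= factorial(multiplicity)
    return perms

print(bucket_permutations((2, 1, 0, 0)))  # 4! / (1! * 1! * 2!) = 12
```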

Step 3: calculating the number of distributions

Then, for each partition, we have to find the number of ways in which you can distribute the items in the buckets, e.g. for partition (2,1,0,0):

• first item in bucket 1, second item in bucket 1, third item in bucket 2 → (2,1,0,0)
• first item in bucket 1, second item in bucket 2, third item in bucket 1 → (2,1,0,0)
• first item in bucket 2, second item in bucket 1, third item in bucket 1 → (2,1,0,0)

This number can be calculated by taking the factorial of the number of items i (3!) and dividing it by the product of the factorials of the values (the zeros can be ignored, because 0! = 1):

(3,0,0,0) → 3! / 3! = 1
(2,1,0,0) → 3! / (2! × 1!) = 3
(1,1,1,0) → 3! / (1! × 1! × 1!) = 6
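Sketched the same way (again a hypothetical helper name), this is a plain multinomial coefficient:

```python
from math import factorial

def item_distributions(partition):
    """i! divided by the factorial of each bucket's item count (0! = 1)."""
    dist = factorial(sum(partition))
    for n in partition:
        dist //= factorial(n)
    return dist

print(item_distributions((2, 1, 0, 0)))  # 3! / (2! * 1!) = 3
```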

Step 4: calculating the probability

Then, for each partition, multiply the number of permutations and the number of distributions, and sum these to find the total number of possibilities:

4×1 + 12×3 + 4×6 = 64

Or simply use this calculation:

buckets ^ items = 4 ^ 3 = 64

And then the probability per partition is the number of permutations times the number of distributions, divided by the total number of possibilities:

(3,0,0,0) → 4×1 / 64 = 4 / 64 = 1 / 16
(2,1,0,0) → 12×3 / 64 = 36 / 64 = 9 / 16
(1,1,1,0) → 4×6 / 64 = 24 / 64 = 6 / 16
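Putting steps 2 through 4 together, a sketch that returns exact probabilities (function name is illustrative; `fractions.Fraction` keeps the results exact):

```python
from collections import Counter
from fractions import Fraction
from math import factorial

def partition_probability(partition):
    """Permutations (step 2) times distributions (step 3),
    divided by buckets ** items (step 4)."""
    buckets, items = len(partition), sum(partition)
    perms = factorial(buckets)
    for multiplicity in Counter(partition).values():
        perms //= factorial(multiplicity)
    dists = factorial(items)
    for n in partition:
        dists //= factorial(n)
    return Fraction(perms * dists, buckets ** items)

for part in [(3, 0, 0, 0), (2, 1, 0, 0), (1, 1, 1, 0)]:
    print(part, partition_probability(part))
```

The three probabilities sum to 1, which is a quick sanity check on the whole construction.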

Code example

This is an example of the basic algorithm without memoization:

function partitions(items, buckets) {
    var p = [];
    partition(items, buckets, items, []);
    probability();
    return p;

    function partition(val, parts, max, prev) {
        for (var i = Math.min(val, max); i >= val / parts; i--) {
            if (parts == 2) p.push({part: prev.concat([i, val - i])});
            else partition(val - i, parts - 1, i, prev.concat([i]));
        }
    }
    function probability() {
        var total = Math.pow(buckets, items);
        for (var i = 0; i < p.length; i++) {
            p[i].prob = permutations(p[i].part) * distributions(p[i].part) / total;
        }
    }
    function permutations(partition) {
        var dup = 1, perms = factorial(buckets);
        for (var i = 1; i < buckets; i++) {
            if (partition[i] != partition[i - 1]) {
                perms /= factorial(dup);
                dup = 1;
            } else ++dup;
        }
        return perms / factorial(dup);
    }            
    function distributions(partition) {
        var dist = factorial(items);
        for (var i = 0; i < buckets; i++) {
            dist /= factorial(partition[i]);
        }
        return dist;
    }
    function factorial(n) {
        var f = 1;
        while (n > 1) f *= n--;
        return f;
    }
}

var result = partitions(3,4);
for (var i in result) document.write(JSON.stringify(result[i]) + "<br>");
var result = partitions(7,5);
for (var i in result) document.write("<br>" + JSON.stringify(result[i]));

Additional restriction: bucket size

To adapt the results given by the algorithm to the situation where there is a maximum number of items any bucket can hold, you'd need to identify the invalid distributions and then redistribute their probabilities among the valid distributions, e.g.:

i=5, b=4, s=5
5,0,0,0 → 1 / 256
4,1,0,0 → 15 / 256
3,2,0,0 → 30 / 256
3,1,1,0 → 60 / 256
2,2,1,0 → 90 / 256
2,1,1,1 → 60 / 256

i=5, b=4, s=4
4,1,0,0 → 16 / 256 ← (15 + 1)
3,2,0,0 → 30 / 256
3,1,1,0 → 60 / 256
2,2,1,0 → 90 / 256
2,1,1,1 → 60 / 256

i=5, b=4, s=3
3,2,0,0 → 35.333 / 256 ← (30 + 16 × 1/3)
3,1,1,0 → 70.666 / 256 ← (60 + 16 × 2/3)
2,2,1,0 → 90 / 256
2,1,1,1 → 60 / 256

i=5, b=4, s=2
2,2,1,0 → 172.444 / 256 ← (90 + 35.333 + 70.666 × 2/3)
2,1,1,1 → 83.555 / 256 ← (60 + 70.666 × 1/3)
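The repeating decimals in these tables are exact thirds and ninths; the s=2 step, for example, can be reproduced with exact arithmetic (variable names are made up for this sketch):

```python
from fractions import Fraction

# Counts out of 256, following the redistribution steps above:
# s=3 row first, then folding the s=3 invalids into the s=2 row.
c_3200 = 30 + Fraction(1, 3) * 16             # 35.333...
c_3110 = 60 + Fraction(2, 3) * 16             # 70.666...
c_2210 = 90 + c_3200 + Fraction(2, 3) * c_3110
c_2111 = 60 + Fraction(1, 3) * c_3110

assert c_2210 + c_2111 == 256                  # all probability accounted for
print(c_2210 / 256, c_2111 / 256)
```

So the s=2 probabilities come out as exactly 97/144 and 47/144 (which is what 172.444/256 and 83.555/256 abbreviate).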

The diagram below shows which probabilities have to be recalculated when restricting a 7-item 3-bucket distribution to a bucket size of 4.

(diagram: consequences of restricted bucket size)

The algorithm for unrestricted bucket size calculates the bottom row in this diagram, without having to create the whole tree. You can adapt it to directly calculate the probability of any single distribution (see code examples below).

So you could run the original algorithm first and then, for each invalid distribution, find which distributions in the previous row lead to the invalid distribution, and redistribute the probabilities.

In the example shown in the diagram, that would mean redistributing the probabilities indicated in red, which is the sub-tree under (4,0,0) where 4 is the maximum bucket size.

So, in general, to adapt the probabilities found with the original algorithm to a restricted bucket size of s: start at (s,0,0), find all children of this distribution, namely (s+1,0,0) and (s,1,0), and calculate their probabilities. Take the probability of the invalid distributions, redistribute it among the valid distributions, and move on to the next row, until you've recalculated all distributions in the final row that contain the value s.

The code example below, based on the original algorithm, shows how to directly find the probability of any distribution with unrestricted bucket size. This would be a basic step in the new algorithm.

function probability(part) { // array of items per bucket
    var items = part.reduce(function(s, n) {return s + n}, 0);
    var buckets = part.length;
    var f = factorial(Math.max(items, buckets));
    var total = Math.pow(buckets, items);
    return permutations() * distributions() / total;

    function permutations() {
        var dup = 1, perms = f[buckets];
        for (var i = 1; i < buckets; i++) {
            if (part[i] != part[i - 1]) {
                perms /= f[dup];
                dup = 1;
            } else ++dup;
        }
        return perms / f[dup];
    }            
    function distributions() {
        return part.reduce(function(t, n) {return t / f[n]}, f[items]);
    }
    function factorial(max) {
        var f = [1];
        for (var i = 1; i <= max; i++) f[i] = f[i - 1] * i;
        return f;
    }
}

document.write(probability([2,2,1,1,1]));

The following code example uses a different notation for the partitions: instead of an array where the values are the number of items per bucket, it uses values that are the number of buckets that have index items; so [0,3,2,0] means 3 buckets with 1 item and 2 buckets with 2 items, or [2,2,1,1,1] in the regular notation.

function probability(part) { // array of buckets per amount of items
    var items = part.reduce(function(t, n, i) {return t + n * i}, 0);
    var buckets = part.reduce(function(t, n) {return t + n}, 0);
    var f = factorial(Math.max(items, buckets));
    var total = Math.pow(buckets, items);
    return permutations() * distributions() / total;

    function permutations() {
        return part.reduce(function(t, n) {return t / f[n]}, f[buckets]);
    }            
    function distributions() {
        return part.reduce(function(t, n, i) {return t / Math.pow(f[i], n)}, f[items]);
    }
    function factorial(max) {
        var f = [1];
        for (var i = 1; i <= max; i++) f[i] = f[i - 1] * i;
        return f;
    }
}
document.write(probability([0,3,2,0,0,0,0,0]));

Below is an implementation of the complete algorithm using the alternative notation; it generates the valid distributions (taking into account the bucket size), calculates the probabilities they would have with unrestricted bucket size, and then redistributes the probabilities of the invalid distributions.

function Dictionary() {
    this.pair = {};
    this.add = function(array, value) {
        var key = array.join(',');
        if (key in this.pair) this.pair[key] += value;
        else this.pair[key] = value;
    }
    this.read = function() {
        for (var key in this.pair) {
            var prob = this.pair[key], array = key.split(',');
            array.forEach(function(v, i, a) {a[i] = parseInt(v);});
            delete this.pair[key];
            return {dist: array, prob: prob};
        }
    }
}
function partitions(items, buckets, size) {
    var dict = new Dictionary, init = [buckets], total = Math.pow(buckets, items), f = factorial();
    for (var i = 0; i < size; i++) init.push(0);
    partition(items, init, 0);
    if (size < items) redistribute();
    return dict;

    function partition(val, dist, min) {
        for (var i = min; i < size; i++) {
            if (dist[i]) {
                var d = dist.slice();
                --d[i]; ++d[i + 1];
                if (val - 1) partition(val - 1, d, i);
                else dict.add(d, probability(d));
            }
        }
        function probability(part) {
            return part.reduce(function(t, n, i) {return t / Math.pow(f[i], n);}, f[items])
            * part.reduce(function(t, n) {return t / f[n];}, f[buckets]) / total;
        }
    }
    function redistribute() {
        var parents = new Dictionary;
        --init[0]; ++init[size];
        parents.add(init, 0);
        for (var i = size; i < items; i++) {
            var children = new Dictionary;
            while (par = parents.read()) {
                var options = buckets - par.dist[size];
                par.prob += probability(par.dist, i) * par.dist[size] / buckets;
                for (var j = 0, b = 0; j < options; j += par.dist[b++]) {
                    while (par.dist[b] == 0) ++b;
                    var child = par.dist.slice();
                    --child[b]; ++child[b + 1];
                    if (i == items - 1) dict.add(child, par.prob * par.dist[b] / options);
                    else children.add(child, par.prob * par.dist[b] / options);
                }
            }
            parents = children;
        }
        function probability(part, items) {
            return part.reduce(function(t, n, i) {return t / Math.pow(f[i], n);}, f[items])
            * part.reduce(function(t, n) {return t / f[n];}, f[buckets]) / Math.pow(buckets, items);
        }
    }
    function factorial() {
        var f = [1], max = Math.max(items, buckets);
        for (var i = 1; i <= max; i++) f[i] = f[i - 1] * i;
        return f;
    }
}
var result = partitions(3,3,2);
for (var i in result.pair) document.write(i + " &rarr; " + result.pair[i] + "<br>");
var result = partitions(7,3,4);
for (var i in result.pair) document.write("<br>" + i + " &rarr; " + result.pair[i]);

(Below is a first idea; it turned out to be too inefficient.)

While working on this, I came up with a completely different version of the algorithm, using a dictionary. It is probably closer to what you were originally doing, but should be more efficient. Basically, I'm using a different way of storing a distribution; where I previously wrote:

(3,2,2,1,0)

for 5 buckets filled with 3, 2, 2, 1 and zero items, I now use:

(1,1,2,1)

for 1 empty bucket, 1 bucket with 1 item, 2 buckets with 2 items, and 1 bucket with 3 items. The length of this list is determined by the bucket size plus 1, and no longer by the number of buckets.

The advantage of this is that duplicates like:

(3,2,2,1,0) (3,1,2,0,2) (1,2,2,3,0) (0,1,2,2,3) ...

Now all have the same notation.
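Converting between the two notations is mechanical; a sketch (the helper names `to_counts`/`to_shape` are made up here):

```python
def to_counts(shape, size):
    """[3,2,2,1,0] -> [1,1,2,1]: counts[k] = buckets holding k items."""
    counts = [0] * (size + 1)
    for n in shape:
        counts[n] += 1
    return counts

def to_shape(counts):
    """[1,1,2,1] -> [3,2,2,1,0]: expand back to items-per-bucket."""
    shape = []
    for k, n in enumerate(counts):
        shape.extend([k] * n)
    shape.sort(reverse=True)
    return shape

print(to_counts([3, 2, 2, 1, 0], 3), to_shape([1, 1, 2, 1]))
```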

The code example below uses the dictionary idea with the new notation. (Unfortunately, it turns out to be quite slow, so I'll continue to look into improving it.)

function Dictionary() {
    this.dict = [];
}
Dictionary.prototype.add = function(array, value) {
    var key = array.join(',');
    if (this.dict[key] == undefined) this.dict[key] = value;
    else this.dict[key] += value;
}
Dictionary.prototype.print = function() {
    for (var i in this.dict) document.write(i + " &rarr; " + this.dict[i] + "<br>");
}

function probability(items, buckets, size) {
    var dict = new Dictionary, dist = [buckets];
    for (var i = 0; i < size; i++) dist.push(0);
    p(dist, 0, 1);
    return dict;

    function p(dist, value, prob) {
        var options = buckets - dist[size];
        for (var i = 0, b = 0; i < options; i += dist[b++]) {
            while (dist[b] == 0) ++b;
            var d = dist.slice();
            --d[b]; ++d[b + 1];
            if (value + 1 < items) p(d, value + 1, prob * dist[b] / options);
            else dict.add(d, prob * dist[b] / options);
        }
    }
}

probability(3, 3, 3).print();
probability(3, 3, 2).print();
probability(5, 4, 5).print();
probability(5, 4, 2).print();
  • Thank you m69! I appreciate your answer. I haven't yet tried to code it up to see if the complexity is better than my approach because I don't understand how to do the third step yet (i.e. "list the number of the buckets that the i items go into . . . "). If you could kindly please expand that step a bit more? – John Bent Jun 19 '16 at 17:30
  • @JohnBent I'll expand this answer some more later today. For ways to do step 1 efficiently, see if this helps you: http://stackoverflow.com/questions/32907406/is-there-an-efficient-algorithm-for-integer-partitioning-with-restricted-number/32918426#32918426 (note that that case didn't use zeros, so it's a bit different). – m69's been on strike for years Jun 19 '16 at 18:43
  • thanks again very much! I've actually started doing some of what I see on that other (very helpful) page but I'm using pickle instead of memo. In fact, I've sped it up a lot. Because I'm looking for data about what happens with n in b buckets and then with n+1 in b buckets. So I compute the n in buckets and pickle that. Then when I want to do the n+1, I can at least start from n. That's sped up my data generation considerably. I didn't actually know about decorators and memo actually; I wonder if that would have been better.... – John Bent Jun 19 '16 at 21:12
  • I wish I'd seen this page before I did all the data-structure specific pickling myself: http://kitchingroup.cheme.cmu.edu/blog/2013/06/20/Memoizing-expensive-functions-in-python-and-saving-results/ But my comment here is actually off-topic. I've sped up my total program by using pickling but the base question about how to speed up the interior algorithm is still relevant and the topic. – John Bent Jun 19 '16 at 21:25
  • @JohnBent For comparison, this JS version without any memoization takes around 10 seconds with 54 items and 34 buckets. Going by my experience with the other question I linked to, memoization could bring this down to milliseconds – m69's been on strike for years Jun 21 '16 at 06:13
  • thank you! However, now I've realized I have another constraint. :) My buckets have max size. I've fixed this in partition function: for (var i = Math.min(val, max); i >= val / parts; i--) { for (var i = Math.min(val, max,bsize); i >= val / parts; i--) { I think I also need to fix this in the distributions function to get probabilities correct but I don't know how. Is that an easy fix? Also, for large b counts and a few items, the following optimization in partition can reduce recursion: else if items = 0 { self.perms.append({'perm': previous + [0]*buckets});break; } – John Bent Jun 22 '16 at 16:16
  • @JohnBent I added a paragraph about the bucket size restriction, and a second code example. – m69's been on strike for years Jun 23 '16 at 01:52
  • You are so awesome. Sadly however the javascript for the bucket size restriction doesn't seem to get the correct probabilities. I edited my original question to show the probability tree for items=3,buckets=3,bsize=2 and the probabilities don't match the javascript. Imagine inserting in this case into [2,0,0]. With probability=1.0, the next state will be [2,1,0] whereas I think the javascript thinks the probability of that transition is 2/3. – John Bent Jun 23 '16 at 20:30
  • Did you see the probability tree in my updated question? I'm _mostly sure_ I did that correctly and it gives a different answer.... – John Bent Jun 23 '16 at 21:28
  • I edited the original question again to show a visualization of the insert probabilities. As my original question states, I'm not a statistician or a mathematician so I might be totally wrong. But I am pretty sure that I got those probability trees correct and that the sum of each path's probability is the probability of each resulting state. – John Bent Jun 23 '16 at 21:54
  • I updated the question to add a python simulator to test probabilities with many random iterations. It also is showing 2/9 and 7/9 for 3 buckets, 3 balls, bucket size of 2. So I'm wondering if this is some weird statistics discrepancy or if there is some other tweak possible on any of your functions so it will match the simulator. Maybe the simulator has a bug too. – John Bent Jun 25 '16 at 05:50
  • @JohnBent I think you may be right; I removed the possibilities with 3's from the total, but of course their probabilities have to be re-distributed to 2,1 solutions. If you do, you indeed get 21/27 and 6/27. I'll have a look later to see if I can somehow adapt the second version of the algorithm. Good thing you kept checking instead of taking my word for it. – m69's been on strike for years Jun 25 '16 at 12:50
  • That's brilliant to change the distribution array to index the count of buckets with i balls in them instead of having the array be indexing the buckets themselves. I'm still trying to reimplement what you've done in python to help myself understand it better. It does seem to be using the same basic algorithm I developed initially but using recursion instead of using a stack so the big-O complexity isn't any better like it was in the initial two algorithms you developed. But changing the array does reduce the complexity a lot already. Thanks! – John Bent Jun 28 '16 at 05:49
  • @JohnBent I finally managed to code the complete algorithm correctly (the bug in the previous version was caused by the assumption that `for ... in` in JS always returned the dictionary pairs first-in first-out). I think I'm done now with this question :-) – m69's been on strike for years Jul 11 '16 at 02:29
  • m69. I owe you many drinks! You are amazing. I hope you enjoyed this; it's really challenging! I got pulled into something else recently so I've been remiss to not yet try out your awesomeness. But I will and thanks again so very much!! – John Bent Jul 12 '16 at 03:11
  • @JohnBent No problem. Now and then I become obsessed with a question, and I can't leave it alone until I've found a decent solution. Btw, we should probably clean things up a bit. I'll try to throw out some obsolete parts of the answer, and maybe you should delete the part with the simulator, which was only needed to convince me I was miscalculating the example anyway. – m69's been on strike for years Jul 12 '16 at 05:14

Update

I just realized that you probably can't use my original recommendation for larger values of n and r. At least, not without some modification. For example, if you're putting 4 items into 5 buckets, you can have (among others):

{3,1,0,0,0}
{2,2,0,0,0}

So the "n choose r" trick described below doesn't work because there are two different ways to use only two buckets.

Original answer

This is an "n choose r" problem. Combinations.

For the {3,0,0,0} case, you're asking "how many different ways can I choose one item from a set of four?" The answer is four. If the set contains the digits 1,2,3,4, then your combinations are {1},{2},{3},{4}.

For the {2,1,0,0} case, you're asking how many different ways to choose two items from four. The answer is six: {1,2},{1,3},{1,4},{2,3},{2,4},{3,4}.

And for the {1,1,1,0} case, choosing three items from four, the answer is four. I won't enumerate them.

The "n choose r" calculation is C(n,r) = n! / ( r! (n - r)! )
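For example, sketched in Python (and cross-checked against the standard library's `math.comb`):

```python
from math import comb, factorial

def n_choose_r(n, r):
    """C(n, r) = n! / (r! * (n - r)!)"""
    return factorial(n) // (factorial(r) * factorial(n - r))

print(n_choose_r(4, 2), comb(4, 2))  # both give 6
```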

See http://www.calculatorsoup.com/calculators/discretemathematics/combinations.php for more information.

Jim Mischel
  • Thank you Jim for the answer! However, as you point out, this trick sadly doesn't work. First of all, it doesn't work for the reason you point out. Also, this formula only returns the number of possible sets. Whereas I'm trying to find the membership of each set **plus** the probability of that set occurring..... – John Bent Jun 19 '16 at 03:54
  • @JohnBent: If you note, in the case of 3 items in 4 buckets, the number of sets implies the membership in the set. That is, there's only one was to use two buckets. Which leads me to believe that there's a recursive solution that uses this trick. Will have to think about its implementation. – Jim Mischel Jun 19 '16 at 14:50
  • If you think something up for its implementation, that would be wonderful! My larger problem is actually first using this algorithm to distribute items into buckets and then for each bucket/item combo, to repeat it for subbuckets. So the computational complexity is killing me. Any optimizations would make a big difference. Right now I'm struggling with 54 items and 34 buckets for example. – John Bent Jun 19 '16 at 17:36