2

I am trying to get all possible ways to distribute n candies among k children. For instance, according to stars-and-bars formula, number of ways to distribute 96 candies among 5 children is 100! / (96!*4!) = 3 921 225 tuples of all possible permutations of size 5.

list2 = [item for item in it.product(range(97), repeat = 5)
             if sum(item) == 96]

My PC seems to overwhelmed by complexity. Each tuple consumes 24*5 = 120 bytes of memory. This results in 921 225 * 120 = 470547000 bytes or 450 mb. Doesn't seem that much. Why is PC so slow at generating this list? What am I missing?

Prune
  • 76,765
  • 14
  • 60
  • 81
user9695260
  • 347
  • 3
  • 17
  • 4
    That math doesn't add up for me. 100 factorial is way more massive than 4 million. – BlackVegetable Nov 30 '18 at 16:59
  • 4
    First of all, what are you planning to do with this list2? Because there are better ways to run this operation without running out of memory, such as using generators, or yield in a loop, as you can read in more details here: https://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do – Luan Naufal Nov 30 '18 at 17:00
  • @BlackVegetable I don't know why you say that - his maths is fine. 100 factorial is huge but so is 96 factorial, in fact we can do this maths on paper. 100!/96! = (100*99*98*...)/(96*95*...) = 100*99*98*97 which is approximately 10^8, divide that by 4! = 24 which gives us around 4*10^6. Checks out with me! – Adam Dadvar Nov 30 '18 at 17:02
  • 1
    `it.product(range(97), repeat = 5)` produces 97^5 = 8 587 340 257 values. That takes some time. – zvone Nov 30 '18 at 17:03
  • 4
    @AdamDadvar His question doesn't have any division in it. – Maximilian Burszley Nov 30 '18 at 17:03
  • @TheIncorrigible1 He did make a typo, but it's pretty obvious what he meant – Adam Dadvar Nov 30 '18 at 17:06
  • @AdamDadvar As someone not very familiar with advanced mathematics, I had no idea what he was referencing and was trusting his question would be accurately asked. – Maximilian Burszley Nov 30 '18 at 17:06
  • Sorry, I wasn't familiar with the equation/algorithm. @AdamDadvar So I was taking what I thought was a reasonable guess. That seems like a typo you could correct, knowing the math far better than I do, right? – BlackVegetable Nov 30 '18 at 17:09
  • 1
    Take a look at the fixed width partition code at the end of [my answer](https://stackoverflow.com/a/40220262/4014959). – PM 2Ring Nov 30 '18 at 17:12
  • Please define your problem better: "number of ways" is not clear. Are the candies interchangeable? Are the children interchangeable? If both, then this is merely the `partition problem`, and should be closed as a duplicate. Regardless of the clarification, your set-up is incorrect. Clarify, and we'll get it fixed. – Prune Nov 30 '18 at 17:21

2 Answers2

1

I see two problems with your math.

First, you're describing a combination there. Effectively, you're thinking (96 choose 5), which doesn't cover all permutations.

Second, the permutation would actually be 96!/91!, which is several orders of magnitude higher than ~4 million.

Just by adding the byte count, you're in the high gigabyte range of memory usage now, which could explain why your machine is slowing down; the memory usage alone generated from this could crush most modern consumer machines.

Makoto
  • 104,088
  • 27
  • 192
  • 230
  • 2
    This is assuming no caching is happening, which I would hope their `it` module is handling. – Maximilian Burszley Nov 30 '18 at 17:08
  • 1
    @TheIncorrigible1: I *believe* under normal circumstances the `iterator` module does generate records, but since it's being eagerly referenced here by the list comprehension, any benefit from that caching would likely be minimal at best. – Makoto Nov 30 '18 at 17:12
1

Here is one way to make your approach work. It uses itertools.combinations. It takes a few seconds to build the complete list. For a faster, numpy based approach see bottom of this post.

It works by enumerating all combinations of four bars between 1 and 100, always adding the outer bars 0 and 101. The allocations for the five kids are then what's between the bars, i.e. the diff of the bars minus one.

import numpy as np
import itertools


bars = [0, 0, 0, 0, 0, 101]
result = [[bars[j+1] - bars[j] - 1 for j in range(5)] for bars[1:-1] in itertools.combinations(range(1, 101), 4)]

# sanity check
len(result)
# 3921225
# show few samples
from pprint import pprint
pprint(result[::400000])
# [[0, 0, 0, 0, 96],
#  [2, 26, 12, 8, 48],
#  [5, 17, 22, 7, 45],
#  [8, 23, 30, 16, 19],
#  [12, 2, 73, 9, 0],
#  [16, 2, 25, 40, 13],
#  [20, 29, 24, 0, 23],
#  [26, 13, 34, 14, 9],
#  [33, 50, 4, 5, 4],
#  [45, 21, 26, 1, 3]]

Why does yours not work so well? I think mostly because your loop is a bit wasteful, 97^5 is quite a bit larger than 100 choose 4.

If you want it really fast, you can replace itertools.combinations with a numpy version:

https://stackoverflow.com/a/42202157/7207392

def fast_comb(n, k):
    a = np.ones((k, n-k+1), dtype=int)
    a[0] = np.arange(n-k+1)
    for j in range(1, k):
        reps = (n-k+j) - a[j-1]
        a = np.repeat(a, reps, axis=1)
        ind = np.add.accumulate(reps)
        a[j, ind[:-1]] = 1-reps[1:]
        a[j, 0] = j
        a[j] = np.add.accumulate(a[j])
    return a

fb = fast_comb(100, 4)
sb = np.empty((6, fb.shape[1]), int)
sb[0], sb[1:5], sb[5] = -1, fb, 100
result = np.diff(sb.T) - 1

result.shape
# (3921225, 5)
result[::400000]
# array([[ 0,  0,  0,  0, 96],
#        [ 2, 26, 12,  8, 48],
#        [ 5, 17, 22,  7, 45],
#        [ 8, 23, 30, 16, 19],
#        [12,  2, 73,  9,  0],
#        [16,  2, 25, 40, 13],
#        [20, 29, 24,  0, 23],
#        [26, 13, 34, 14,  9],
#        [33, 50,  4,  5,  4],
#        [45, 21, 26,  1,  3]])

This takes about one second.

Paul Panzer
  • 51,835
  • 3
  • 54
  • 99