How to iterate over the set of arrays with a fixed sum

Question

Consider all arrays of l non-negative integers in the range 0,...,m. I would like to iterate (using a generator) over only those whose sum is exactly s.

For example, take l=7, s=5, m=4, the iteration could look like:

(0, 0, 0, 0, 0, 1, 4)
(0, 0, 0, 0, 0, 2, 3)
(0, 0, 0, 0, 0, 3, 2)
(0, 0, 0, 0, 0, 4, 1)
(0, 0, 0, 0, 1, 0, 4)
(0, 0, 0, 0, 1, 1, 3)
(0, 0, 0, 0, 1, 2, 2)
(0, 0, 0, 0, 1, 3, 1)
(0, 0, 0, 0, 1, 4, 0)
[...]
(3, 2, 0, 0, 0, 0, 0)
(4, 0, 0, 0, 0, 0, 1)
(4, 0, 0, 0, 0, 1, 0)
(4, 0, 0, 0, 1, 0, 0)
(4, 0, 0, 1, 0, 0, 0)
(4, 0, 1, 0, 0, 0, 0)
(4, 1, 0, 0, 0, 0, 0)

I don't mind which order the iteration happens in but I would like it to be efficient.

A really dumb way to reproduce the above that is far too slow for larger values of the variables is:

import itertools
s = 5
l = 7
m = 5
for arr in itertools.product(range(m), repeat=l):
    if sum(arr) == s:
        print(arr)

What are "larger values of the variables"? Answer may depend on that. — dankal444, Nov 07 '21 at 16:28
@dankal444 If we double all the variables then the naïve method isn't usable. Hopefully it's possible to build an iterator which runs quickly (per output) no matter what the size of the variables is. — Simd, Nov 07 '21 at 16:30
My approach would first to find all the combinations that reach `s=5` (e.g. `[4, 1], [3, 2], [3, 1, 1]`, etc.), then you could speed up the building of the array by only using the valid combinations. My though is it will reduce by much the number of call to the `sum`. — pyOliv, Nov 07 '21 at 16:33
possible duplicate https://stackoverflow.com/questions/7748442/generate-all-possible-lists-of-length-n-that-sum-to-s-in-python — dankal444, Nov 07 '21 at 16:37
@dankal444 Don't think that it is a duplicate since that question has 2 parameters and this has 3 — John Coleman, Nov 07 '21 at 16:43
You are right, here is constraint that values can't be larger than `m`. — dankal444, Nov 07 '21 at 16:46

Brad Martsberger · Accepted Answer · 2021-11-08T05:22:58.327

Think of the problem this way, you want to put s balls in l buckets with no more than m balls in any one bucket.

Since I know how to add one ball at a time, my instinct is to solve this using recursion. The base case is putting 0 balls instead of s and to go from one step to the next, adding 1 ball to each of the buckets that currently have less than m balls in them.

To make sure it's actually possible to complete the recursion, we first check there is enough places to put the balls.

# this helper function tells us the last non zero index in an array
def last_non_zero(arr):
  indx = -1
  while arr[indx] == 0 and indx > -len(arr):
    indx -= 1
  return len(arr) + indx


def balls_in_buckets(num_balls, num_buckets, max_balls):
  assert num_buckets * max_balls >= num_balls, f"You can't put {num_balls} balls in {num_buckets} buckets without more than {max_balls} in a bucket." 

  if num_balls == 0:
    yield ([0]*num_buckets).copy()

  else:
    for array in balls_in_buckets(num_balls - 1, num_buckets, max_balls):
      for bucket_number in range(last_non_zero(array), num_buckets):
        if array[bucket_number] < max_balls:
          array_copy = array.copy()
          array_copy[bucket_number] += 1
          yield array_copy

Edit: Added code to remove duplicates

Edit: Performance improvement, takes about 2 seconds to generate the whole sequence for l=14, s=10, m=8. There are 1,143,870 items in the sequence.

mine is ~2x faster, but am pretty sure this still is not even close to as fast as we could get — dankal444, Nov 07 '21 at 17:10
Can you explain why this is so much faster than the other methods? I tested it with balls_in_buckets(10,14,8) — Simd, Nov 11 '21 at 17:27

dankal444 · Answer 2 · 2021-11-07T20:47:32.997

1

By modyfing this answer, taking into account max_value:

def sums(length, total_sum, max_value):
    if length == 1:
        yield (total_sum,)
    else:
        for value in range(max(0, total_sum - (length - 1) * max_value),
                               min(max_value, total_sum) + 1):
            for permutation in sums(length - 1, total_sum - value, max_value):
                yield (value,) + permutation

L = list(sums(7,5, 4))
print('total permutations:',len(L))

# First and last 10 of list
for i in L[:10] + ['...'] + L[-10:]:
    print(i)

total permutations: 455
(0, 0, 0, 0, 0, 1, 4)
(0, 0, 0, 0, 0, 2, 3)
(0, 0, 0, 0, 0, 3, 2)
(0, 0, 0, 0, 0, 4, 1)
(0, 0, 0, 0, 1, 0, 4)
(0, 0, 0, 0, 1, 1, 3)
(0, 0, 0, 0, 1, 2, 2)
(0, 0, 0, 0, 1, 3, 1)
(0, 0, 0, 0, 1, 4, 0)
(0, 0, 0, 0, 2, 0, 3)
...
(3, 1, 0, 0, 1, 0, 0)
(3, 1, 0, 1, 0, 0, 0)
(3, 1, 1, 0, 0, 0, 0)
(3, 2, 0, 0, 0, 0, 0)
(4, 0, 0, 0, 0, 0, 1)
(4, 0, 0, 0, 0, 1, 0)
(4, 0, 0, 0, 1, 0, 0)
(4, 0, 0, 1, 0, 0, 0)
(4, 0, 1, 0, 0, 0, 0)
(4, 1, 0, 0, 0, 0, 0)

edited Nov 07 '21 at 20:47

answered Nov 07 '21 at 16:56

dankal444

3,172
1
23
35

I don't think this handles the edge case where total > max_value*length correctly. – Acccumulation Nov 07 '21 at 18:06
%timeit list(sums(14,10,8)) gives me 12 seconds. I don't know if that is the best way to time it. – Simd Nov 07 '21 at 18:48
@Anush timeit is fine, just when comparing to other guys code make sure the arguments are the same. For example, Brad Martsberger answer has different order of arguments than mine and Acccumulation answers – dankal444 Nov 07 '21 at 20:52
1

Added a performance improvement to my version. Interestingly enough our two different approaches appear to produces the results in the opposite order. Probably not just a coincidence. – Brad Martsberger Nov 08 '21 at 05:27
Sped up to 9 seconds. I wonder the other method is so much faster? – Simd Nov 11 '21 at 17:25

score 1 · Answer 3 · answered Nov 07 '21 at 17:55

What you are looking for are called "partitions". Unfortunately, there's some ambiguity as to whether "partitions" refers to splitting a set into partitions (e.g. [a,b,c] into [[a,b],[c]]), just the numbers characterizing size of each split (e.g. [2,1]), or the count of how many different splitting there are. The most promising search terms I found were "partition a number into k parts python", yielding Python Integer Partitioning with given k partitions and "python partition of indistinguishable items" yielding Partition N items into K bins in Python lazily . Those answers focus on partitions with at least one element, while you allow partitions to include zero elements. In addition, you seem to care about the order within a partition. For instance, you list (0, 0, 0, 0, 0, 1, 4), (0, 0, 0, 0, 0, 4, 1), and (0, 0, 0, 0, 1, 0, 4) as distinct partitions, while traditionally those would be considered equivalent.

I think the best way is to iterate through the buckets, and for each one iterate through the possible values. I changed the parameter names; l, s, and m and not very informative names, and "sum" and "max" are built-in functions. My current version, which may need more debugging:

def get_partitions(length, total, upper_bound):
    if length == 1:
        if total > upper_bound:
            return []
        return [[total]]
    if total == 0:
        return [[0]*length]                    
    return [ [n]+sub_partition for 
               n in range(min(total, upper_bound)+1) for 
                   sub_partition in get_partitions(
                       length-1, total-n, upper_bound)]

Side note: I initially read "iterate over the arrays" as meaning "go through the elements of the array". I think that the proper terminology is "iterate over the set of such arrays". When you say "iterate over x", x is being treated as the iterable, not the elements of the iterable.

Can this be turned into a generator so that you can iterate over the set of arrays even if it is very large? — Simd, Nov 07 '21 at 18:45

How to iterate over the set of arrays with a fixed sum

3 Answers3