6

Consider the following example:

import itertools
import numpy as np

a = np.arange(0,5)
b = np.arange(0,3)
c = np.arange(0,7)

prods = itertools.product(a,b,c)

for p in prods:
    print(p)

This iterate over the products in the following order:

(0, 0, 0)
(0, 0, 1)
(0, 0, 2)
(0, 0, 3)
(0, 0, 4)
(0, 1, 0)

But I would much rather have the products given in order of their sum, e.g.

(0, 0, 0)
(0, 0, 1)
(0, 1, 0)
(1, 0, 0)
(0, 1, 1)
(1, 0, 1)
(1, 1, 0)
(0, 0, 2)

How can I achieve this without storing all combinations in memory?

Note: a b and c are always ranges, but not necessarily with the same maximum. There is also no 2nd-level ordering when the sums of two products are equal, i.e. (0,1,1) is equivalent to (2,0,0).

Thomas Wagenaar
  • 6,489
  • 5
  • 30
  • 73
  • You can't do that [with `product`](https://docs.python.org/3/library/itertools.html#itertools.product), which deliberately orders them as shown. This would be quite hard to solve in general and require at least _some_ data to be held in memory, unless you can guarantee ordered inputs with matching steps. – jonrsharpe Sep 28 '21 at 13:53
  • @jonrsharpe but perhaps there is a way to map one ordering to the other? that's something I was looking at but without success – Thomas Wagenaar Sep 28 '21 at 13:54
  • Are `a,b,c` always ranges? – Dani Mesejo Sep 28 '21 at 13:54
  • @DaniMesejo yes! – Thomas Wagenaar Sep 28 '21 at 13:55
  • @ThomasWagenaar to do that you'd need to consume then sort the whole series, which might require impractical amounts of storage. What I'm saying is that, _depending on the specific context_, there may be ways to keep this in the iterator world. – jonrsharpe Sep 28 '21 at 13:55
  • Are a, b, and c always the same range? Edit: Also, are they ever ranges with strides larger than 1? – kcsquared Sep 28 '21 at 14:06
  • @kcsquared no, but good one, i'll add it it to question – Thomas Wagenaar Sep 28 '21 at 14:07
  • For the sake of providing an answer, if memory consumption isn't an issue this can done by iterating over `sorted(prods, key=lambda t: (sum(t), t.count(0)))`. – jfaccioni Sep 28 '21 at 14:10
  • Are you looking for an ordering to tiebreak equal sums, or is any ordering fine? This can be done, but probably not using itertools – kcsquared Sep 28 '21 at 14:14
  • @kcsquared no there is no preference for ordering in the tiebreak case, so (0,0,2) and (1,1,0) are 'equal' – Thomas Wagenaar Sep 28 '21 at 14:16
  • Are the strides always 1? – Dani Mesejo Sep 28 '21 at 15:22

2 Answers2

3

If the steps are always 1 and avoiding storing all combinations is your top priority, you could do the following (partially using itertools.product):

import itertools
import numpy as np

def weak_compositions(boxes, balls, parent=tuple()):
    """https://stackoverflow.com/a/36748940/4001592"""
    if boxes > 1:
        for i in range(balls + 1):
            for x in weak_compositions(boxes - 1, i, parent + (balls - i,)):
                yield x
    else:
        yield parent + (balls,)


def verify_limits(x, minimum, maximum):
    all_max = all(xi <= li for xi, li in zip(x, maximum))
    all_min = all(xi >= li for xi, li in zip(x, minimum))
    return all_max and all_min


def iterate_in_sum(ranges):
    prods = itertools.product(*ranges)

    # find number of different sums
    unique_total_sums = sorted(set(sum(p) for p in prods))

    # find the minimum limits
    minimum_allowed = [min(r) for r in ranges]

    # find the maximum limits
    maximum_allowed = [max(r) for r in ranges]

    for total_sum in unique_total_sums:
        # decompose each sum into its summands
        for x in weak_compositions(len(ranges), total_sum):

            # if the decomposition meets the limits
            if verify_limits(x, minimum_allowed, maximum_allowed):
                yield x


a = np.arange(0, 5)
b = np.arange(0, 3)
c = np.arange(0, 7)

for s in iterate_in_sum([a, b, c]):
    print(s, sum(s))

Output (partial)

(0, 0, 0) 0
(1, 0, 0) 1
(0, 1, 0) 1
(0, 0, 1) 1
(2, 0, 0) 2
(1, 1, 0) 2
(1, 0, 1) 2
(0, 2, 0) 2
(0, 1, 1) 2
(0, 0, 2) 2
(3, 0, 0) 3
(2, 1, 0) 3
(2, 0, 1) 3
(1, 2, 0) 3
(1, 1, 1) 3
(1, 0, 2) 3
(0, 2, 1) 3
(0, 1, 2) 3

The core of the solution is the weak_compositions function that will decompose a number into it's summands (something like integer partition). More solutions to the above problem of composition of n into k parts can be found here.

Note:

The solution can be extended to non uniform steps with additional complexity cost.

Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
  • I'm curious about why `for total_sum in unique_total_sums:` works to produce the sums in order from lowest to highest, given that `unique_total_sums` is a set. Is that a coincidence for this input, or do Python sets preserve insertion order now? – kcsquared Sep 28 '21 at 16:05
  • @kcsquared this explain it https://stackoverflow.com/questions/45581901/are-sets-ordered-like-dicts-in-python3-6/ – Dani Mesejo Sep 28 '21 at 16:53
  • In any case I just added sorted to make it explicit – Dani Mesejo Sep 28 '21 at 16:57
  • Ah thank you, that's a very useful link. Apparently the fact that small positive ints hash to themselves means sets are usually sorted; from the link, not using sorted would cause this to fail on inputs like `a = np.arange(5, 8), b = np.arange(1, 3)` – kcsquared Sep 28 '21 at 17:07
3

The easiest way to do this without storing extra products in memory is with recursion. Instead of range(a,b), pass in a list of (a,b) pairs and do the iteration yourself:

def prod_by_sum(range_bounds: List[Tuple[int, int]]):
    """
    Yield from the Cartesian product of input ranges, produced in order of sum.

    >>> range_bounds = [(2, 4), (3, 6), (0, 2)]
    >>> for prod in prod_by_sum(range_bounds):
    ...    print(prod)
    (2, 3, 0)
    (2, 3, 1)
    (2, 4, 0)
    (3, 3, 0)
    (2, 4, 1)
    (2, 5, 0)
    (3, 3, 1)
    (3, 4, 0)
    (2, 5, 1)
    (3, 4, 1)
    (3, 5, 0)
    (3, 5, 1)

    """
    def prod_by_sum_helper(start: int, goal_sum: int):
        low, high = range_bounds[start]
        if start == len(range_bounds) - 1:
            if low <= goal_sum < high:
                yield (goal_sum,)
            return

        for current in range(low, min(high, goal_sum + 1)):
            yield from ((current,) + extra
                        for extra in prod_by_sum_helper(start + 1, goal_sum - current))

    lowest_sum = sum(lo for lo, hi in range_bounds)
    highest_sum = sum(hi - 1 for lo, hi in range_bounds)

    for goal_sum in range(lowest_sum, highest_sum + 1):
        yield from prod_by_sum_helper(0, goal_sum)

which has output for range_bounds = [(0, 5), (0, 3), (0, 7)] starting with:

(0, 0, 0)
(0, 0, 1)
(0, 1, 0)
(1, 0, 0)
(0, 0, 2)
(0, 1, 1)
(0, 2, 0)
(1, 0, 1)
(1, 1, 0)
(2, 0, 0)

You can do this exact process iteratively by modifying a single list and yielding copies of it, but the code either becomes more complicated or less efficient.

You can also trivially modify this to support steps besides 1, however that does work less efficiently with larger and larger steps since the last range might not contain the element needed to produce the current sum. That seems unavoidable, because at that point you'd need to solve a difficult computational problem to efficiently loop over those products by sum.

kcsquared
  • 5,244
  • 1
  • 11
  • 36