2

I am trying to generate an array of all combinations of an arbitrary number of arrays. From the generated array, I would like to add an constraint that the sum of the numbers must lie between two bounds (say 'lower' and 'upper')

One method of doing this is to use cartersian, sum the elements, and select the ones that fall within the lower and upper bounds. However, the main limitation is that it is possible to run out of memory given a large number of input arrays. Another method is to use itertools.product:

import itertools
import numpy as np

def arraysums(arrays,lower,upper):
    p = itertools.product(*arrays)
    r = list()

    for n in p:
        s = sum(n)
        if lower <= s <= upper:
            r.append(n)

    return r

N = 8
a = np.arange(N)
b = np.arange(N)-N/2

arraysums((a,b),lower=5,upper=6)

which returns a result like this:

[(2, 3),
 (3, 2),
 (3, 3),
 (4, 1),
 (4, 2),
 (5, 0),
 (5, 1),
 (6, -1),
 (6, 0),
 (7, -2),
 (7, -1)]

This method is memory efficient, but can be very slow if the arrays are large, such as this example which runs in the 10's of minutes:

a = np.arange(32.)
arraysums(6*(a,),lower=10,upper=20)

I'm looking for a faster method.

Community
  • 1
  • 1
justin
  • 171
  • 8

2 Answers2

3

You could use recursion. For example, if item has been selected from the first array, then the new lower and upper limits for the rest of the arrays should be lower-item and upper-item.

The main advantage here is that you can short-circuit the enumeration of tuples at each stage. Consider the case when all the values are positive. Then we can automatically throw out any value in the other arrays that is bigger than upper-item. This intelligently reduces the size of the search space at each level of the recursion.

import itertools

def arraysums_recursive_all_positive(arrays, lower, upper):
    # Assumes all values in arrays are positive
    if len(arrays) <= 1:
        result = [(item,) for item in arrays[0] if lower <= item <= upper]
    else:
        result = []
        for item in arrays[0]:
            subarrays = [[item2 for item2 in arr if item2 <= upper-item] 
                      for arr in arrays[1:]]
            if min(len(arr) for arr in subarrays) == 0:
                continue
            result.extend(
                [(item,)+tup for tup in arraysums_recursive_all_positive(
                    subarrays, lower-item, upper-item)])
    return result

def arraysums(arrays,lower,upper):
    p = itertools.product(*arrays)
    r = list()

    for n in p:
        s = sum(n)
        if lower <= s <= upper:
            r.append(n)

    return r

a = list(range(32))

For this test case, arraysums_recursive_all_positive is over 688x faster than arraysums:

In [227]: %time arraysums_recursive_all_positive(6*(a,),lower=10,upper=20)
CPU times: user 360 ms, sys: 8.01 ms, total: 368 ms
Wall time: 367 ms

In [73]: %time arraysums(6*(a,),lower=10,upper=20)
CPU times: user 4min 8s, sys: 0 ns, total: 4min 8s
Wall time: 4min 8s

In the general case, when the values in arrays may be negative, then we can add an appropriate amount to each value in arrays to guarantee that all the values in the new arrays is positive. We can also adjust the lower and upper limits to account for this shifting of values. Thus we can reduce the general problem to the special case of arrays with all positive values:

def arraysums_recursive(arrays, lower, upper):
    minval = min(item for arr in arrays for item in arr)
    # Subtract minval from arrays to guarantee all the values are positive
    arrays = [[item-minval for item in arr] for arr in arrays]
    # Adjust the lower and upper bounds accordingly
    lower -= minval*len(arrays)
    upper -= minval*len(arrays)
    result = arraysums_recursive_all_positive(arrays, lower, upper)
    # Readjust the result by adding back minval
    result = [tuple([item+minval for item in tup]) for tup in result]
    return result

Notice that arraysums_recursive handles negative values correctly, while arraysums_recursive_all_positive does not:

In [312]: arraysums_recursive([[10,30],[20,40],[-35,-40]],lower=10,upper=20)
Out[312]: [(10, 40, -35), (10, 40, -40), (30, 20, -35), (30, 20, -40)]

In [311]: arraysums_recursive_all_positive([[10,30],[20,40],[-35,-40]],lower=10,upper=20)
Out[311]: []

While arraysums_recursive is slower than arraysums_recursive_all_positive,

In [37]: %time arraysums_recursive(6*(a,),lower=10,upper=20)
CPU times: user 1.03 s, sys: 0 ns, total: 1.03 s
Wall time: 852 ms

it is still 290x faster than arraysums.

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • This is a great improvement in speed! I'm trying to [modify it](http://stackoverflow.com/questions/40429466/summing-all-possible-combinations-of-an-arbitrary-number-of-arrays-and-applying) to also return the indices as well. – justin Nov 04 '16 at 19:43
1

Here's a vectorized approach making use of NumPy broadcasting -

def arraysums_vectorized(arrays,lower,upper):
    a,b = arrays
    sums = a[:,None] + b
    r,c = np.nonzero((lower <= sums) & (sums <= upper))
    return np.column_stack((a[r], b[c]))

Runtime test -

In [2]: # Inputs
   ...: N = 800
   ...: a = np.arange(N)
   ...: b = np.arange(N)-N/2
   ...: 

In [3]: l = 500
   ...: u = 600
   ...: out1 = arraysums((a,b),lower=l,upper=u)
   ...: out2 = arraysums_vectorized((a,b),lower=l,upper=u)
   ...: print np.allclose(out1,out2)
   ...: 
True

In [4]: %timeit arraysums((a,b),lower=l,upper=u)
1 loops, best of 3: 508 ms per loop

In [5]: %timeit arraysums_vectorized((a,b),lower=l,upper=u)
100 loops, best of 3: 7.11 ms per loop
Divakar
  • 218,885
  • 19
  • 262
  • 358