1

Consider a 1D numpy array and two constants as shown:

import numpy as np

arr = np.arange(60)

n = 5
s = 120

arr is always of the form [0,1,2,3,4, ... 59,60] for example.

QUESTION: From a 1D array (arr), I need to find all subsets exactly n UNIQUE elements that have a specified sum s. A solution could start like:

          [[2, 10, 26, 35, 47], 
           [9, 14, 15, 40, 42],
           etc...

I have shown the row elements in order. This would be nice, but it is not required. (ie: combinations, not permutations)

Currently, I handle this computation in an SQL variant by using a cross-product of identical tables, each holding the elements of arr. This works, but is VERY slow, especially if n gets as large as 12 or so.

Is there a way to efficiently and quickly do this in Python/Numpy?

user109387
  • 600
  • 3
  • 11
  • Is `arr` always an array from 0 to some maximum value, in increments of 1? Thus no repeated elements, always incrementing by 1, starting at 0 (or perhaps starting at 1). That can simplify the calculation. – 9769953 Mar 03 '21 at 07:08
  • Yes. ‘arr’ is always of the form [0,1,2,3,4, ... 59,60] for example. I’ll clarify that in the question. – user109387 Mar 03 '21 at 12:24
  • why the emphasis on `UNIQUE`? `np.arange(k)` is always unique – Gulzar Mar 03 '21 at 12:52
  • @Gulzar: one could repeat elements from `arr`: `[24, 24, 24, 24, 24]` fulfills the requirements, *except* for the unique factor. – 9769953 Mar 03 '21 at 16:32

2 Answers2

1

A recursive solution follows.

This is solvable in o(len(arr)^(max(n, len(arr)-n).

I will come back to this...

However, the following solution will be still much faster than yours.

import numpy as np


def perms(arr: np.array, n: int, s: int, used_inds: set):
    if s == 0 and n == 0:
        print(arr[np.array(list(used_inds))])
        return

    if s == 0:  # not enough elements in group
        return

    if n == 0:  # not reached sum too soon [all positive]
        return

    for i in range(len(arr)):
        if i in used_inds:
            continue
        new_used_inds = set(used_inds)
        new_used_inds.add(i)
        perms(arr=arr, n=n - 1, s=s - arr[i], used_inds=new_used_inds)


def main():
    arr = np.arange(20)

    n = 3
    s = 15

    perms(arr=arr, n=n, s=s, used_inds=set())


if __name__ == "__main__":
    main()
[ 0  1 14]
[ 0  2 13]
[ 0  3 12]
[ 0 11  4]
[ 0 10  5]
[0 9 6]
[0 8 7]
[0 8 7]
[0 9 6]
[ 0 10  5]
[ 0 11  4]
[ 0  3 12]
[ 0  2 13]
[ 0  1 14]
[ 0  1 14]
[ 1  2 12]
[11  1  3]
[ 1 10  4]
[1 5 9]
[8 1 6]
[8 1 6]
[1 5 9]
[ 1 10  4]
[ 3  1 11]
[ 1  2 12]
[ 0  2 13]
[ 1  2 12]
[10  2  3]
[9 2 4]
[8 2 5]
[2 6 7]
[2 6 7]
[8 2 5]
[9 2 4]
[ 3  2 10]
[ 1  2 12]
[ 0  3 12]
[11  1  3]
[10  2  3]
[8 3 4]
[3 5 7]
[3 5 7]
[8 3 4]
[ 2 10  3]
[11  1  3]
[ 0 11  4]
[ 1 10  4]
[9 2 4]
[8 3 4]
[4 5 6]
[4 5 6]
[8 3 4]
[9 2 4]
[ 1 10  4]
[ 0 10  5]
[1 5 9]
[8 2 5]
[3 5 7]
[4 5 6]
[4 5 6]
[3 5 7]
[8 2 5]
[9 5 1]
[0 9 6]
[8 1 6]
[2 6 7]
[4 5 6]
[4 5 6]
[2 6 7]
[8 1 6]
[0 8 7]
[2 6 7]
[3 5 7]
[3 5 7]
[2 6 7]
[8 0 7]
[8 1 6]
[8 2 5]
[8 3 4]
[8 3 4]
[8 2 5]
[8 1 6]
[0 9 6]
[9 5 1]
[9 2 4]
[9 2 4]
[9 5 1]
[ 0 10  5]
[ 1 10  4]
[ 3 10  2]
[ 2 10  3]
[ 1 10  4]
[ 0 11  4]
[ 3  1 11]
[ 3  1 11]
[ 0  3 12]
[ 1  2 12]
[ 1  2 12]
[ 0  2 13]
[ 0  1 14]

This is still slow, but performs MUCH LESS computation than your proposed solution, because it cuts off branches where more computation is already known to be invaluable.

Notice this makes order a bit less readable.


A bit less readable code with a bit more readable output:

import numpy as np
from collections import OrderedDict


def perms(arr: np.array, n: int, s: int, used_inds: OrderedDict):
    if s == 0 and n == 0:
        print(arr[np.array(list(used_inds))])
        return

    if s == 0:  # not enough elements in group
        return

    if n == 0:  # not reached sum too soon [all positive]
        return

    for i in range(len(arr)):
        if i in used_inds:
            continue
        new_used_inds = OrderedDict(used_inds)
        new_used_inds[i] = None
        perms(arr=arr, n=n - 1, s=s - arr[i], used_inds=new_used_inds)


def main():
    arr = np.arange(20)

    n = 3
    s = 15

    perms(arr=arr, n=n, s=s, used_inds=OrderedDict())


if __name__ == "__main__":
    main()
[ 0  1 14]
[ 0  2 13]
[ 0  3 12]
[ 0  4 11]
[ 0  5 10]
[0 6 9]
[0 7 8]
[0 8 7]
[0 9 6]
[ 0 10  5]
[ 0 11  4]
[ 0 12  3]
[ 0 13  2]
[ 0 14  1]
[ 1  0 14]
[ 1  2 12]
[ 1  3 11]
[ 1  4 10]
[1 5 9]
[1 6 8]
[1 8 6]
[1 9 5]
[ 1 10  4]
[ 1 11  3]
[ 1 12  2]
[ 2  0 13]
[ 2  1 12]
[ 2  3 10]
[2 4 9]
[2 5 8]
[2 6 7]
[2 7 6]
[2 8 5]
[2 9 4]
[ 2 10  3]
[ 2 12  1]
[ 3  0 12]
[ 3  1 11]
[ 3  2 10]
[3 4 8]
[3 5 7]
[3 7 5]
[3 8 4]
[ 3 10  2]
[ 3 11  1]
[ 4  0 11]
[ 4  1 10]
[4 2 9]
[4 3 8]
[4 5 6]
[4 6 5]
[4 8 3]
[4 9 2]
[ 4 10  1]
[ 5  0 10]
[5 1 9]
[5 2 8]
[5 3 7]
[5 4 6]
[5 6 4]
[5 7 3]
[5 8 2]
[5 9 1]
[6 0 9]
[6 1 8]
[6 2 7]
[6 4 5]
[6 5 4]
[6 7 2]
[6 8 1]
[7 0 8]
[7 2 6]
[7 3 5]
[7 5 3]
[7 6 2]
[8 0 7]
[8 1 6]
[8 2 5]
[8 3 4]
[8 4 3]
[8 5 2]
[8 6 1]
[9 0 6]
[9 1 5]
[9 2 4]
[9 4 2]
[9 5 1]
[10  0  5]
[10  1  4]
[10  2  3]
[10  3  2]
[10  4  1]
[11  0  4]
[11  1  3]
[11  3  1]
[12  0  3]
[12  1  2]
[12  2  1]
[13  0  2]
[14  0  1]

Your solution is equivalent to

from itertools import combinations


def combs(arr: np.array, n: int, s: int):
    comb_generator = combinations(iterable=arr, r=n)
    for comb in comb_generator:
        total = sum(list(comb))
        if total == s:
            print(comb)
(0, 1, 14)
(0, 2, 13)
(0, 3, 12)
(0, 4, 11)
(0, 5, 10)
(0, 6, 9)
(0, 7, 8)
(1, 2, 12)
(1, 3, 11)
(1, 4, 10)
(1, 5, 9)
(1, 6, 8)
(2, 3, 10)
(2, 4, 9)
(2, 5, 8)
(2, 6, 7)
(3, 4, 8)
(3, 5, 7)
(4, 5, 6)

which does not cut off calculations on time but just iterates over everything.
This one at least does not allocate extra memory.

Gulzar
  • 23,452
  • 27
  • 113
  • 201
  • 1
    Vastly faster than what I was doing! The calculation cut off notion is really instructive and useful in many other settings. – user109387 Mar 03 '21 at 15:33
  • I noticed that the proposed solutions yield all permutations. For example, (8, 17, 25) will appear in 6 forms. The question mentioned that order is not important, so could we just get the outputs that are in ascending order? Thanks – user109387 Mar 03 '21 at 16:16
  • 1
    @user109387 I already invested more time than I have into this. To achieve what you want, just put all the solutions into a set - it automatically makes sure there are no duplicates. Notice `set` can only handle hashable types, so it should work for `int` (and thus `int` arrays) but not for `float`. This can be handled, but I didn't bother as this isn't the main issue here. Or, you can just accumulate the results and then filter duplicates. I'll quote my hated lecturers: "I'll leave that as an exercise to the reader". Sorry :) – Gulzar Mar 03 '21 at 16:27
  • 1
    Thinking for 10 more seconds, a better way would be to use the cutoff notion together with permutations. see https://docs.python.org/3/library/itertools.html#itertools.combinations and modify the code a bit to only pass over each combination once, and still cutoff when you have reached the sum. Would be even more efficient than the solution I proposed. I don't have time to do that right now, sorry. I'm sure you will be able to. – Gulzar Mar 03 '21 at 16:32
  • Thank you. I'll look into that. I'd noticed that when arr=np.arange(9), n=3, s=10, there are 72 solutions given. The 7 desired combinations are there: ie (0,2,8), (0,3,7), (0,4,6),(1,2,7), (1,3,6), (1,4,5), (2,3,5), but so are all permutations and some doubles as well (eg 127) when the 2nd code response is used. – user109387 Mar 03 '21 at 16:47
  • 1
    @user109387 Please notice the `itertools` solution is of polynomial time (o(len(arr)^(max(n, len(arr)-n) == [len(arr) choose n]). A polynomial time solution that also uses backtracking will come when I have time. – Gulzar Mar 07 '21 at 09:37
  • Added https://stackoverflow.com/a/66525175/913098 as a different answer because it is a different approach completely. – Gulzar Mar 08 '21 at 06:24
1

Following from this answer, I modified the code to fit this problem.
Notice the array is fed to the function in descending order.

def combinations_cutoff(array, tuple_length, s, prev_array=[], n_skips=[]):
    if len(prev_array) == tuple_length:
        if sum(prev_array) == s:
            return [prev_array]
        return []
    combs = []
    for i, val in enumerate(array):
        prev_array_extended = prev_array.copy()
        prev_array_extended.append(val)
        if sum(prev_array_extended) > s:
            n_skips[0] += 1
            print(f"cutoff! arr={prev_array_extended}, s={sum(prev_array_extended)}, skip #{n_skips[0]}")
            continue
        combs += combinations_cutoff(array[i+1:], tuple_length, s, prev_array_extended, n_skips=n_skips)
    return combs


def main():
    n = 20
    k = 3
    s = 15
    arr = np.arange(n)[::-1]

    n_skips = [0]
    gen = combinations_cutoff(arr, k, s, n_skips=n_skips)

    lst = []
    for c in gen:
        lst.append(c)
        print(c)

    print(f"total solutions: {len(lst)}, skips={n_skips[0]}")


if __name__ == "__main__":
    main()

Output:

cutoff! arr=[19], s=19, skip #1
cutoff! arr=[18], s=18, skip #2
cutoff! arr=[17], s=17, skip #3
cutoff! arr=[16], s=16, skip #4
cutoff! arr=[15, 14], s=29, skip #5
cutoff! arr=[15, 13], s=28, skip #6
cutoff! arr=[15, 12], s=27, skip #7
cutoff! arr=[15, 11], s=26, skip #8
cutoff! arr=[15, 10], s=25, skip #9
cutoff! arr=[15, 9], s=24, skip #10
cutoff! arr=[15, 8], s=23, skip #11
cutoff! arr=[15, 7], s=22, skip #12
cutoff! arr=[15, 6], s=21, skip #13
cutoff! arr=[15, 5], s=20, skip #14
cutoff! arr=[15, 4], s=19, skip #15
cutoff! arr=[15, 3], s=18, skip #16
cutoff! arr=[15, 2], s=17, skip #17
cutoff! arr=[15, 1], s=16, skip #18
cutoff! arr=[14, 13], s=27, skip #19
cutoff! arr=[14, 12], s=26, skip #20
cutoff! arr=[14, 11], s=25, skip #21
cutoff! arr=[14, 10], s=24, skip #22
cutoff! arr=[14, 9], s=23, skip #23
cutoff! arr=[14, 8], s=22, skip #24
cutoff! arr=[14, 7], s=21, skip #25
cutoff! arr=[14, 6], s=20, skip #26
cutoff! arr=[14, 5], s=19, skip #27
cutoff! arr=[14, 4], s=18, skip #28
cutoff! arr=[14, 3], s=17, skip #29
cutoff! arr=[14, 2], s=16, skip #30
cutoff! arr=[13, 12], s=25, skip #31
cutoff! arr=[13, 11], s=24, skip #32
cutoff! arr=[13, 10], s=23, skip #33
cutoff! arr=[13, 9], s=22, skip #34
cutoff! arr=[13, 8], s=21, skip #35
cutoff! arr=[13, 7], s=20, skip #36
cutoff! arr=[13, 6], s=19, skip #37
cutoff! arr=[13, 5], s=18, skip #38
cutoff! arr=[13, 4], s=17, skip #39
cutoff! arr=[13, 3], s=16, skip #40
cutoff! arr=[13, 2, 1], s=16, skip #41
cutoff! arr=[12, 11], s=23, skip #42
cutoff! arr=[12, 10], s=22, skip #43
cutoff! arr=[12, 9], s=21, skip #44
cutoff! arr=[12, 8], s=20, skip #45
cutoff! arr=[12, 7], s=19, skip #46
cutoff! arr=[12, 6], s=18, skip #47
cutoff! arr=[12, 5], s=17, skip #48
cutoff! arr=[12, 4], s=16, skip #49
cutoff! arr=[12, 3, 2], s=17, skip #50
cutoff! arr=[12, 3, 1], s=16, skip #51
cutoff! arr=[11, 10], s=21, skip #52
cutoff! arr=[11, 9], s=20, skip #53
cutoff! arr=[11, 8], s=19, skip #54
cutoff! arr=[11, 7], s=18, skip #55
cutoff! arr=[11, 6], s=17, skip #56
cutoff! arr=[11, 5], s=16, skip #57
cutoff! arr=[11, 4, 3], s=18, skip #58
cutoff! arr=[11, 4, 2], s=17, skip #59
cutoff! arr=[11, 4, 1], s=16, skip #60
cutoff! arr=[11, 3, 2], s=16, skip #61
cutoff! arr=[10, 9], s=19, skip #62
cutoff! arr=[10, 8], s=18, skip #63
cutoff! arr=[10, 7], s=17, skip #64
cutoff! arr=[10, 6], s=16, skip #65
cutoff! arr=[10, 5, 4], s=19, skip #66
cutoff! arr=[10, 5, 3], s=18, skip #67
cutoff! arr=[10, 5, 2], s=17, skip #68
cutoff! arr=[10, 5, 1], s=16, skip #69
cutoff! arr=[10, 4, 3], s=17, skip #70
cutoff! arr=[10, 4, 2], s=16, skip #71
cutoff! arr=[9, 8], s=17, skip #72
cutoff! arr=[9, 7], s=16, skip #73
cutoff! arr=[9, 6, 5], s=20, skip #74
cutoff! arr=[9, 6, 4], s=19, skip #75
cutoff! arr=[9, 6, 3], s=18, skip #76
cutoff! arr=[9, 6, 2], s=17, skip #77
cutoff! arr=[9, 6, 1], s=16, skip #78
cutoff! arr=[9, 5, 4], s=18, skip #79
cutoff! arr=[9, 5, 3], s=17, skip #80
cutoff! arr=[9, 5, 2], s=16, skip #81
cutoff! arr=[9, 4, 3], s=16, skip #82
cutoff! arr=[8, 7, 6], s=21, skip #83
cutoff! arr=[8, 7, 5], s=20, skip #84
cutoff! arr=[8, 7, 4], s=19, skip #85
cutoff! arr=[8, 7, 3], s=18, skip #86
cutoff! arr=[8, 7, 2], s=17, skip #87
cutoff! arr=[8, 7, 1], s=16, skip #88
cutoff! arr=[8, 6, 5], s=19, skip #89
cutoff! arr=[8, 6, 4], s=18, skip #90
cutoff! arr=[8, 6, 3], s=17, skip #91
cutoff! arr=[8, 6, 2], s=16, skip #92
cutoff! arr=[8, 5, 4], s=17, skip #93
cutoff! arr=[8, 5, 3], s=16, skip #94
cutoff! arr=[7, 6, 5], s=18, skip #95
cutoff! arr=[7, 6, 4], s=17, skip #96
cutoff! arr=[7, 6, 3], s=16, skip #97
cutoff! arr=[7, 5, 4], s=16, skip #98
[14, 1, 0]
[13, 2, 0]
[12, 3, 0]
[12, 2, 1]
[11, 4, 0]
[11, 3, 1]
[10, 5, 0]
[10, 4, 1]
[10, 3, 2]
[9, 6, 0]
[9, 5, 1]
[9, 4, 2]
[8, 7, 0]
[8, 6, 1]
[8, 5, 2]
[8, 4, 3]
[7, 6, 2]
[7, 5, 3]
[6, 5, 4]
total solutions: 19, skips=98

The function always returns a list of all combinations of length tuple_length, and that start with prev_array and end with a combination of array, and that sum up to s.

At each step, we consider if prev_array is "ready" (=of correct length and sum). If so, it should be returned.
If it is the sum wasn't reached, and the array is already long enough, then it is invalid.

Now, we are left with a prev_array that isn't long enough, and we consider which element should be added to it right now.

The options for elements are as defined by the recursion, all of array's elements, thus we iterate over them and try them out.
When an element is being tried out, only elements following it can be tried later.
Trying out an element means calling the recursion with a prev_array_extended that is the prev_array followed by the selected element.

Now to the main dish - the cutoff - BEFORE calling the recursion, we can consider prev_array's sum. Since array is guaranteed to be of descending order [notice how it is called from main], if any prefix of a candidate tuple has passed s, then there is no point moving on with checking possibilities for that candidate, and we can stop and move on to smaller values to push to prev_array's end.

n_skips is just an ugly hack to count how many times a cutoff was done.

Gulzar
  • 23,452
  • 27
  • 113
  • 201