Return the all possible key combinations where the cumulative sum is equal to input value

Question

For input data = {"a": 1, "b": 1, "c": 3}, I would like to obtain the dict that looks like

{1: [["a"], ["b"]], 
 2: [["a", "b"]], 
 3: [["c"]], 
 4: [["a", "c"], ["b", "c"]],
 5: [["a", "b", "c"]]}

where the key is all possible cumulative sum of the values of input dictionary and corresponding values are the list of all keys make the cumulative sum possible.

The difficulty here is that there could be multiple solutions to get the same cumulative sum (the above 1 is an example). Now I know I could do

{v: list(data.keys())[:idx+1] for v, idx in zip(np.cumsum(list(data.values())), range(len(data)))}

But this approach does not give complete solution I want. Concretely, for input data, I could only get

{1: ['a'], 
 2: ['a', 'b'], 
 5: ['a', 'b', 'c']}

And when there are more duplicates in data, the problem could become more difficult to resolve.

I am wondering if there is good solution to this problem (this problem seems to be NP-complete from my computation theory course).

Have you looked at the ittertools functions? This question may help - https://stackoverflow.com/questions/464864/how-to-get-all-possible-combinations-of-a-list-s-elements — Tom McLean, May 04 '21 at 15:09

score 1 · Answer 1 · answered May 04 '21 at 15:09

You can use a recursive generator function to get all combinations of the dictionary keys, and the use collections.defaultdict to group the combinations on the sums they produce:

from collections import defaultdict
data = {"a": 1, "b": 1, "c": 3}
def to_sum(d, c = []):
   if c:
      yield c
   if len(d) > len(c):
      yield from [i for b in data for i in to_sum(d, c+[b]) if b not in c]

result = defaultdict(set)
for i in to_sum(data):
   result[sum(data[j] for j in i)].add(tuple(sorted(i)))

final_result = {a:list(map(list, b)) for a, b in result.items()}

Output:

{1: [['b'], ['a']], 2: [['a', 'b']], 5: [['a', 'b', 'c']], 4: [['b', 'c'], ['a', 'c']], 3: [['c']]}

I think this has a runtime of `O(N * log(N) * N ** N)` where `N = len(data)`. — Jasmijn, May 04 '21 at 15:29

score 1 · Accepted Answer · answered May 04 '21 at 15:45

Your problem is related to the power set of data. To be specific, the power set excluding the empty set. We can use itertools from the standard library to help us out.

from collections import defaultdict
from itertools import chain, combinations

def nonempty_powerset(data):
    # adapted from the powerset recipe
    # at https://docs.python.org/3/library/itertools.html
    # start the range from 1 to exclude the empty set
    return chain.from_iterable(combinations(data, r) for r in range(1, len(data) + 1))

def get_subset_sums(data):
    output = defaultdict(list)
    for subset in nonempty_powerset(data):
        # subset is a tuple of strings, if you want lists we need to convert them
        output[sum(data[item] for item in subset)].append(list(subset))
    # convert to a regular dict
    return dict(output)

print(get_subset_sums({"a": 1, "b": 1, "c": 3}))

The runtime should be in O(N * 2 ** N) where N = len(data), which is pretty reasonable for something that involves power sets. Maybe there is a numpy method I missed that could help you with this but otherwise I think this is reasonably close to optimal.

Return the all possible key combinations where the cumulative sum is equal to input value

2 Answers2