Efficiently count sets in a cartesian product that sum above a specific number

Question

I have the below Python 3 code that works:

import itertools

loops = 10
results = [4, 2.75, 2.75, 1.5, 1.5, 1.5, 0]
threshold = loops * 2
cartesian_product = itertools.product(results, repeat=loops)

good, bad = 0, 0

for e in cartesian_product:
    if (sum(e) >= threshold):
        good += 1
    else:
        bad += 1

print('Ratio of good vs total is {0:.3f}%'.format(100 * good / (good + bad)))

If I increase the loops to a larger number (>15), the program takes too long to finish.

Is there a more efficient way/algorithm to calculate the ratio?

There are a lot of repeated lists in iteration. Is it accounted? — bzimor, Jun 28 '17 at 08:29
You can use a list comprehension to get a list of sums, convert to numpy array, use numpy where to get array of indices greater than threshold, and finally use len() to get number of sums above/below threshold. (Typing on mobile..) — , Jun 28 '17 at 08:43
My computer takes too long to run your code as is because of `cartesian_product = itertools...`; the [answers](https://stackoverflow.com/questions/1208118/using-numpy-to-build-an-array-of-all-combinations-of-two-arrays/35608701#35608701) in this [post](https://stackoverflow.com/questions/1208118/using-numpy-to-build-an-array-of-all-combinations-of-two-arrays/1235363#1235363) seem like they will help. — , Jun 28 '17 at 10:10
Cross-posted: https://cs.stackexchange.com/q/77321/755, https://stackoverflow.com/q/44796997/781723. Please [do not post the same question on multiple sites](https://meta.stackexchange.com/q/64068). Each community should have an honest shot at answering without anybody's time being wasted. — D.W., Jun 28 '17 at 21:43

score 4 · Accepted Answer · answered Jun 28 '17 at 11:52

Here is a solution. The idea is to calculate all possible sums of our values you can obtain with n loops, counting the different possible sums, and counting together all sums that are larger than the threshold.

Then, we can generate all possible sums for n+1 loops by adding our values to the previous sums. We can hope that the number of different possible sums won't grow too large, as we add many times the same values, and we regroup all sums larger than the threshold.

from collections import Counter

def all_sums(values, threshold, previous_sums = None):
    """
    values must be sorted
    previous_sums is a Counter of previously obtained possible sums

    Returns a Counter of all possible sums of values and the previous sums
    """
    if not previous_sums:
        previous_sums = Counter({0:1})

    new = Counter()
    for existing_sum, ex_sum_count in sorted(previous_sums.items()):
        for index, val in enumerate(values):
            total = existing_sum + val
            if total < threshold:
                # With the current value, we have found ex_sum_count
                # ways to obtain that total
                new.update({total: ex_sum_count})
            else:
                # We don't need the exact sum, as anything we could
                # later add to it will be over the threshold.
                # We count them under the value = threshold
                # As 'values' is sorted, all subsequent values will also give 
                # a sum over the threshold
                values_left = len(values) - index
                new.update({threshold: values_left * ex_sum_count})
                break
    return new


def count_sums(values, threshold, repeat):
    """
    values must be sorted!

    Recursively calculates the possible sums of 'repeat' values,
    counting together all values over 'threshold'
    """
    if repeat == 1:
        return all_sums(values, threshold, previous_sums=None)
    else:
        return all_sums(values, threshold, previous_sums=count_sums(values, threshold, repeat=repeat-1))

Let's try it on your example:

loops = 10
results = [4, 2.75, 2.75, 1.5, 1.5, 1.5, 0]
threshold = loops * 2

values = sorted(results)

sums = count_sums(values, threshold, repeat=loops)
print(sums)
# Counter({20: 137401794, 19.75: 16737840, 18.25: 14016240, 18.5: 13034520, 19.5: 12904920,
# 17.0: 12349260, 15.75: 8573040, 17.25: 8048160, 15.5: 6509160, 16.75: 6395760, 14.25: 5171040,
# 18.0: 5037480, 14.5: 4461480, 16: 3739980, 18.75: 3283020, 19.25: 3220800, 13.0: 3061800, 
# 14.0: 2069550, 12.75: 1927800, 15.25: 1708560, 13.25: 1574640, 17.5: 1391670, 11.5: 1326780,
# 11.75: 1224720, 14.75: 1182660, 16.5: 1109640, 10.25: 612360, 17.75: 569520, 11.25: 453600, 
# 16.25: 444060, 12.5: 400680, 10.0: 374220, 12: 295365, 13.75: 265104, 10.5: 262440, 19.0: 229950,
# 13.5: 204390, 8.75: 204120, 15.0: 192609, 9.0: 153090, 8.5: 68040, 9.75: 65520, 7.5: 61236, 
# 7.25: 45360, 11.0: 44940, 12.25: 21840, 6.0: 17010, 7.0: 7560, 5.75: 6480, 8.25: 5280, 4.5: 3240,
# 9.5: 2520, 10.75: 720, 4.25: 540, 5.5: 450, 3.0: 405, 6.75: 180, 8: 45, 1.5: 30, 2.75: 20, 4: 10, 0: 1})
number_of_sums = len(results) ** loops
# 282475249
good = sums[threshold]
# 137401794
bad = number_of_sums - good
# 145073455

I timed it, it takes about 9 ms on my rather old machine.

And with some other data: 10 different values, 20 loops:

loops = 20
results = [4, 2.75, 2.45, 1.5, 1.3, 0.73, 0.12, 1.4, 1.5, 0]
threshold = loops * 2
values = sorted(results)

sums = count_sums(values, threshold, repeat=loops)
number_of_sums = len(results) ** loops
good = sums[threshold]
bad = number_of_sums - good
print(good)
print(bad)
# 5440943363190360728
# 94559056636809639272

which I obtain in less than 12 seconds.

Efficiently count sets in a cartesian product that sum above a specific number

1 Answers1

Linked