I assume the general case where each id does not necessarily have all the values in [0.01, 1.0].
There are 3 main optimisations you can make, and they all aim to instantly drop branches that are guaranteed not to satisfy your conditions.
1. Split the ratios of each id into a dictionary
This way you instantly avoid pointless combinations, e.g., [{'id': 1, 'ratio': 0.01}, {'id': 1, 'ratio': 0.02}]. It also makes it easier to try combinations between ids. So, instead of having everything in a flat list of dicts, reorganise the data in the following form:
# if your ids are 0-based consecutive integer numbers, a list of lists works too
array_list = {
    1: [0.01, 0.02, ...],
    2: [0.01, 0.02, ...],
    3: [...],
}
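If you are starting from the flat list in your question, the regrouping is a short loop (a sketch with made-up data; ratios_by_id is just an illustrative name, not part of the final code):
from collections import defaultdict

flat_list = [{'id': 1, 'ratio': 0.01}, {'id': 1, 'ratio': 0.02}, {'id': 2, 'ratio': 0.01}]  # your original format
ratios_by_id = defaultdict(list)
for d in flat_list:
    ratios_by_id[d['id']].append(d['ratio'])
print(dict(ratios_by_id))  # {1: [0.01, 0.02], 2: [0.01]}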
2. For an N-size pairing, you have N-1 degrees of freedom
If you're searching for a triplet and you already have (0.54, 0.33, _), you don't have to search all possible values for the last id. There is only one value that can satisfy the condition sum(ratios) == 1.0.
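A minimal sketch of that shortcut, with hypothetical data for the id filling the last slot:
ratios_of_last_id = {round(x / 100, 2) for x in range(1, 101)}  # hypothetical: this id has all values in [0.01, 1.0]
partial = (0.54, 0.33)
last = round(1.0 - sum(partial), 2)  # the only candidate for the last slot
if last in ratios_of_last_id:
    print(partial + (last,))  # (0.54, 0.33, 0.13)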
3. You can further restrict the possible value range of each id based on the min/max values of the others.
Say you have 3 ids and they all have all the values in [0.01, 0.44]. It is pointless to try any combination like (0.99, _, _), because the minimum sum of the last two ids is 0.02. Therefore, the maximum value the first id needs to explore is 0.98 (well, 0.44 in this example, but you get my drift). Similarly, if the maximum sum of the last two ids is 0.88, there is no reason to explore values below 0.12 for the first id. A special case of this is when the sum of the minimum values of all the ids exceeds 1.0 (or the sum of their maximum values is below 1.0), in which case you can instantly drop the whole combination of ids and move on.
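A sketch of how those bounds come out for that 3-id example, using the integer percentages introduced in the next section (illustrative names, mirroring the prose rather than the final code):
values = [set(range(1, 45))] * 3              # every id holds 0.01..0.44, as integer percentages
min_rest = sum(min(v) for v in values[1:])    # 2  -> the other two contribute at least 0.02
max_rest = sum(max(v) for v in values[1:])    # 88 -> the other two contribute at most 0.88
start = max(min(values[0]), 100 - max_rest)   # 12 -> no point exploring below 0.12
end = min(max(values[0]), 100 - min_rest)     # 44 -> capped by the id's own maximum (0.98 otherwise)
print(start, end)  # 12 44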
Using integers instead of floats
You are blessed to be dealing only with a few discrete values, so you're better off converting everything to integers. The first reason is to avoid any headaches with floating-point arithmetic. Case in point: did you know that your code misses some combinations exactly because of these inaccuracies?
And since you will be generating your own value ranges due to optimisation #3, it's much simpler to do for i in range(12, 99) than some roundabout way to generate all the values in [0.12, 0.99) while making sure everything is properly rounded off at the second decimal digit AND THEN properly added together and checked against some tolerance value close to 1.0.
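To see the inaccuracy in action (standard floating-point behaviour, nothing specific to your data):
print(0.1 + 0.2 == 0.3)        # False
print(sum([0.1] * 10) == 1.0)  # False, the sum is 0.9999999999999999
# converting each ratio to an integer percentage once avoids the problem entirely
print(sum(int(round(r * 100)) for r in [0.1] * 10) == 100)  # True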
Code
from collections import defaultdict
import itertools as it
def combined_sum(values):
    def _combined_sum(values, comb, partial_sum, depth, mins, maxs):
        if depth == len(values) - 1:
            required = 100 - partial_sum
            if required in values[-1]:
                yield comb + (required,)
        else:
            min_value = mins[depth+1]
            max_value = maxs[depth+1]
            start_value = max(min(values[depth]), 100 - partial_sum - max_value)
            end_value = max(1, 100 - partial_sum - min_value)
            for value in range(start_value, end_value+1):
                if value in values[depth]:
                    yield from _combined_sum(values, comb+(value,), partial_sum+value, depth+1, mins, maxs)

    # precompute all the partial min/max sums, because we will be using them a lot
    mins = [sum(min(v) for v in values[i:]) for i in range(len(values))]
    maxs = [sum(max(v) for v in values[i:]) for i in range(len(values))]
    if mins[0] > 100 or maxs[0] < 100:
        return []
    return _combined_sum(values, tuple(), 0, 0, mins, maxs)
def asset_allocation(array_list, max_size):
    # a set is preferred here because we will be checking a lot whether
    # a value is in the iterable, which is faster in a set than in a tuple/list
    collection = defaultdict(set)
    for d in array_list:
        collection[d['id']].add(int(round(d['ratio'] * 100)))

    all_combos = []
    for i in range(1, max_size+1):
        for ids in it.combinations(collection.keys(), i):
            values = [collection[ID] for ID in ids]
            for group in combined_sum(values):
                all_combos.append([{'id': ID, 'ratio': ratio/100} for ID, ratio in zip(ids, group)])
    return all_combos
array_list = [{'id': ID, 'ratio': ratio/100}
              for ID in (1, 2, 3, 4, 5)
              for ratio in range(1, 101)
              ]
max_size = 5
result = asset_allocation(array_list, max_size)
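As a quick sanity check on a tiny, made-up input:
small = [{'id': ID, 'ratio': r / 100} for ID in (1, 2) for r in (30, 50, 70)]
for combo in asset_allocation(small, 2):
    print(combo)
# [{'id': 1, 'ratio': 0.3}, {'id': 2, 'ratio': 0.7}]
# [{'id': 1, 'ratio': 0.5}, {'id': 2, 'ratio': 0.5}]
# [{'id': 1, 'ratio': 0.7}, {'id': 2, 'ratio': 0.3}]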
The full 5-id script above finishes in 14-15 seconds on my machine.
For comparison, for 3 ids this finishes in 0.007 seconds, while Gabor's solution, which effectively implements only optimisation #1, finishes in 0.18 seconds. For 4 ids it's 0.43 s and 18.45 s respectively. For 5 ids I stopped timing his solution after a few minutes, but it was expected to take at least 10 minutes.
If you are dealing with the case where all ids have all the values in [0.01, 1.0] and you insist on having the specific output indicated in your question, the above approach is still optimal. However, if you are okay with generating the output in a different format, you can do better.
For a specific group size, e.g., singles, pairs, triplets, etc., generate all the partitions that add up to 100 using the stars and bars approach. That way, instead of generating (1, 99), (2, 98), etc., for each pair of ids, i.e., (1, 2), (1, 3) and (2, 3), you do this only once.
I've modified the code from here to not allow for 0 in any partition.
import itertools as it
def partitions(n, k):
    for c in it.combinations(range(1, n), k-1):
        yield tuple(b-a for a, b in zip((0,)+c, c+(n,)))
def asset_allocation(ids, max_size):
    all_combos = []
    for k in range(1, max_size+1):
        id_comb = tuple(it.combinations(ids, k))
        p = tuple(partitions(100, k))
        all_combos.append((id_comb, p))
    return all_combos
ids = (1, 2, 3, 4, 5)
result = asset_allocation(ids, 5)
This finishes much faster, takes up less space, and also allows you to home in on all the combinations for singles, pairs, etc., individually. Now, if you were to take the product of id_comb and p to generate the output in your question, you'd lose all that time saved. In fact, it'd come out a bit slower than the general method above, but at least this piece of code is still more compact.
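For completeness, this is what that product would look like, mapping the compact result back to the format in your question (a sketch, and again, doing this throws the savings away):
expanded = [
    [{'id': ID, 'ratio': ratio / 100} for ID, ratio in zip(ids_comb, part)]
    for id_combs, parts in result
    for ids_comb in id_combs
    for part in parts
]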