I am using the 'stars-and-bars' algorithm to select items from multiple lists, with the number of stars between the bars k and k+1 being the index in the k'th list. The problem I'm facing is that the partitions (i.e. the number of stars between two bars) can be larger than the size of a list, which will result in many invalid combinations.
For example: if I have two lists each of length 8, (14,0)
is a valid stars distribution for sum=14, but will of course exceed the first list's capacity. (7,7)
is the highest valid index - so I get a large number of invalid indices, especially if the lists are not of equal size.
For performance reasons I need a variant of the algorithm with limited partition size. How can I do this? The stars-and-bars implementation I'm using right now is this one, but I can easily change it. The lists are usually of similar length, but not necessarily of the same length. I'm fine with limiting the partition sizes to the length of the longest list, but individual restrictions would of course be nicer.
import itertools
def stars_and_bars(stars, partitions):
for c in itertools.combinations(range(stars+partitions-1), partitions-1):
yield tuple(right-left-1 for left,right in zip((-1,) + c, c + (stars+partitions-1,)))
def get_items(*args):
hits = 0
misses = 0
tries = 0
max_idx = sum(len(a) - 1 for a in args)
for dist in range(max_idx):
for indices in stars_and_bars(dist, len(args)):
try:
tries += 1
[arg[i] for arg,i in zip(args,indices)]
hits += 1
except IndexError:
misses += 1
continue
print('hits/misses/tries: {}/{}/{}'.format(hits, misses, tries))
# Generate 4 lists of length 1..4
lists = [[None]*(r+1) for r in range(4)]
get_items(*lists)
# hits/misses/tries: 23/103/126
Edit: I found two related questions on mathexchange, but I was not able to translate them into code yet: