My problem is this: I want to find all n
length combinations of m
possible digits such that the average of the digits is greater than a threshold value X
.
For example, say length n=3
and digits are {1, 2}
and threshold is 1.5
. Total allowable combinations are 2*2*2 == 2**3 = 8
viz
222 - avg 2.000000 > 1.500000 -> include in acceptable set
221 - avg 1.666667 > 1.500000 -> include in acceptable set
212 - avg 1.666667 > 1.500000 -> include in acceptable set
211 - avg 1.333333 < 1.500000 -> adding 1 and below to the exclude list
122 - avg 1.666667 > 1.500000 -> include in acceptable set
121 - avg 1.333333 < 1.500000 -> skipping this vote combo
112 - avg 1.333333 < 1.500000 -> skipping this vote combo
111 - avg 1.000000 < 1.500000 -> skipping this vote combo
final list of valid votecombos
[[2, 2, 2], [2, 2, 1], [2, 1, 2], [1,2,2]]
The way, I think, to solve this problem is to imagine a tree of all possible combinations and then dynamically prune the tree for impossible solutions. For example imagine n=3
level tree like this
root
/ \
1 2
/ \ / \
1 2 1 2
/ \ / \ / \ / \
1 2 1 2 1 2 1 2
Each path to a leaf is a possible combination. As you can imagine for levels n=3
and m=5
, the number of nodes is N == m**n == 5**3 == 125' nodes. Its easy to see that the tree gets really really large even for
m=5and
n=20` approx. 96 trillion nodes. So the tree cannot be stored in memory. But it also doesnt have to be since it is very structured.
The way to get all possible valid combinations is by DFS traversing the tree in a kind of pre-order way but also keep on pruning the tree at the same time as you traverse. For example, in the above example, the first three combinations {222, 221, 212}
are valid however 211 is not valid. It also means that any other combination containing two 1's from that point on is not going to be valid. So we can pretty much prune the whole left part of the tree with root 1 except 122 ! This would have helped us avoid checking the 3 combinations.
To do this I wrote a simple python script
import string
import itertools
import numpy as np
import re
chars = '21'
grad_thr = 1.5
seats = 3
excllist = []
validlist = []
for word in itertools.product(chars, repeat = seats):
# form the string of digits
votestr = ''.join(word)
print (votestr)
# convert string into list of chars
liststr = list(votestr)
#print liststr
# map list of chars to list of ints
listint = map(int, liststr)
if len(list(set(listint) & set(excllist))) == 0:
# if there are no excluded votes in this votecombo then proceed
# compute a function over the digits; func can be average/bayesian score/something else.
y_mean = np.mean(listint)
print 'avg %f' %y_mean
#y_bayes = bayesian score
if y_mean >= grad_thr:
# if function result is greater than grad threshold then save result to a list of valid votes
validlist.append(listint)
print 'geq than %f -> include in acceptable set' %grad_thr
elif y_mean < grad_thr:
# if function result is not greater than grad threshold then add logic to stop searching the tree further
# prune unnecessary part of the tree
if listint[-1] not in excllist:
excllist = [int(d) for d in range(listint[-1] + 1)]
print 'adding %d and below to the exclude list' %listint[-1]
else:
print '%d already present in exclude list' %listint[-1]
else:
print 'skipping this vote combo'
print '\nfinal valid list of votecombos'
print validvotelist
print 'exclude list'
print excllist
print '\n'
In this way I go through every possible combo and skip to aviod computing the average. However I still end up checking each possible combo after I have entered the for-loop.
Is it possible to not check the combo at all? i.e. we know combo 121
does not work, however we still have to enter the for loop and then skip the combo. Is it possible to not do that?