For some problem (proven to be NP-hard) I have no option but exhaustive search. I have a set of data, for simplicity S = ['A', 'B', 'C', ..., 'Z'], and I want to apply a function f to all subsets of length N < len(S) of this set. I cannot use lists here, since the binomial coefficient binom(len(S), N) is in the billions. But f(x) is zero for almost all N-subsets x of S. Therefore, in simple cases, everything works fine with
from itertools import ifilter, combinations  # Python 2; on Python 3 the built-in filter is already lazy
answer = list(ifilter(lambda x: f(x) > 0, combinations(S, N)))
But in real life len(S) ~ 10⁴ and N ~ 10². What I want is to spread the work among CPU engines using ipyparallel. I have a small cluster with a hundred CPU cores, but I still cannot afford to store the combinations as lists, so I need something like separate, independent generators.
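To make that concrete, here is the rough direction I have in mind, as a minimal sketch: search_block is a name I made up, and the sketch assumes the problem-specific predicate f can be pushed to the engines along with the data. The idea is to partition the combination space by its first element, so each engine rebuilds its own block of N-subsets locally from a single integer i and nothing else has to travel over the wire:

import ipyparallel as ipp
from itertools import combinations

rc = ipp.Client()                    # connect to the running ipcluster
dview = rc[:]
dview.push(dict(S=S, N=N, f=f))      # ship the shared data and predicate to every engine once
dview.execute('from itertools import combinations')

def search_block(i):
    # All N-subsets whose first (lowest-index) element is S[i]; the
    # union over i = 0 .. len(S) - N covers every subset exactly once,
    # so the blocks are disjoint and each engine rebuilds its own
    # generator locally from nothing but the integer i.
    first = S[i]
    return [(first,) + rest
            for rest in combinations(S[i + 1:], N - 1)
            if f((first,) + rest) > 0]

lview = rc.load_balanced_view()
results = lview.map(search_block, range(len(S) - N + 1))
answer = [hit for block in results for hit in block]

The blocks are disjoint but wildly unequal in size (with my numbers, the block for i = 0 is still astronomically large), so in practice one would probably fix the first two or three elements instead, to get more and smaller tasks.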
There are a couple of examples of how to split a generator into chunks, but as far as I understand they still enumerate everything consecutively on the client. There is also a related idea from @minrk, but for some reason it performs very badly.
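For reference, the chunking pattern I mean looks roughly like this (the chunk size and the filter_chunk task are placeholders of mine); the client still has to walk the entire sequence itself to produce the chunks, which is why I call it consecutive:

from itertools import combinations, islice

def chunks(iterable, size):
    # Pull successive lists of `size` items off one shared iterator.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Each chunk could be shipped to an engine, e.g. via
# lview.apply_async(filter_chunk, chunk) with some filter_chunk task,
# but enumerating the combinations stays a single serial loop here.
for chunk in chunks(combinations(S, N), 100000):
    pass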
So the questions are:

- is there any way to implement itertools.ifilter directly with ipyparallel? or
- is it possible to separate a Python generator into a set of independent generators, to send them to ipcluster engines independently? (One direction I can imagine is sketched below.)
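For concreteness, the kind of independent generator I have in mind could perhaps be built by unranking: computing the r-th combination in lexicographic order directly from its rank, so that each engine owns a disjoint range of ranks and never touches the others. A sketch follows; nth_combination is my own helper, and math.comb needs Python 3.8+ (on older Pythons one could use e.g. scipy.special.comb(..., exact=True) instead):

from math import comb

def nth_combination(S, N, r):
    # Return the r-th N-subset of S in lexicographic order (0-indexed)
    # without enumerating any of the subsets that come before it.
    result, start, n = [], 0, len(S)
    for k in range(N, 0, -1):
        # Skip whole blocks: comb(n - start - 1, k - 1) combinations
        # begin with S[start], so jump past them while r allows it.
        while r >= comb(n - start - 1, k - 1):
            r -= comb(n - start - 1, k - 1)
            start += 1
        result.append(S[start])
        start += 1
    return tuple(result)

# An engine that owns ranks lo .. hi-1 can then iterate its share
# completely independently of every other engine:
def block(S, N, lo, hi):
    return (nth_combination(S, N, r) for r in range(lo, hi))

Each call costs roughly len(S) big-integer binomial evaluations, and I have not profiled it, so I do not know whether this approach is competitive.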