For some problem (proven to be NP-hard) I have no option but exhaustive search. I have a set of data, for simplicity S = ['A', 'B', 'C', ..., 'Z'], and I want to apply a function f to all subsets of length N < len(S) of this set. I cannot use lists here, since the binomial coefficient binom(len(S), N) is in the billions. But f(x) is zero for almost all N-subsets x of S. Therefore, in simple cases, everything works fine with
from itertools import ifilter, combinations  # Python 2; on Python 3 the built-in filter is already lazy
answer = list(ifilter(lambda x: f(x) > 0, combinations(S, N)))
But in real life len(S) ~ 10⁴ and N ~ 10². What I want is to spread the work among CPU engines using ipyparallel. I have a small cluster with a hundred CPU cores, but I still cannot afford to store the combinations as lists, so I need something like separate, independent generators.
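To make that concrete, here is the rough direction I have in mind, as a minimal sketch: search_block is a name I made up, and the sketch assumes the problem-specific predicate f can be pushed to the engines along with the data. The idea is to partition the combination space by its first element, so each engine rebuilds its own block of N-subsets locally from a single integer i and nothing else has to travel over the wire:

import ipyparallel as ipp
from itertools import combinations

rc = ipp.Client()                    # connect to the running ipcluster
dview = rc[:]
dview.push(dict(S=S, N=N, f=f))      # ship the shared data and predicate to every engine once
dview.execute('from itertools import combinations')

def search_block(i):
    # All N-subsets whose first (lowest-index) element is S[i]; the
    # union over i = 0 .. len(S) - N covers every subset exactly once,
    # so the blocks are disjoint and each engine rebuilds its own
    # generator locally from nothing but the integer i.
    first = S[i]
    return [(first,) + rest
            for rest in combinations(S[i + 1:], N - 1)
            if f((first,) + rest) > 0]

lview = rc.load_balanced_view()
results = lview.map(search_block, range(len(S) - N + 1))
answer = [hit for block in results for hit in block]

The blocks are disjoint but wildly unequal in size (with my numbers, the block for i = 0 is still astronomically large), so in practice one would probably fix the first two or three elements instead, to get more and smaller tasks.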
There are a couple of examples of how to split a generator into chunks, but as far as I understand they still enumerate everything consecutively on the client. There is also a related idea from @minrk, but for some reason it performs very badly.
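For reference, the chunking pattern I mean looks roughly like this (the chunk size and the filter_chunk task are placeholders of mine); the client still has to walk the entire sequence itself to produce the chunks, which is why I call it consecutive:

from itertools import combinations, islice

def chunks(iterable, size):
    # Pull successive lists of `size` items off one shared iterator.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Each chunk could be shipped to an engine, e.g. via
# lview.apply_async(filter_chunk, chunk) with some filter_chunk task,
# but enumerating the combinations stays a single serial loop here.
for chunk in chunks(combinations(S, N), 100000):
    pass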
So the questions are:

- is there any way to implement itertools.ifilter directly with ipyparallel? or
- is it possible to separate a Python generator into a set of independent generators, to send them to ipcluster engines independently? (One direction I can imagine is sketched below.)
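For concreteness, the kind of independent generator I have in mind could perhaps be built by unranking: computing the r-th combination in lexicographic order directly from its rank, so that each engine owns a disjoint range of ranks and never touches the others. A sketch follows; nth_combination is my own helper, and math.comb needs Python 3.8+ (on older Pythons one could use e.g. scipy.special.comb(..., exact=True) instead):

from math import comb

def nth_combination(S, N, r):
    # Return the r-th N-subset of S in lexicographic order (0-indexed)
    # without enumerating any of the subsets that come before it.
    result, start, n = [], 0, len(S)
    for k in range(N, 0, -1):
        # Skip whole blocks: comb(n - start - 1, k - 1) combinations
        # begin with S[start], so jump past them while r allows it.
        while r >= comb(n - start - 1, k - 1):
            r -= comb(n - start - 1, k - 1)
            start += 1
        result.append(S[start])
        start += 1
    return tuple(result)

# An engine that owns ranks lo .. hi-1 can then iterate its share
# completely independently of every other engine:
def block(S, N, lo, hi):
    return (nth_combination(S, N, r) for r in range(lo, hi))

Each call costs roughly len(S) big-integer binomial evaluations, and I have not profiled it, so I do not know whether this approach is competitive.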