Filter a Lazy Sequence in Parallel

Question

I have a problem in which I am searching for numbers with certain properties in a very large search space (possibly infinite, but definitely too large for the whole space to fit in memory). So I want a lazy sequence, which I filter. My naive approach was to use list comprehension (for) for the entire search, but this executes the search in a single thread. There are some really easy ways to prune the search space, and there are also some parts of the search that are more computationally intensive.

My slightly less naive approach was to put the easy pruning in the for expression, and to make a search function that did the harder work. Then: (filter search (for [..... :when (prune)])). There's a filter function in the reducers library, but that won't work on lazy seqs. I can't convert from a lazy seq, because of the memory constraint.

So, what is a good way to filter a lazy sequence in parallel? My last naive approach would be something like sticking the sequence in an atom:

(defn accessor-gen [lazys]
  (let [s (atom [nil lazys])]
    (fn []
      (first (swap! s (fn [[_ s]] [(first s) (rest s)]))))))

Then I could have a thread pool of six or so using this function to search the space.

Question: I have the uneasy feeling that I'm making this harder than it needs to be. Also, I'm concerned about contention on the atom. Is there a more straightforward method to consume a lazy sequence in parallel? Finally, is my entire approach fundamentally flawed? Is there a better way, perhaps one that doesn't require lazy sequences?

Is it similar to this one? http://stackoverflow.com/questions/2602791/clojure-agents-consuming-from-a-queue — rwong, Oct 12 '14 at 20:40
Similar, but not quite the same. The solutions there mostly involve, if I understand them correctly, processing a network queue as a sequence, with side-effects. If processing the data was all I needed, I could use `pmap`. I want to filter my data, but there's no `pfilter`. I know how to build *a* solution from core concepts I do know, but I fear that I'm over-complicating it. It seems like it's probably a common enough task, though. — galdre, Oct 12 '14 at 21:04

DanLebrero · Accepted Answer · 2014-10-13T20:12:09.573

1

First thing that I would try is to filter a pmapped seq:

(defn search [i]
        (println (Thread/currentThread) i)
        (when (zero? (rem i 10))
          i))

(take 10 (filter identity (pmap search (range))))

The filtering would happen in a single thread, but the search will be computed in parallel.

If what you actually want to do in parallel is the filtering, you will need to partition the lazy seq and concat the results:

(defn search [numbers]
      (doall (filter (fn [i] (zero? (rem i 10))) numbers))) 

(take 10 (apply concat (pmap search (partition-all 1000 (range)))))

edited Oct 13 '14 at 20:12

answered Oct 12 '14 at 23:03

DanLebrero

8,545
1
29
30

I think your first answer is what I'm looking for. The second wouldn't work if the lazy sequence is infinite. Just checking: even if the entire space being searched is too big for the JVM instance, the laziness of both filter and pmap here mean that the original sequence never occurs completely in memory, right? – galdre Oct 13 '14 at 11:48
Both solutions are lazy as `concat` is also lazy. I have updated the answer to show it. – DanLebrero Oct 13 '14 at 20:14
Oh. I was thinking `partition` was not lazy. And therein lay the root of the problem, where I got off-track. Thanks! – galdre Oct 13 '14 at 20:25

Filter a Lazy Sequence in Parallel

1 Answers1