Lazily iterate over generator in multiprocessing pool

Question

I generate data with a generator (the real data is memory-intensive, although this dummy example is not) and then run some calculations on it. Since the calculations take much longer than the data generation, I want to run them in parallel. Here is the code I wrote (with dummy functions for simplicity):

from multiprocessing import Pool

def fibonacci(number_iter):
    i = 0
    j = 1
    for _ in range(number_iter):  # loop variable unused; avoids shadowing built-in round()
        yield i
        k = i + j
        i, j = j, k


def factors(n):
    f = set()
    for i in range(1, n+1, 1):
        if n % i == 0:
            f.add(i)
    return f


if __name__ == "__main__":
    with Pool() as pool:
        results = pool.map(factors, fibonacci(45))

I learnt from other questions (see here and here) that `map` consumes the iterator fully before dispatching any work. I want to avoid that because holding all the data at once takes a prohibitive amount of memory (that is why I am using a generator in the first place!).

How can I do this by lazily iterating over my generator function? The answers in the questions mentioned before have not been of help.

Could you do `for result in pool.imap(factors, fibonacci(45)): print(result)`? It iterates lazily over the results. — Andrej Kesely, Jun 05 '20 at 20:50
I second @AndrejKesely's suggestion. See https://stackoverflow.com/questions/26520781/multiprocessing-pool-whats-the-difference-between-map-async-and-imap/26521507#26521507 for more info on `imap` vs `map` behavior. — dano, Jun 06 '20 at 04:26
This works, but raises one question: the generator function is much faster than `factors`. Is there a way to pause the generator function so it won't overload memory when there are a certain number of elements generated and waiting to be consumed by the `factors` function? — YamiOmar88, Jun 06 '20 at 07:46
@YamiOmar88 Look at [`Semaphore`](https://docs.python.org/3.8/library/multiprocessing.html#multiprocessing.Semaphore). `fibonacci()` acquires the semaphore and `factors()` releases it. The value passed to the semaphore constructor determines when the generator blocks. — Andrej Kesely, Jun 06 '20 at 09:26
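
For reference, the semaphore idea from the comments could look roughly like the sketch below. It is only one possible arrangement, not a tested recipe from the thread: the names `throttled`, `lazy_factors`, `_init_worker`, `_factors_then_release` and the `max_pending` parameter are made up for illustration. The semaphore is handed to the workers through the pool's `initializer`, the generator side acquires a slot before each item, and a worker releases the slot once `factors` has finished. The sketch assumes every result is consumed; stopping the loop early is not handled.

```python
import multiprocessing as mp

_slots = None  # per-worker global, filled in by _init_worker


def _init_worker(slots):
    """Runs once in every worker: remember the shared semaphore."""
    global _slots
    _slots = slots


def fibonacci(number_iter):
    i, j = 0, 1
    for _ in range(number_iter):
        yield i
        i, j = j, i + j


def factors(n):
    return {i for i in range(1, n + 1) if n % i == 0}


def _factors_then_release(n):
    """Worker-side wrapper (hypothetical): free one slot when the work is done."""
    try:
        return factors(n)
    finally:
        _slots.release()


def throttled(items, slots):
    """Yield items, blocking while max_pending items are still unprocessed."""
    for item in items:
        slots.acquire()
        yield item


def lazy_factors(numbers, max_pending=8, processes=None):
    """Yield factors(n) for each n in order, with at most max_pending items in flight."""
    slots = mp.Semaphore(max_pending)
    with mp.Pool(processes, initializer=_init_worker, initargs=(slots,)) as pool:
        yield from pool.imap(_factors_then_release, throttled(numbers, slots))


if __name__ == "__main__":
    for result in lazy_factors(fibonacci(25)):
        print(sorted(result))
```

`imap` pulls from the generator in a background thread of the parent process, so blocking inside `throttled` pauses data generation without blocking the loop that reads results. `max_pending` should be at least the number of worker processes, otherwise some workers sit idle.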
