I have a list of large generators like the following:

test_list = [(i for i in range(100000000)) for x in range(100)]

This is much larger than my real data, but it demonstrates why I use generators.

I want to evaluate a function on each generator independently:

def test_function(generator):
    results = []
    for i in range(3):
        results.append(next(generator))
    return results

For a function such as this, which only consumes the first few items, it makes sense not to materialize the entire generator into a list before applying the function.

I want to run it in parallel:

import multiprocessing as mp

output = mp.Queue()

processes = [mp.Process(target=test_function, args=(generator, )) for generator in test_list]

# Run processes
for p in processes:
    p.start()

# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
results = [output.get() for p in processes]

However, I get an error that the generator cannot be pickled.

What is a way that I can run this process in parallel?

Thanks, Jack

Jack Arnestad

1 Answer

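Generators cannot be pickled (see this answer if you want to know why). Instead, use a class-based iterator: an object with a __next__() method (and, by convention, an __iter__() method that returns self), so you can call next() on it. Instances of such a class can be pickled as long as the class is defined at module level and its attributes are themselves picklable. For example: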

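class first_n_squares:
    def __init__(self, n):
        self.i = 0  # index of the next square to produce
        self.n = n  # total number of squares to produce

    def __iter__(self):
        # Returning self lets instances be used in for loops as well.
        return self

    def __next__(self):
        if self.i < self.n:
            ret = self.i ** 2
            self.i += 1
            return ret
        else:
            raise StopIteration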

An instance of first_n_squares is an ordinary object whose state lives in plain attributes (i and n), so it can be pickled, and because it defines __next__() you can call next() on it. For example:

first_5_squares_iter = first_n_squares(5)
first_square = next(first_5_squares_iter)