
I have a generator that downloads files from FTP. I want to process these downloads in parallel without exhausting the generator up front: it should be evaluated lazily, so that only as many files are downloaded as there are workers ready for them. Is this possible in Python 2?

    from contextlib import closing
    import multiprocessing as mp

    def parse_and_load(filename):
        raw_records, exchange_data = parser.parse(filename)
        loader.load(raw_records, exchange_data)

    # closing() is needed because Pool is not a context manager in Python 2
    with closing(mp.Pool(4)) as pool:
        pool.map(parse_and_load, downloader.get_files(self.date))

Instead of downloading 4 files at a time and processing them in parallel, the above code evaluates the whole generator first (downloading every file) and only then calls parse_and_load in parallel.

ayushgp

1 Answer


[EDIT] Sorry, I spoke too soon.

The same problem is discussed in Python Multiprocessing.Pool lazy iteration. It points out that this is by design: multiprocessing.Pool.map consumes its whole iterable before dispatching any work. Getting lazier behaviour requires a different tool; one common suggestion from that thread is a producer/consumer setup in which worker processes are fed from a bounded queue, as sketched below.
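A minimal sketch of that bounded-queue approach (this is not the OP's code: parse_and_load and downloader.get_files are taken from the question, while the worker function, queue size, and worker count are illustrative):

    import multiprocessing as mp

    def worker(queue):
        # Pull filenames until the sentinel value arrives.
        while True:
            filename = queue.get()
            if filename is None:
                break
            parse_and_load(filename)

    def run_lazily(filenames, n_workers=4):
        # A bounded queue keeps the generator at most maxsize items
        # ahead of the workers, so files are downloaded on demand.
        queue = mp.Queue(maxsize=n_workers)
        workers = [mp.Process(target=worker, args=(queue,))
                   for _ in range(n_workers)]
        for p in workers:
            p.start()
        for filename in filenames:   # the generator is consumed here, lazily
            queue.put(filename)      # blocks while the queue is full
        for _ in workers:
            queue.put(None)          # one sentinel per worker
        for p in workers:
            p.join()

    run_lazily(downloader.get_files(self.date))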

So, if I understood correctly, my original answer below (about chunksize) is not relevant to the OP's question.


You're probably getting bitten by the chunksize argument (see the Pool.map reference). Under the hood, multiprocessing groups arguments into chunks to limit inter-process overhead.

You should try again with chunksize=1.
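For reference, applied to the snippet from the question that would look like the following (with the caveat from the edit above: Pool.map still consumes the whole generator before any work is dispatched):

    with closing(mp.Pool(4)) as pool:
        # chunksize=1 hands each worker one filename per task instead of a batch
        pool.map(parse_and_load, downloader.get_files(self.date), chunksize=1)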