I am processing a large CSV file in chunks and writing the output to another file. For the processing I have the following code:
    def process_data(self):
        pool = multiprocessing.Pool(multiprocessing.cpu_count())
        for result in pool.imap(self.process_data_chunk, self.data_chunks):
            pass
At the moment, process_data_chunk also writes to the output file, for which it needs a lock. What I'd like instead is for process_data_chunk to put its chunk on a multiprocessing.Queue, and for a separate process to consume from that queue. I'm not sure whether this is possible: can I combine a Pool with a single Process, where the Pool is the producer and the Process is the consumer?
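Here is a standalone sketch of the kind of thing I have in mind, not my actual code: the worker functions put their results on a queue, a single writer Process drains it, and a None sentinel tells the writer to stop. I'm using a Manager queue because, as far as I understand, a plain multiprocessing.Queue can't be passed as an argument to Pool workers. The names (process_chunk, writer) and the doubling step are just placeholders for the real per-chunk work.

```python
import multiprocessing
from functools import partial

def process_chunk(queue, chunk):
    # Placeholder for the real per-chunk processing
    result = [x * 2 for x in chunk]
    queue.put(result)  # each Pool worker acts as a producer

def writer(queue, path):
    # Single consumer: drains the queue until it sees the sentinel,
    # so only this process ever touches the output file (no lock needed)
    with open(path, "w") as f:
        while True:
            result = queue.get()
            if result is None:  # sentinel: producers are done
                break
            f.write(",".join(map(str, result)) + "\n")

def process_data(data_chunks, path="output.txt"):
    manager = multiprocessing.Manager()
    queue = manager.Queue()  # a Manager queue proxy is picklable, so Pool workers can use it
    consumer = multiprocessing.Process(target=writer, args=(queue, path))
    consumer.start()
    with multiprocessing.Pool(multiprocessing.cpu_count()) as pool:
        pool.map(partial(process_chunk, queue), data_chunks)
    queue.put(None)  # all chunks processed; tell the consumer to stop
    consumer.join()
```

Note that with workers putting results on the queue directly, the output order is no longer guaranteed to match the input order, which may or may not matter here.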