
I am processing a large CSV file in chunks and writing the output to another file. For the processing part I have the following code:

def process_data(self):
    pool = multiprocessing.Pool(multiprocessing.cpu_count())

    # each worker writes to the output file itself, hence the lock
    for result in pool.imap(self.process_data_chunk, self.data_chunks):
        pass

    pool.close()
    pool.join()

At the moment, process_data_chunk also writes to the output file, which is why it needs a lock. What I'd like instead is for process_data_chunk to put the chunk on a multiprocessing.Queue and have a separate process consume from that queue. I'm not sure whether this is possible: can I combine a Pool with a single Process, where the Pool is the producer and the Process is the consumer?

s5s
  • You should use the sharing-state-between-processes feature, as described here: https://docs.python.org/dev/library/multiprocessing.html#sharing-state-between-processes – furkanayd Nov 30 '19 at 22:56
  • You can use a `multiprocessing.Queue`. Use `put` to place the results of the processing on the queue, and then another `multiprocessing.Process` can `get` data from the queue. – John Anderson Nov 30 '19 at 23:09
  • @JohnAnderson How? Where do I create the pool? How do I start it and join it? I've thought about a different design but this would be interesting to get to the bottom of. – s5s Nov 30 '19 at 23:16
  • You want to start a separate process for writing to the file so you can avoid the lock? – wwii Dec 01 '19 at 00:49
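Following up on John Anderson's comment, here is a minimal sketch of the Pool-as-producer / Process-as-consumer design. The names writer and Processor, the None sentinel, and the placeholder processing are all illustrative, not from the question. One wrinkle worth noting: a bare multiprocessing.Queue cannot be pickled into Pool tasks, so the sketch uses a Manager queue, whose proxy can be.

import multiprocessing

def writer(queue, path):
    # Consumer: the only process that touches the output file,
    # so no lock is needed anywhere.
    with open(path, "w") as f:
        while True:
            item = queue.get()
            if item is None:   # sentinel: all producers are done
                break
            f.write(item)

class Processor:
    def __init__(self, data_chunks):
        self.data_chunks = data_chunks
        # A Manager queue proxy survives being pickled into Pool
        # workers; a bare multiprocessing.Queue raises RuntimeError.
        self.queue = multiprocessing.Manager().Queue()

    def process_data_chunk(self, chunk):
        # placeholder for the real per-chunk processing
        self.queue.put(chunk.upper())

    def process_data(self, path):
        consumer = multiprocessing.Process(target=writer,
                                           args=(self.queue, path))
        consumer.start()
        pool = multiprocessing.Pool(multiprocessing.cpu_count())
        for _ in pool.imap(self.process_data_chunk, self.data_chunks):
            pass
        pool.close()
        pool.join()
        self.queue.put(None)  # all chunks queued; tell the writer to stop
        consumer.join()

if __name__ == "__main__":
    Processor(["a\n", "b\n", "c\n"]).process_data("out.txt")

Because the workers only put results on the queue and the single writer process owns the file, the lock disappears; the sentinel is put after pool.join() so it is guaranteed to arrive after every worker's output.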

0 Answers