
I am processing a large CSV file in chunks and writing the output to another file. For the processing part I have the following code:

def process_data(self):
    pool = multiprocessing.Pool(multiprocessing.cpu_count())

    # each worker writes to the output file itself, hence the lock
    for result in pool.imap(self.process_data_chunk, self.data_chunks):
        pass

    pool.close()
    pool.join()

At the moment, process_data_chunk also writes to the output file, which is why it needs a lock. What I'd like instead is for process_data_chunk to put the chunk on a multiprocessing.Queue and have a separate process consume from that queue. I'm not sure whether this is possible: can I combine a Pool with a single Process, where the Pool is the producer and the Process is the consumer?

s5s
  • You should use the sharing-state-between-processes feature, as described here: https://docs.python.org/dev/library/multiprocessing.html#sharing-state-between-processes – furkanayd Nov 30 '19 at 22:56
  • You can use a `multiprocessing.Queue`. Use `put` to place the results of the processing on the queue, and then another `multiprocessing.Process` can `get` data from the queue. – John Anderson Nov 30 '19 at 23:09
  • @JohnAnderson How? Where do I create the pool? How do I start it and join it? I've thought about a different design but this would be interesting to get to the bottom of. – s5s Nov 30 '19 at 23:16
  • You want to start a separate process for writing to the file so you can avoid the lock? – wwii Dec 01 '19 at 00:49
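Following up on John Anderson's comment, here is a minimal sketch of the Pool-as-producer / Process-as-consumer design. The names writer and Processor, the None sentinel, and the placeholder processing are all illustrative, not from the question. One wrinkle worth noting: a bare multiprocessing.Queue cannot be pickled into Pool tasks, so the sketch uses a Manager queue, whose proxy can be.

import multiprocessing

def writer(queue, path):
    # Consumer: the only process that touches the output file,
    # so no lock is needed anywhere.
    with open(path, "w") as f:
        while True:
            item = queue.get()
            if item is None:   # sentinel: all producers are done
                break
            f.write(item)

class Processor:
    def __init__(self, data_chunks):
        self.data_chunks = data_chunks
        # A Manager queue proxy survives being pickled into Pool
        # workers; a bare multiprocessing.Queue raises RuntimeError.
        self.queue = multiprocessing.Manager().Queue()

    def process_data_chunk(self, chunk):
        # placeholder for the real per-chunk processing
        self.queue.put(chunk.upper())

    def process_data(self, path):
        consumer = multiprocessing.Process(target=writer,
                                           args=(self.queue, path))
        consumer.start()
        pool = multiprocessing.Pool(multiprocessing.cpu_count())
        for _ in pool.imap(self.process_data_chunk, self.data_chunks):
            pass
        pool.close()
        pool.join()
        self.queue.put(None)  # all chunks queued; tell the writer to stop
        consumer.join()

if __name__ == "__main__":
    Processor(["a\n", "b\n", "c\n"]).process_data("out.txt")

Because the workers only put results on the queue and the single writer process owns the file, the lock disappears; the sentinel is put after pool.join() so it is guaranteed to arrive after every worker's output.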

0 Answers