I've been playing around with a Pool object while using an instance method as the func argument. The behavior with regard to instance state has been a bit surprising: the instance seems to get reset on every chunk. E.g.:
    import multiprocessing as mp
    import logging


    class Worker(object):
        def __init__(self):
            self.consumed = set()

        def consume(self, i):
            if i not in self.consumed:
                logging.info(i)
                self.consumed.add(i)


    if __name__ == '__main__':
        n = 1
        logging.basicConfig(level='INFO', format='%(process)d: %(message)s')
        worker = Worker()
        with mp.Pool(processes=2) as pool:
            pool.map(worker.consume, [1] * 100, chunksize=n)
If n is set to 1, then 1 gets logged every time. If n is set to 20, it's logged 5 times, etc. What is the reason for this, and is there any way around it? I also wanted to use the initializer pool argument with an instance method but hit similar issues.
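
A minimal sketch of what appears to be happening, assuming the bound method (and with it the whole instance) is pickled anew for each chunk that is sent to a worker process. No real pool is involved here: run_chunk and count_first_seen are hypothetical helpers that only imitate the pickle round trip, and consume returns a flag instead of logging so the effect can be counted.

```python
import pickle


class Worker:
    def __init__(self):
        self.consumed = set()

    def consume(self, i):
        # Return True only the first time this particular instance sees i.
        if i not in self.consumed:
            self.consumed.add(i)
            return True
        return False


def run_chunk(pickled_method, chunk):
    # Hypothetical stand-in for a pool worker handling one task:
    # unpickle the callable (which rebuilds a copy of the instance),
    # then apply it to every item in the chunk.
    method = pickle.loads(pickled_method)
    return [method(i) for i in chunk]


def count_first_seen(worker, items, chunksize):
    # Split the items into chunks and pickle the bound method once per chunk,
    # which is the assumption this sketch is built on.
    chunks = [items[k:k + chunksize] for k in range(0, len(items), chunksize)]
    return sum(
        sum(run_chunk(pickle.dumps(worker.consume), chunk)) for chunk in chunks
    )


worker = Worker()
items = [1] * 100
per_item = count_first_seen(worker, items, chunksize=1)
per_twenty = count_first_seen(worker, items, chunksize=20)
print(per_item, per_twenty, worker.consumed)
```

Under that assumption, every chunk starts from a fresh copy of the instance as it looked in the parent, so the number of "first sightings" equals the number of chunks, and the parent's worker.consumed is never updated. If that is indeed the cause, state that should survive across chunks would probably have to live somewhere other than the pickled instance, for example in a module-level object created inside each worker process, and even then it would be per process rather than shared.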