
I'm seeing strange behaviour when using pool.map to call an instance method. Even with a single process, the behaviour differs from a simple for loop: the if not self.seeded: block is entered several times, whereas it should be entered only once. Here are the code and the outputs:

import os
from multiprocessing import Pool


class MyClass(object):
    def __init__(self):
        self.seeded = False
        print("Constructor of MyClass called")

    def f(self, i):
        print("f called with", i)
        if not self.seeded:
            print("PID : {}, id(self.seeded) : {}, self.seeded : {}".format(os.getpid(), id(self.seeded), self.seeded))
            self.seeded = True

    def multi_call_pool_map(self):
        with Pool(processes=1) as pool:
            print("multi_call_pool_map with {} processes...".format(pool._processes))
            pool.map(self.f, range(10))

    def multi_call_for_loop(self):
        print("multi_call_for_loop ...")
        list_res = []
        for i in range(10):
            list_res.append(self.f(i))


if __name__ == "__main__":
    MyClass().multi_call_pool_map()

Output:

Constructor of MyClass called
multi_call_pool_map with 1 processes...
f called with 0
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 1
f called with 2
f called with 3
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 4
f called with 5
f called with 6
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 7
f called with 8
f called with 9
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False

And with the for loop:

if __name__ == "__main__":
    MyClass().multi_call_for_loop()

Output:

Constructor of MyClass called
multi_call_for_loop ...
f called with 0
PID : 15840, id(self.seeded) : 1864747472, self.seeded : False
f called with 1
f called with 2
f called with 3
f called with 4
f called with 5
f called with 6
f called with 7
f called with 8
f called with 9

How can the behaviour with pool.map (the first case) be explained? I don't understand why the if block is entered multiple times, since self.seeded is set to False only in the constructor and the constructor is called only once. (I am using Python 3.6.8.)

Ismael EL ATIFI
  • It's because of the way Pool is chunking your input iterable. The chunksize for your setup will be 3, resulting in [3,3,3,1] chunks here. You can calculate it with `calc_chunksize()` in my answer [here](https://stackoverflow.com/q/53751050/9059420). – Darkonaut Jun 13 '19 at 13:46
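
The chunk sizes quoted in this comment can be reproduced with a short sketch. It assumes CPython's default heuristic for Pool.map when no chunksize is given (divide the input length by four times the number of workers and round up). The helper names `default_chunksize` and `split_into_chunks` are hypothetical and are not part of multiprocessing.

```python
def default_chunksize(n_items, n_workers, factor=4):
    """Approximate the chunk size Pool.map picks when chunksize is None.

    Assumption: mirrors CPython's heuristic, ceil(n_items / (n_workers * 4)).
    """
    chunksize, extra = divmod(n_items, n_workers * factor)
    if extra:
        chunksize += 1
    return chunksize


def split_into_chunks(items, chunksize):
    """Split a sequence into consecutive chunks of at most `chunksize` items."""
    items = list(items)
    return [items[i:i + chunksize] for i in range(0, len(items), chunksize)]


size = default_chunksize(n_items=10, n_workers=1)
chunks = split_into_chunks(range(10), size)
print(size)                      # 3
print([len(c) for c in chunks])  # [3, 3, 3, 1]
```

The chunk boundaries fall at 0, 3, 6 and 9, which are exactly the inputs for which the question's output shows the if block being entered.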

2 Answers


When running the code and also printing self inside f, we can see that the instance actually changes just before each entry into the if clause:

    def f(self, i):
        print("f called with", i, "self is", self)
        if not self.seeded:
            print("PID : {}, id(self.seeded) : {}, self.seeded : {}".format(os.getpid(), id(self.seeded), self.seeded))
            self.seeded = True

This outputs:

Constructor of MyClass called
multi_call_pool_map with 1 processes...
f called with 0 self is <__main__.MyClass object at 0x7f30cd592b38>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False
f called with 1 self is <__main__.MyClass object at 0x7f30cd592b38>
f called with 2 self is <__main__.MyClass object at 0x7f30cd592b38>
f called with 3 self is <__main__.MyClass object at 0x7f30cd592b00>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False
f called with 4 self is <__main__.MyClass object at 0x7f30cd592b00>
f called with 5 self is <__main__.MyClass object at 0x7f30cd592b00>
f called with 6 self is <__main__.MyClass object at 0x7f30cd592ac8>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False
f called with 7 self is <__main__.MyClass object at 0x7f30cd592ac8>
f called with 8 self is <__main__.MyClass object at 0x7f30cd592ac8>
f called with 9 self is <__main__.MyClass object at 0x7f30cd592a90>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False

If you add chunksize=10 to .map(), it will behave just like the for loop:

    def multi_call_pool_map(self):
        with Pool(processes=1) as pool:
            print("multi_call_pool_map with {} processes...".format(pool._processes))
            pool.map(self.f, range(10), chunksize=10)

This outputs:

Constructor of MyClass called
multi_call_pool_map with 1 processes...
f called with 0 self is <__main__.MyClass object at 0x7fd175093b00>
PID : 22972, id(self.seeded) : 10744096, self.seeded : False
f called with 1 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 2 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 3 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 4 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 5 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 6 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 7 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 8 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 9 self is <__main__.MyClass object at 0x7fd175093b00>

Exactly why this happens is a fairly elaborate implementation detail and has to do with how multiprocessing passes data to the worker processes in a pool.

I'm afraid I'm not qualified enough to answer exactly how and why this works internally.

Adam.Er8
  • Thanks for the quick response. The fact that the constructor is called only once despite there being several instances is a bit confusing. And why is id(self.seeded) the same for all the different instances? – Ismael EL ATIFI Jun 13 '19 at 13:39
  • I'm trying to find a simple article about this online, but all I find is regarding the usage and not internal details I'm afraid – Adam.Er8 Jun 13 '19 at 13:44
  • It depends on what you want to do, but using `from multiprocessing.dummy import Pool` (which is NOT the same! Google multiprocessing vs multithreading) you immediately get the desired behavior. – anki Jun 13 '19 at 13:57
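
The thread-based pool suggested in the last comment can be tried with a small sketch. `Tracker` is a hypothetical stand-in for MyClass that records which inputs entered the if block. The assumption here is that multiprocessing.dummy runs the method in threads of the same process, so nothing is pickled and every call sees the same object.

```python
from multiprocessing.dummy import Pool  # thread-based pool with the Pool API


class Tracker:
    """Hypothetical stand-in for MyClass that records the first-call branch."""

    def __init__(self):
        self.seeded = False
        self.first_calls = []  # inputs for which the `if` block was entered

    def f(self, i):
        if not self.seeded:
            self.first_calls.append(i)
            self.seeded = True
        return i


tracker = Tracker()
with Pool(processes=1) as pool:
    results = pool.map(tracker.f, range(10))

print(tracker.first_calls)  # [0]
print(tracker.seeded)       # True: the caller's own object was mutated
```

With more than one worker thread the calls would run concurrently on the shared object, so a check-then-set like this one would need a lock to stay reliable.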

When you use an instance method with Pool.map, a copy of the object instance is sent to the worker process with the help of the pickle module. Your results show that map works in chunks, and that the object instance is reloaded from its pickled form at the start of each chunk. Loading a pickle does not call __init__.

See https://thelaziestprogrammer.com/python/a-multiprocessing-pool-pickle for more explanation of what goes on under the hood.
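
The effect can be imitated without any worker process. The sketch below is a simplification under stated assumptions: `Counter` is a hypothetical stand-in for MyClass, the chunks are hard-coded to the [3,3,3,1] split seen in the question, and one pickled payload is reloaded per chunk, whereas a real pool pickles each task separately.

```python
import pickle


class Counter:
    """Hypothetical stand-in for MyClass."""

    init_calls = 0  # how many times __init__ has run

    def __init__(self):
        Counter.init_calls += 1
        self.seeded = False

    def f(self, i):
        entered = not self.seeded  # True when the `if` block would be entered
        self.seeded = True
        return entered


original = Counter()
payload = pickle.dumps(original.f)  # a bound method is pickled with its instance

entered_at = []
copies = []
for chunk in ([0, 1, 2], [3, 4, 5], [6, 7, 8], [9]):
    method = pickle.loads(payload)  # fresh copy per chunk, __init__ not called
    copies.append(method.__self__)
    entered_at += [i for i in chunk if method(i)]

print(entered_at)                    # [0, 3, 6, 9]
print(Counter.init_calls)            # 1
print(original.seeded)               # False: the original is never touched
print(len({id(c) for c in copies}))  # 4 distinct instances
```

This also accounts for the identical id(self.seeded) values in the question: False is a single shared object, so the seeded attribute of every freshly loaded copy refers to it.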

Janne Karila