4

I wrote the following code

from multiprocessing import Pool 

class A:
    def __init__(self) -> None:
        self.a = 1
    def incr(self,b):
        self.a += 1
        print(self.a)
    def execute(self):
        with Pool(2) as p:
            p.starmap(self.incr, [[1],[1],[1],[1]])

a = A()
a.execute()
print(a.a)

The output is 2 2 2 2 1. I want to understand what exactly happens in this scenario. Does the pool create four copies of self? If so how is this copying done?

bumpbump
  • 542
  • 4
  • 17
  • 1
    `multiprocessing` creates more than just a copy of an object, it effectively copies everything in your program, but how it does that exactly depends on the way the process is started and the operating system you're running on. See [this question](https://stackoverflow.com/questions/28073679/why-do-processes-spawned-by-the-multiprocessing-module-not-duplicate-memory) for example – Grismar Jan 04 '22 at 03:35
  • It creates *multiple python processes*. Imagine you ran `python myscript.py` multiple times (the exact details can vary based on whether the processes use "spawn" or "fork") – juanpa.arrivillaga Jan 04 '22 at 05:05
  • with Pool(2) as p ?? why 2 what happen if you use 4 ? – pippo1980 Jan 04 '22 at 09:49
  • here: https://medium.datadriveninvestor.com/python-multiprocessing-pool-vs-process-comparative-analysis-6c03c5b54eec – pippo1980 Jan 04 '22 at 11:10

2 Answers2

3

In this scenario, the file is run linearly until a.execute() is called and reaches the call to starmap. What multiprocessing does is to create two more processes. The way this is done depends is limited by the OS and can (in some cases be selected using multiprocessing.set_start_method. The main differences between the methods is that with 'fork' methods the new process are either created copying the current process (it is more complicated, but that's the idea), while in the 'spawn' methods a new python interpreter is started, the file is re-executed, and the call is performed. In this later case, as the file is re-executed, it is very important to use the if __name__ == "__main__": guard.

As each process has its own copy of a, what happens in one process does not affect the other copies of a: even if all copies of have a self.a value of 2, the original a is unchanged. You can test the different methods for starting process with this piece of code (delays added for clarity):

from multiprocessing import Pool, set_start_method, get_start_method
import os
from time import sleep

globalflag = "Flag"

class A:
    def __init__(self) -> None:
        self.a = 1
    def incr(self, b):

        sleep(b/10)
        print("Process id:", os.getpid(), " incrementing")
        print("Flag = ", globalflag)
        self.a += 1
        print(self.a)
    def execute(self):
        with Pool() as p:
            p.starmap(self.incr, [[1],[2],[3],[4]])

print("Running module:", __name__)

if __name__ == "__main__":
    method = 'fork'#'spawn' # 'fork' 'forkserver'
    if get_start_method(allow_none=True) is None:
        print("Setting start method to ", method)
        set_start_method(method)
    else:
        print('Start method already set: ', get_start_method())
    a = A()
    globalflag = "Flog"
    a.execute()
    print(a.a)

output with method='fork':

Running module: __main__
Start method already set:  fork
Process id: 6687  incrementing
Flag =  Flog
2
Process id: 6688  incrementing
Flag =  Flog
2
Process id: 6689  incrementing
Flag =  Flog
2
Process id: 6690  incrementing
Flag =  Flog
2
1

output with 'spawn':

Running module: __main__
Setting start method to  spawn
Running module: __mp_main__
Running module: __mp_main__
Running module: __mp_main__
Running module: __mp_main__
Running module: __mp_main__
Process id: 7096  incrementing
Flag =  Flag
2
Running module: __mp_main__
Process id: 7097  incrementing
Flag =  Flag
2
Running module: __mp_main__
Process id: 7098  incrementing
Flag =  Flag
2
Running module: __mp_main__
Process id: 7100  incrementing
Flag =  Flag
2
1
azelcer
  • 1,383
  • 1
  • 3
  • 7
-2

try running this changing with Pool(2) as p with:

with Pool(1) as p , with Pool(2) as p , with Pool(4) as p , with Pool(8) as p

and have a look at the different outputs


from multiprocessing import Pool 

import os

print('PID ;',os.getpid()) ## get the current process id or PID

class A:
    def __init__(self) -> None:
        self.a = 1
    def incr(self,b):
        self.a += 1
        print(self.a)
        print('PID ;',os.getpid())
    def execute(self):
        with Pool(2) as p:
            p.starmap(self.incr, [[1],[1],[1],[1]])

a = A()
a.execute()
print(a.a)
print('PID ;',os.getpid())

remember:

os.getpid() :

Return the current process id.
pippo1980
  • 2,181
  • 3
  • 14
  • 30