Earlier I tried to use the threading module in Python to create multiple threads. Then I learned about the GIL and how it prevents taking advantage of multiple CPU cores on a single machine. So now I'm trying multiprocessing instead (I don't strictly need separate threads).

Here is some sample code I wrote to check whether distinct processes are being created. But as can be seen in the output below, I'm getting the same process ID every time, so multiple processes are not being created. What am I missing?

import multiprocessing as mp
import os

def pri():
    print(os.getpid())

if __name__=='__main__':

    # Checking number of CPU cores
    print(mp.cpu_count())

    processes=[mp.Process(target=pri()) for x in range(1,4)]

    for p in processes:
        p.start()

    for p in processes:
        p.join()

Output:

4
12554
12554
12554
RodrikTheReader
2 Answers

The Process class requires a callable as its target.

Instead of running the function in a separate process, you are calling it immediately and passing its result (None in this case) to the Process class.

Just replace the following:

mp.Process(target=pri())

with:

mp.Process(target=pri)
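
For reference, a minimal sketch of the original script with only this fix applied:

import multiprocessing as mp
import os

def pri():
    print(os.getpid())

if __name__ == '__main__':

    # Checking number of CPU cores
    print(mp.cpu_count())

    # Pass the function itself, not the result of calling it
    processes = [mp.Process(target=pri) for x in range(1, 4)]

    for p in processes:
        p.start()

    for p in processes:
        p.join()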
noxdafox
  • This still returns the same pid. It won't even print it to the screen since it runs in a subprocess. – Chen A. Oct 15 '17 at 13:59
  • I just tried the code, I get 3 different PIDs. Depending on the OS, it might re-use the PID which, in this case, is not relevant to the OP as the scope is to test the `multiprocessing` library. – noxdafox Oct 15 '17 at 14:02

Since the subprocesses run as separate processes, you won't see their print statements. They also don't share the same memory space. You pass pri() to target, where it needs to be pri: you need to pass a callable object, not execute it.

The prints you see come from your main process. Because you pass pri(), the function is actually executed there. You need to change your code so that the pri function returns a value rather than printing it.

Then you need to use a queue: all your worker processes write their results to it, and when they're done, your main process reads them back, as in the sketch below.
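
A minimal sketch of that queue-based approach (assuming a multiprocessing.Queue; the names here are illustrative, not from the original code):

import multiprocessing as mp
import os

def pri(q):
    # Each worker puts its PID on the shared queue instead of printing it
    q.put(os.getpid())

if __name__ == '__main__':
    q = mp.Queue()
    processes = [mp.Process(target=pri, args=(q,)) for _ in range(3)]

    for p in processes:
        p.start()

    # The main process reads one PID back per worker
    for _ in processes:
        print(q.get())

    for p in processes:
        p.join()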

A nice feature of the multiprocessing module is the Pool object. It allows you to create a process pool and then just submit work to it. It's more convenient.

I have tried your code; the thing is that the function executes too quickly, so the OS reuses the PIDs. If you add a time.sleep(1) in your pri function, it works as you expect.

That is true only for Windows. The example below was run on a Windows platform; on Unix-like machines, you won't need the sleep.

The more convenient solution looks like this:

from multiprocessing import Pool
from time import sleep
import os

def pri(x):
    # Keep the worker busy long enough that the OS does not reuse PIDs
    sleep(1)
    return os.getpid()

def use_procs():
    p_pool = Pool(4)                          # pool of 4 worker processes
    p_results = p_pool.map(pri, range(1, 4))  # distribute 3 tasks across the pool
    p_pool.close()
    p_pool.join()
    return p_results

if __name__ == '__main__':
    res = use_procs()
    for r in res:
        print(r)

Without the sleep:

==================== RESTART: C:/Python27/tests/test2.py ====================
6576
6576
6576
>>> 

With the sleep:

==================== RESTART: C:/Python27/tests/test2.py ====================
10396
10944
9000
Chen A.
  • If you use a `Pool`, the tasks will be processed by the first available worker. As Windows uses the `spawn` starting method, the processes are started way slower than in Unix (which by default uses `fork`). Therefore, the first worker process "steals" all the tasks before the other ones are ready. That's why you see only the first PID being printed in your logic. – noxdafox Oct 15 '17 at 14:06
  • You're right. I just tried it on a Linux machine, and it does print unique PIDs. However, besides printing, in order for the processes to communicate with the main thread, it has to use a communication interface (queue, sockets, etc). Using `Pool` takes care for that, which simplifies multiprocessing. – Chen A. Oct 15 '17 at 14:13
  • It depends on the use case; sometimes you just have to spin up something like, for example, an HTTP server. In these cases you don't care much about what the function returns. Processes are good for many use cases; my personal favourite is the isolation they provide to your application. If the logic in another process crashes, your application will remain unaffected. This cannot be achieved as easily with threads or coroutines. – noxdafox Oct 15 '17 at 14:31
  • @noxdafox I wonder how the print statement works with multiple processes. How does it get printed in the main console? I thought different processes can't communicate like that; it has to go through some means such as queues, pipes, or other IPC. *How come a print statement from a subprocess is printed in the main console?* – Chen A. Oct 15 '17 at 18:53
  • I'm not familiar with Windows internals but on Unix the `fork` duplicates the process address space as well as for the file descriptors. Therefore, a newly forked process will have the same stdout file descriptor as for the parent. Hence you see the output directed towards the same console. – noxdafox Oct 15 '17 at 20:13
  • Gave a quick glance at the code. The console is passed to the newly spawned processes via a dedicated pipe. That's the reason why ["more picklability"](https://docs.python.org/3.6/library/multiprocessing.html#the-spawn-and-forkserver-start-methods) is necessary when dealing with the `spawn` starting strategy. – noxdafox Oct 15 '17 at 20:23
  • Thanks for the explanation, I'll check it out – Chen A. Oct 15 '17 at 20:54