
I want to run a process in parallel using python3. The code I have runs the tasks one after the other. Any ideas on how to make it parallel?

from multiprocessing import Process

def work(x, outfile):
    for i in range(0,200000):
        print(x, i,'hello world', outfile)

if __name__ == '__main__':
    NUM_THREADS = 4
    for x in range(NUM_THREADS):
        try:
            outfile = "tmp"+str(x) 
            p = Process(target=work, args =(x, outfile)) 
            p.start()
            p.join()
        except:
            raise
            print("Error: unable to start thread", x)
aLbAc
    `p.join()` waits for the process to finish. You want to put that out of the loop. – syntonym Oct 12 '17 at 12:53
  • Putting `raise` before the `print` means you'll never `print` (the re-`raise` bypasses subsequent code until caught somewhere else or it bubbles out of `__main__` completely). You might want to swap those if the `print` is important to you. – ShadowRanger Oct 12 '17 at 17:23

4 Answers


You can't start and join in the same iteration of the loop. `join()` blocks the current (main) process until the started process completes, so your processes run one at a time. Start them all first, then join them all:

if __name__ == '__main__':
    NUM_THREADS = 4
    process_list = []
    for x in range(NUM_THREADS):
        try:
            outfile = "tmp" + str(x)
            p = Process(target=work, args=(x, outfile))
            p.start()
            process_list.append(p)
        except:
            print("Error: unable to start process", x)
            raise

    # wait for processes to finish
    for process in process_list:
        process.join()
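If you don't need per-process control, the start-all/join-all pattern above is also what `multiprocessing.Pool` automates. A minimal sketch (the `work` stub and the `tmp*` names here are placeholders, not the OP's real job):

```python
from multiprocessing import Pool

def work(args):
    # placeholder job; unpack the (x, outfile) pair the question passes around
    x, outfile = args
    return "%d -> %s" % (x, outfile)

if __name__ == "__main__":
    NUM_PROCESSES = 4
    jobs = [(x, "tmp" + str(x)) for x in range(NUM_PROCESSES)]
    with Pool(NUM_PROCESSES) as pool:
        # map() distributes the jobs across workers and joins them for us
        results = pool.map(work, jobs)
    print(results)
```

`Pool` takes care of starting the workers and joining them, so the explicit `process_list` bookkeeping disappears entirely.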
GDub

I'm not sure if this is relevant for you, but I generally struggled with the multiprocessing module and instead had greater success with the pathos module (on Linux and Mac at least, not Windows). I set this up for multi-core use, but check the pathos module for threading/core-split usage.

Credit to Mike McKerns for writing this package; it made my life a lot easier for multicore use in Python.

Minimal code required, see below:

from pathos.helpers import mp
import numpy as np

x = np.arange(0, 200000)
splitx = np.array_split(x, 4)  # four chunks, one per worker

def dummy(y):
    return y

pooler = mp.Pool(4)

for value in pooler.imap(dummy, splitx):
    print(value)

pooler.close()
pooler.join()

[    0     1     2 ..., 49997 49998 49999]
[50000 50001 50002 ..., 99997 99998 99999]
[100000 100001 100002 ..., 149997 149998 149999]
[150000 150001 150002 ..., 199997 199998 199999]
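For reference, `np.array_split` is what produces those four chunks; unlike `np.split`, it tolerates sizes that don't divide evenly. A small sketch with a 10-element array:

```python
import numpy as np

x = np.arange(10)
chunks = np.array_split(x, 4)  # 4 chunks; leftover elements go to the first ones
print([c.tolist() for c in chunks])  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```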
srhoades10
  • There is nothing in this "helper" module that doesn't fit the existing `multiprocessing.Pool` API; what is the point? – ShadowRanger Oct 12 '17 at 17:22
  • Hm, I'm honestly not an expert here, but by including the dill module it overcomes some issues of pickling object types in the multiprocess module (and less for noobies like me to worry about) (https://stackoverflow.com/questions/19984152/what-can-multiprocessing-and-dill-do-together) – srhoades10 Oct 12 '17 at 18:00
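The pickling difference mentioned in that comment is easy to see directly: stdlib `pickle` refuses a lambda, while `dill` (which pathos uses under the hood) serializes it. A minimal sketch, assuming `dill` is installed:

```python
import pickle
import dill

f = lambda v: v * 2

try:
    pickle.dumps(f)            # stdlib pickle can't serialize a lambda
except Exception as e:
    print("pickle failed:", e)

g = dill.loads(dill.dumps(f))  # dill round-trips it fine
print(g(21))                   # 42
```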

First of all, since `multiprocessing.Process` runs its target function in a new Python interpreter, the builtin `print` doesn't print to the console. To remedy this, just import

import jpe_types.paralel

It will override the print statement. However, you will have to use `jpe_types.paralel.Process` instead of `multiprocessing.Process` to also override `print` in the `Process` interpreters.

In addition to that, you have to start all the processes and then join them later, so save them to a list like this:

import jpe_types.paralel

def work(x, outfile):
    for i in range(0,5):
        print(x, i,'hello world', outfile)

if __name__ == '__main__':
    NUM_PROCESSES = 4
    processes = []
    for x in range(NUM_PROCESSES):

        outfile = "tmp"+str(x) 
        p = jpe_types.paralel.Process(target=work, args=(x, outfile))
        p.start()
        processes.append(p)
        
    for p in processes:
        p.join()

This then outputs:

1 0 hello world tmp1
2 0 hello world tmp2
0 0 hello world tmp0
3 0 hello world tmp3
1 1 hello world tmp1
2 1 hello world tmp2
0 1 hello world tmp0
3 1 hello world tmp3
1 2 hello world tmp1
2 2 hello world tmp2
0 2 hello world tmp0
3 2 hello world tmp3
1 3 hello world tmp1
2 3 hello world tmp2
0 3 hello world tmp0
3 3 hello world tmp3
1 4 hello world tmp1
2 4 hello world tmp2
0 4 hello world tmp0
3 4 hello world tmp3
Julian wandhoven

You need to run the process as a daemon.

Try adding `p.daemon = True` before `p.start()`.

`p.join()` waits for the process to finish; you need to get rid of that as well.

Akshay Apte
    This is false, via the [python documentation](https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process.daemon): `[Daemon processes] are not Unix daemons or services, they are normal processes that will be terminated (and not joined) if [their parent process has] exited.` – syntonym Oct 12 '17 at 12:55
  • I tried to add `p.daemon = True`, it doesn't seem to make a difference. Getting rid of `p.join()` makes things even worse, the function runs partially only 1 time and then it breaks and stops. – aLbAc Oct 12 '17 at 16:30
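The behavior syntonym's quote describes also explains the "runs partially then stops" symptom: a daemon child is terminated as soon as its parent exits, so without `p.join()` the parent falls off the end of the script and kills the worker mid-run. A minimal sketch of that failure mode (the sleep time is illustrative):

```python
import multiprocessing as mp
import time

def slow_child():
    time.sleep(10)            # still busy when the parent exits
    print("child finished")   # never reached: the daemon is terminated first

if __name__ == "__main__":
    p = mp.Process(target=slow_child, daemon=True)
    p.start()
    # no p.join() here: the parent exits immediately and the daemon
    # child is killed, which is why the work only partially completes
```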