
I am trying to multiprocess system commands, but can't get it to work with a simple program. The function runit(cmd) works fine though...

#!/usr/bin/python3
from subprocess import call, run, PIPE,Popen
from multiprocessing import Pool
import os
pool = Pool()

def runit(cmd):
    proc = Popen(cmd, shell=True,stdout=PIPE, stderr=PIPE, universal_newlines=True)
    return proc.stdout.read()

#print(runit('ls -l'))

it = []
for i in range(1,3):
    it.append('ls -l')

results = pool.map(runit, it)

It outputs:

Process ForkPoolWorker-1:
Process ForkPoolWorker-2:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 345, in get
    return ForkingPickler.loads(res)
AttributeError: Can't get attribute 'runit' on <module '__main__' from './syscall.py'>
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 345, in get
    return ForkingPickler.loads(res)
AttributeError: Can't get attribute 'runit' on <module '__main__' from './syscall.py'>

Then it somehow waits and does nothing, and when I press Ctrl+C a few times it spits out:

^CProcess ForkPoolWorker-4:
Process ForkPoolWorker-6:
Traceback (most recent call last):
  File "./syscall.py", line 17, in <module>
Process ForkPoolWorker-5:
    results = pool.map(runit, it)
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 260, in map
...
    buf = self._recv(4)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
Niels
    note that you'd be better off with multithreading instead of multiprocessing. `subprocess.Popen` already spawns a separate process. Multithreading is much easier to handle – Jean-François Fabre Aug 24 '17 at 13:32

1 Answer


I'm not sure, since the issue I know of is Windows-related (and I don't have access to a Linux box to reproduce it), but to be portable you have to wrap your multiprocessing-dependent commands in an if __name__ == "__main__" guard, or it conflicts with the way Python spawns the worker processes. This fixed example runs fine on Windows (and should work on other platforms as well):

from multiprocessing import Pool
from subprocess import Popen, PIPE

def runit(cmd):
    proc = Popen(cmd, shell=True, stdout=PIPE, stderr=PIPE, universal_newlines=True)
    return proc.stdout.read()

#print(runit('ls -l'))

it = []
for i in range(1, 3):
    it.append('ls -l')

if __name__ == "__main__":
    # all calls to the multiprocessing module are "protected" by this guard
    pool = Pool()
    results = pool.map(runit, it)

(Studying the error messages more closely, I'm now fairly sure that just moving pool = Pool() after the definition of runit would also fix it on Linux, but wrapping it in the __main__ guard both fixes the error and makes the script portable.)

That said, note that your multiprocessed function just spawns another process anyway, so you'd be better off with a thread pool (Threading pool similar to the multiprocessing Pool?): threads that create processes, like this:

from multiprocessing.pool import ThreadPool  # uses threads, not processes
from subprocess import Popen, PIPE

def runit(cmd):
    proc = Popen(cmd, shell=True, stdout=PIPE, stderr=PIPE, universal_newlines=True)
    return proc.stdout.read()

it = []
for i in range(1, 3):
    it.append('ls -l')

if __name__ == "__main__":
    pool = ThreadPool()   # ThreadPool instead of Pool
    results = pool.map(runit, it)
    print(results)

The latter solution is more lightweight and less issue-prone (multiprocessing is a delicate module to handle). Among other advantages, you can work with objects, shared data, etc. without needing a Manager object.

Jean-François Fabre
  • I'm running it under Linux Mint 18.1 KDE and your solutions help like a charm! – Niels Aug 24 '17 at 13:54
  • I'm sure the thread solution works, can you confirm that the modified multiprocessing solution works as well? – Jean-François Fabre Aug 24 '17 at 14:00
  • Yes it does. However, running 100 tasks takes the same amount of time on 4 CPUs implemented with both ThreadPool and Thread, but I understand the difference and would go for ThreadPool – Niels Aug 24 '17 at 14:45
  • creating a thread is faster than creating a process. Keep it threaded. If you have to perform pure python loops & stuff, you have to switch to multiprocessing though, as the GIL makes sure that only one thread is running (which is needed for memory integrity, design chosen to avoid having to use mutexes to protect lists, dicts...) – Jean-François Fabre Aug 24 '17 at 14:50
  • I have the same problem but it hangs with no error. I have a feeling its memory related since in my case the process max mem usage goes up 98% before it hangs. – radtek May 21 '20 at 14:55
  • in that case, just create a case which uses less memory, or a mockup. – Jean-François Fabre May 21 '20 at 15:04