
I have a JavaScript application running on a Python / PyQt / QtWebKit foundation which creates subprocess.Popen objects to run external processes.

Popen objects are kept in a dictionary and referenced by an internal identifier, so that the JS app can call Popen's methods through pyqtSlot wrappers, e.g. poll() to determine whether a process is still running, or kill() to kill a rogue process.
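
A minimal sketch of what such a bridge might look like (hypothetical names; PyQt4-style imports assumed, other PyQt versions with pyqtSlot work similarly):

from PyQt4.QtCore import QObject, pyqtSlot

class ProcessBridge(QObject):
    """Exposed to the JS side; slots look up Popen objects by internal id."""

    def __init__(self, parent=None):
        QObject.__init__(self, parent)
        self.processes = {}  # internal identifier -> subprocess.Popen

    @pyqtSlot(int, result=bool)
    def is_running(self, proc_id):
        # poll() returns None while the process is still running
        proc = self.processes.get(proc_id)
        return proc is not None and proc.poll() is None

    @pyqtSlot(int)
    def kill(self, proc_id):
        proc = self.processes.get(proc_id)
        if proc is not None and proc.poll() is None:
            proc.kill()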

If a process is not running any more, I would like to remove its Popen object from the dictionary for garbage collection.

What would be the recommended approach to cleaning up the dictionary automatically to prevent a memory leak?

My ideas so far:

  • Call Popen.wait() in a thread per spawned process to perform an automatic cleanup right upon termination (see the rough sketch after this list).
    PRO: Immediate cleanup; the threads probably do not cost much CPU power, as they should be sleeping, right?
    CON: Many threads, depending on spawning activity.
  • Use a thread to call Popen.poll() on all existing processes, check returncode to see whether they have terminated, and clean up in that case.
    PRO: Just one worker thread for all processes, lower memory usage.
    CON: Periodic polling necessary; higher CPU usage if there are many long-running processes or lots of processes spawned.
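
A rough sketch of option #1 (hypothetical names; each watcher thread blocks in wait() until its process exits, then removes the entry):

import subprocess
import threading

PROCESS_DICT = {}               # internal identifier -> Popen
PROCESS_LOCK = threading.Lock()

def spawn(proc_id, args):
    proc = subprocess.Popen(args)
    with PROCESS_LOCK:
        PROCESS_DICT[proc_id] = proc

    def reaper():
        proc.wait()             # sleeps until the process terminates
        with PROCESS_LOCK:
            PROCESS_DICT.pop(proc_id, None)  # immediate cleanup

    t = threading.Thread(target=reaper)
    t.daemon = True             # don't keep the app alive because of watchers
    t.start()
    return proc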

Which one would you choose, and why? Or are there any better solutions?

Arc
  • What operating system[s] will this need to work on? – Aya May 02 '13 at 14:58
  • Mainly Windows, Mac OS X if possible, Linux would be nice to have. Best would be a platform-agnostic solution. – Arc May 02 '13 at 15:53
  • Well, the answer I gave will work on Linux and OSX. I'll have to put some thought into a Windows solution. – Aya May 02 '13 at 15:55
  • Alright, thanks so far, and I forgot to mention that this is Python 3.3. – Arc May 02 '13 at 16:00
  • Updated answer. The code samples are for Python 2.x, but you need only change `print foo` to `print(foo)` for 3.x-compat. – Aya May 02 '13 at 16:37

1 Answer


For a platform-agnostic solution, I'd go with option #2, since the "CON" of high CPU usage can be circumvented with something like...

import time

# Assuming the Popen objects are in the dictionary values
PROCESS_DICT = { ... }

def my_thread_main():
    while 1:
        dead_keys = []
        for k, v in list(PROCESS_DICT.items()):  # snapshot, since another thread may modify the dict
            v.poll()
            if v.returncode is not None:
                dead_keys.append(k)
        if not dead_keys:
            time.sleep(1)  # Adjust sleep time to taste
            continue
        for k in dead_keys:
            del PROCESS_DICT[k]

...whereby, if no processes died on an iteration, you just sleep for a bit.

So, in effect, your thread would still be sleeping most of the time. Although there's some latency between a child process dying and its subsequent 'cleanup', it's really not a big deal, and this should scale better than using one thread per process.

There are better platform-dependent solutions, however.

For Windows, you should be able to use the WaitForMultipleObjects function via ctypes as ctypes.windll.kernel32.WaitForMultipleObjects, although you'd have to look into the feasibility.
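
A rough, untested sketch of that idea (it relies on each Popen's private _handle attribute, whose exact type varies between Python versions, and a single call is limited to MAXIMUM_WAIT_OBJECTS, i.e. 64 handles):

import ctypes
import ctypes.wintypes

WAIT_OBJECT_0 = 0

def wait_for_any(popens, timeout_ms=5000):
    """Return the Popen whose process exited first, or None on timeout/error."""
    handles = [int(p._handle) for p in popens]  # private attribute!
    handle_array = (ctypes.wintypes.HANDLE * len(handles))(*handles)
    result = ctypes.windll.kernel32.WaitForMultipleObjects(
        len(handles),   # nCount (must not exceed MAXIMUM_WAIT_OBJECTS = 64)
        handle_array,   # lpHandles
        False,          # bWaitAll: return as soon as *any* handle is signalled
        timeout_ms)     # dwMilliseconds
    if WAIT_OBJECT_0 <= result < WAIT_OBJECT_0 + len(handles):
        return popens[result - WAIT_OBJECT_0]
    return None

The caller would remove the returned Popen from its dictionary and call the function again with the remaining handles; newly spawned processes are only picked up between calls, which is why you'd want a reasonably short timeout (see the comments below).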

For OSX and Linux, it's probably easiest to handle the SIGCHLD signal asynchronously, using the signal module.

A quick n' dirty example...

import os
import time
import signal
import subprocess

# Map child PID to Popen object
SUBPROCESSES = {}

# Define handler: os.wait() reaps the terminated child and returns its PID,
# which is then removed from the dictionary
def handle_sigchld(signum, frame):
    pid = os.wait()[0]
    print 'Subprocess PID=%d ended' % pid
    del SUBPROCESSES[pid]

# Handle SIGCHLD
signal.signal(signal.SIGCHLD, handle_sigchld)

# Spawn a couple of subprocesses
p1 = subprocess.Popen(['sleep', '1'])
SUBPROCESSES[p1.pid] = p1
p2 = subprocess.Popen(['sleep', '2'])
SUBPROCESSES[p2.pid] = p2

# Wait for all subprocesses to die
while SUBPROCESSES:
    print 'tick'
    time.sleep(1)

# Done
print 'All subprocesses died'
Aya
  • Yeah, I've been thinking about using `WaitForMultipleObjects()`, however the solution would be a bit complicated I guess... you'd probably have to renew the waiting process every time a new process is added, which is maybe not worth the effort, e.g. in a loop and using a wait timeout of some seconds or something. Additionally, you might need to split the waiting into multiple threads due to the `MAXIMUM_WAIT_OBJECTS` limit. – Arc May 03 '13 at 08:27
  • @Archimedix Yeah. It's very similar to using [`select()`](http://docs.python.org/2/library/select.html#select.select) on multiple file descriptors - the usual idiom being to include the FD (usually a listening socket) which could potentially change the set of FDs you're monitoring. So in your case, you'd need to include some object in the set which could be used to detect when a new process is created, then the wait timeout could be very long. The only gain, though, would be to remove the latency from your option #2. (continued in next comment) – Aya May 03 '13 at 13:07
  • @Archimedix (continued from prev comment) The `SIGCHLD` solution seems to be the most elegant, being asynchronous (i.e. requiring no blocking calls), and could be used in the main thread. It could also be used on Windows, as long as your code will run under a version of Python compiled for [cygwin](http://www.cygwin.com/), but that's probably going to be even more complicated if you're using several third-party Python extension modules. I'd suggest using option #2 for now, since it doesn't require much thread management, and look at optimizing later if it becomes necessary. – Aya May 03 '13 at 13:20
  • @Archimedix I just came across [another question](http://stackoverflow.com/questions/3556048/how-to-detect-win32-process-creation-termination-in-c) which might provide an asynchronous Windows option using WMI - not sure how easy it would be to adapt to Python, though. – Aya May 03 '13 at 13:33
  • Thanks, I think WMI is far too much to bother with. – Arc May 13 '13 at 11:57