
In Python 3.5, I launch external executables (written in C++) using multiprocessing.Pool.map plus subprocess, from an Xshell connection. However, the Xshell connection was interrupted due to a bad internet connection.

After reconnecting, I see that the managing Python process is gone but the C++ executables are still running (and apparently correctly; the Pool still seems to be controlling them).

The question is whether this is a bug, and what I should do in this case. I cannot kill or kill -9 them.

Added: after removing every sublst_file by hand, all the running executables (cmd) are gone. It seems the except sub.SubprocessError as e: part is still working.

The basic structure of my program is outlined below.

import subprocess as sub
import multiprocessing as mp
import itertools as it
import os
import time

def chunks(lst, chunksize=5):
    # note: zip_longest pads the last chunk with None
    return it.zip_longest(*[iter(lst)]*chunksize)

class Work():
    def __init__(self, lst):
        self.lst = lst

    def _work(self, sublst):
        retry_times = 6
        for i in range(retry_times):
            try:
                cmd = 'my external c++ cmd'
                sublst_file = 'a config file generated from sublst'
                sub.check_call([cmd, sublst_file])
                os.remove(sublst_file)
                return sublst  # return the successful sublst
            except sub.SubprocessError as e:
                if i == (retry_times - 1):
                    print('\n[ERROR] %s %s failed after %d tries\n' % (cmd, sublst_file, retry_times))
                    return []
                else:
                    print('\n[WARNING] %dth sleep, please wait for restart\n' % (i + 1))
                    time.sleep(1 + i)

    def work(self):
        with mp.Pool(4) as pool:
            results = pool.map(self._work, chunks(self.lst, 5))
        for r in it.chain.from_iterable(results):
            # other work on success items
            print(r)
  • It isn't too vague but I needed to read it a few times to fully grasp what you are actually doing, might be helpful to add a few lines of code that demonstrates how you are starting the connections in python. – Tadhg McDonald-Jensen Apr 10 '16 at 16:18
  • @TadhgMcDonald-Jensen I have added my code. I guess the problem is caused by the retry part. – Ma Ming Apr 10 '16 at 17:19
  • 1
    It's not a bug, it's your responsibility to clean up if something goes wrong. Why exactly can't you kill the child processes? – Phillip Apr 10 '16 at 17:22
  • @Phillip They just didn't die. – Ma Ming Apr 10 '16 at 17:39
  • 1
    If `kill -9` does not work, the most likely explanation is that you tried killing the wrong process or as the wrong user. For others, see [e.g. here](http://unix.stackexchange.com/questions/5642/what-if-kill-9-does-not-work). `kill()`ing the children in a SIGTERM handler is the right thing to do and should work. – Phillip Apr 10 '16 at 18:26

1 Answer


The multiprocessing.Pool does terminate its workers upon terminate(), which is also called by __del__, which in turn is called on module unload (at exit).

The reason these processes are orphaned is that subprocesses spawned by subprocess.check_call are not terminated when the parent exits.

This fact is not mentioned explicitly in the reference documentation, but there is no indication anywhere that the spawned processes are terminated. A brief review of the source code also turned up nothing. This behavior is also easily testable.

To clean up upon parent termination, use the Popen interface directly; see this answer: Killing child process when parent crashes in python

  • @JonathanLeffler The part about `multiprocess` is well documented, so I assume you mean the second part about `subprocess`. It's not said explicitly, but there is no indication anywhere in the reference doc that the spawns are terminated. A brief review of the source code also left me with no findings. – szym Apr 10 '16 at 17:45