
My class for handling processes, somewhat shortened for simplicity, looks like this:

import subprocess

import psutil


class LocalEngine:

    def __init__(self):
        self.name_to_pid = {}

    def run_instance(self, name):
        args = ...  # Computing args for Popen from arguments
        with open(f"out-{name}", 'a') as out, open(f"err-{name}", 'a') as err:
            pid = subprocess.Popen(args, stdout=out, stderr=err, shell=False).pid
            print(f"Created process {pid}")
            self.name_to_pid[name] = pid

    def kill_instance(self, name):
        pid = self.name_to_pid[name]
        print(f"Killing process {pid}")
        psutil.Process(pid).kill()

There is only one object, `engine`, of type `LocalEngine`, and all processes are created using `engine.run_instance`. At the end, all processes are killed using `engine.kill_instance`.

Here is the relevant part of the main script's output (under Ubuntu):

Created process 9676
Created process 9703
Killing process 9703
Killing process 9676

However, this is what `ps auxfww` outputs after all processes are created:

root     27576  0.0  0.0    896    80 ?        S    10:33   0:00  \_ /init
meir     27577  0.0  0.0  10688  5944 pts/2    Ss   10:33   0:00      \_ -bash
meir      9667  2.2  0.0  24296 16956 pts/2    S+   15:26   0:00          \_ python -m examples.agent_assignment.run_server
meir      9668  0.3  0.0 392712 16732 pts/2    Sl+  15:26   0:00              \_ python -m examples.agent_assignment.run_server
meir      9676  1.9  0.0  24016 16340 pts/2    S+   15:26   0:00              \_ python -m src.run_backup 127.0.1.1 57785 server-1
meir      9677  3.7  0.0 392784 14648 pts/2    Sl+  15:26   0:00              |   \_ python -m src.run_backup 127.0.1.1 57785 server-1
meir      9703  8.0  0.0  28364 21084 pts/2    S+   15:26   0:00              \_ python -m examples.agent_assignment.run_client 127.0.1.1 57785 client-1 2
meir      9704 10.0  0.0 765568 18912 pts/2    Sl+  15:26   0:00                  \_ python -m examples.agent_assignment.run_client 127.0.1.1 57785 client-1 2

`run_server` is the main script. The two calls to `Popen` run the scripts named `run_backup` and `run_client`, respectively. The `ps` output above shows that, for some reason, there are two processes for each call of `Popen`. Since I pass `shell=False` to `Popen`, I would not expect that to happen. Here is the output of `ps` after the processes are killed:

root     27575  0.0  0.0    896    80 ?        Ss   10:33   0:00 /init
root     27576  0.0  0.0    896    80 ?        S    10:33   0:00  \_ /init
meir     27577  0.0  0.0  10688  5944 pts/2    Ss+  10:33   0:00      \_ -bash
meir      9677  0.1  0.0 466516 14736 pts/2    Sl   15:26   0:00      \_ python -m src.run_backup 127.0.1.1 57785 server-1
meir      9704  0.0  0.0 749176 21088 pts/2    Sl   15:26   0:00      \_ python -m examples.agent_assignment.run_client 127.0.1.1 57785 client-1 2

What are these extra processes that linger? How do I prevent them from being created?
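For context, the extra PIDs can be inspected before killing anything. The following is a self-contained sketch (not part of the original class; the child command line here is a hypothetical stand-in for `run_backup`/`run_client`) that spawns a child which itself forks a grandchild, then lists the child's descendants with `psutil` — these descendants are exactly the extra, indented entries `ps auxfww` shows:

```python
import subprocess
import sys
import time

import psutil

# Spawn a stand-in child that itself forks a grandchild, mimicking the
# situation in the question (the command line is purely illustrative).
child = subprocess.Popen(
    [sys.executable, "-c",
     "import subprocess, sys, time;"
     "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(60)']);"
     "time.sleep(60)"]
)

# Poll briefly until the grandchild has appeared.
deadline = time.time() + 5
while time.time() < deadline:
    descendants = psutil.Process(child.pid).children(recursive=True)
    if descendants:
        break
    time.sleep(0.1)

# Each descendant corresponds to one of the "extra" processes in ps.
for proc in descendants:
    print(proc.pid, proc.cmdline())
```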

EDIT AFTER FEEDBACK:

Based on the reply by @AKX I changed the `kill_instance` method. I cannot switch to storing process objects instead of PIDs, because the engine must be unpickled by another process.

    def kill_instance(self, name):
        pid = self.name_to_pid[name]
        print(f"Terminating process {pid}")
        proc = psutil.Process(pid)
        proc.terminate()
        print(f"Waiting for process {proc} to terminate")
        try:
            proc.wait(timeout=10)
        except psutil.TimeoutExpired:  # psutil raises its own TimeoutExpired
            print(f"Process {pid} did not terminate in time, killing it")
            proc.kill()

The exception is not triggered, but there are still processes lingering as in the original question...

UPDATE:

Got it! The second process appears because the script starts a `multiprocessing` manager. Now the question is how to terminate both the process and the manager...

AlwaysLearning
  • To repeat the feedback on [your previous question](https://stackoverflow.com/questions/73097986/unable-to-pkill-a-subprocess): why are you not simply storing the actual `Popen` object and then eventually calling its `kill` method? – tripleee Jul 25 '22 at 12:47
  • @tripleee Because I need to pickle `engine` and the proc object is not pickleable. – AlwaysLearning Jul 25 '22 at 12:49
  • @AlwaysLearning You can customize pickling with `__getstate__` if that's your issue. – AKX Jul 25 '22 at 12:51
  • @AKX The process that unpickles needs to be able to kill these processes, so it must work based on pid anyway. Would `terminate` not work with the approach based on storing pid as the dictionary value? – AlwaysLearning Jul 25 '22 at 12:58
  • Sure, `Process(pid).terminate()` would work. You did never originally mention that there's a separate process that needs to unpickle anything, though, which does change things. – AKX Jul 25 '22 at 13:02
  • This sounds like a massive [XY problem](https://en.wikipedia.org/wiki/XY_problem) once again. Guessing a bit as to what your actual requirements are, probably pickle (or otherwise just save) the PID:s, which are simple integers. You'd still have to figure out somehow if a PID could have been reused after a reboot or something. – tripleee Jul 25 '22 at 13:18
  • @AKX You are right, my implementation begged the question, so I should have mentioned it. In any case, using `terminate` does not seem to make a difference (see the update in the question)... – AlwaysLearning Jul 25 '22 at 13:19
  • @tripleee Please see the last update. The real question becomes: how to terminate a process that started a manager... I wonder if I should submit another question to keep things clean. – AlwaysLearning Jul 25 '22 at 13:52
  • @AlwaysLearning You could set up a signal handler in your manager-starting processes to clean things up... cleanly. Or send `SIGINT` instead of `SIGTERM`; that should propagate into a KeyboardInterrupt. – AKX Jul 25 '22 at 18:06
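AKX's signal-handler suggestion could look roughly like this inside the manager-starting child script (the handler and variable names are illustrative; on Linux, `multiprocessing.Manager()` forks the helper process that shows up as the "extra" PID in `ps`):

```python
import multiprocessing
import signal
import sys

# Starting a manager spawns a helper process -- the second PID seen in ps.
manager = multiprocessing.Manager()


def handle_sigterm(signum, frame):
    # Shut the manager's helper process down before exiting, so that no
    # orphaned process lingers after this process receives SIGTERM.
    manager.shutdown()
    sys.exit(0)


signal.signal(signal.SIGTERM, handle_sigterm)
```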

1 Answer


I think using `kill()` (the SIGKILL signal) is a bit problematic; you should try to `terminate()` (SIGTERM) the processes first.

Also, it'd be better to do that via the `Popen` object, so Python's side of the `Popen` has time to deal with things:

import subprocess


class LocalEngine:
    def __init__(self):
        self.name_to_proc: dict[str, subprocess.Popen] = {}

    def run_instance(self, name):
        args = ...  # Computing args for Popen from arguments
        with open(f"out-{name}", 'a') as out, open(f"err-{name}", 'a') as err:
            proc = subprocess.Popen(args, stdout=out, stderr=err, shell=False)
            print(f"Created process {proc.pid}")
            self.name_to_proc[name] = proc

    def kill_instance(self, name, timeout=10):
        proc = self.name_to_proc[name]
        print(f"Killing process {proc}")
        proc.terminate()
        print(f"Waiting for process {proc} to terminate")
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            print(f"Process {proc} did not terminate in time, killing it")
            proc.kill()
AKX
  • You did not answer the question: "What are these extra processes that linger?" Also, can you explain why your solution does not have this problem? – AlwaysLearning Jul 25 '22 at 12:53
  • `terminate` (SIGTERM) gives the processes a chance to clean up. The lingering processes are probably subprocesses that weren't `wait`ed on, so they get moved to their parent to wait on. – AKX Jul 25 '22 at 12:54
  • But why were these subprocesses created in the first place? – AlwaysLearning Jul 25 '22 at 12:55
  • How should we know what the code in those spawned processes does? – AKX Jul 25 '22 at 13:00
  • You are right. Unfortunately, it's too large to show. `run_client` does spawn `multiprocessing.Process`, but `run_backup` does not do anything like that as far as I can think... – AlwaysLearning Jul 25 '22 at 13:06
  • I updated the question with the code I tried based on your reply. There are still lingering processes... – AlwaysLearning Jul 25 '22 at 13:17