
I'm having trouble with a Python API that is leaving behind unkillable processes. I use one process per API execution. This works well as long as the API is able to connect. However, when the API fails to establish a connection, trouble arises.

I have a main process which starts all the other API processes:

> ps ax | grep python
   3431 s000 S+ 0:06.14 .../python3.7 -u .../main.py   -> Main process

When the API connects successfully (for example, with one instance), I get the following:

> ps ax | grep python
   3431 s000 S+ 0:12.59 .../python3.7 -u .../main.py   -> Main process
   3506 s000 S+ 0:00.34 .../python3.7 -u .../main.py   -> API connection process

Closing the successful connection kills the API process, as expected. No zombie processes arise.

However, when I use 3 instances of the API (which is possible), two of the instances/processes connect correctly but one fails. When all processes are running I get the following:

> ps ax | grep python
   3431 s000 R+ 1:00.51 .../python3.7 -u .../main.py   -> Main process
   3594 s000 U+ 0:00.02 .../python3.7 -u .../main.py   -> API connection process, where connection FAILED
   3595 s000 R+ 0:03.10 .../python3.7 -u .../main.py   -> API connection process, where connection was established
   3596 s000 S+ 0:01.69 .../python3.7 -u .../main.py   -> API connection process, where connection was established

Closing the API connections is where the trouble starts:

> ps ax | grep python
   3431 s000 R+ 2:52.01 .../python3.7 -u .../main.py   -> Main process
   3594 s000 U+ 0:00.02 .../python3.7 -u .../main.py   -> API connection process, where connection FAILED
   3595 s000 Z+ 0:00.00 (python3.7)   -> API connection process (zombie), where connection was established
   3596 s000 Z+ 0:00.00 (python3.7)   -> API connection process (zombie), where connection was established
> ps aux | grep -w Z
   3595 0,0 0,0 0 0 s000 Z+ 10:04 0:00.00 (python3.7)  -> API process where connection was established
   3596 0,0 0,0 0 0 s000 Z+ 10:04 0:00.00 (python3.7)  -> API process where connection was established
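
For reference, a `Z` entry is a child that has already exited but has not yet been waited on by its parent. The following is a minimal sketch, assuming the children are plain `multiprocessing.Process` objects started with the fork method; `worker` and `spawn_and_reap` are hypothetical names and do not come from the real code, where the API call would take the place of `worker`. It shows that calling `join()` from the parent is what clears such entries:

```python
import multiprocessing as mp
import time


def worker():
    # Hypothetical stand-in for an API connection process that exits at once.
    pass


def spawn_and_reap(n=2):
    """Start n forked children, let them exit, then reap them from the parent."""
    ctx = mp.get_context("fork")  # same start method as described in the question
    procs = [ctx.Process(target=worker) for _ in range(n)]
    for p in procs:
        p.start()
    # The children have exited by now; until the parent waits on them
    # they are listed with state Z in ps.
    time.sleep(0.5)
    for p in procs:
        p.join()  # waits on the child, which removes the zombie entry
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    print(spawn_and_reap())
```

If the main process is busy or blocked and never reaches a `join()`, the exited children stay listed as zombies until the parent itself exits.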

I have tried the recommendations from previous Stack Overflow posts, without success. This was my best attempt, which didn't work:

> kill -9 3594  -> kill API process where connection FAILED
> ps ax | grep python
   3431 s000 R+ 1:00.51 .../python3.7 -u .../main.py   -> Main process
   3594 s000 ?E+ 0:00.00 (python3.7)   -> API connection process, where connection FAILED
   3595 s000 z+ 0:00.00 (python3.7)   -> API connection process, where connection was established
   3596 s000 z+ 0:00.00 (python3.7)   -> API connection process, where connection was established
> ps aux | grep -w Z
   3595 0,0 0,0 0 0 s000 Z+ 10:04 0:00.00 (python3.7)   -> API connection process, where connection was established
   3596 0,0 0,0 0 0 s000 Z+ 10:04 0:00.00 (python3.7)   -> API connection process, where connection was established
> kill -1 3431   -> Kill main process
> ps aux | grep -w Z
   no processes
> ps ax | grep python
   3594 s000 ?E+ 0:00.00 (python3.7)  -> API connection process, where connection FAILED
> kill -9 3594
> ps ax | grep python
   3594 s000 ?E+ 0:00.00 (python3.7)  -> API connection process, where connection FAILED
> ps o ppid 3594
   PID 1

The processes are started with the fork method (using Python's multiprocessing library). The problem with these unkillable processes is that the main process holds a websocket connection and, even after I kill the main process, the unkillable process still has some connection to it, which keeps the websocket connection alive forever. I can't kill that process in any way. The only way is to force a reboot of the computer...
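
The shutdown path for a child can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual code: `fake_api_connection` is a hypothetical stand-in for the API call (the real API cannot be shared), and `stop_process` is a hypothetical helper that escalates from SIGTERM to SIGKILL.

```python
import multiprocessing as mp
import time


def fake_api_connection():
    # Hypothetical stand-in for the blocking API call.
    time.sleep(60)


def stop_process(proc, timeout=2.0):
    """Try to stop a child: SIGTERM first, then SIGKILL if it is still alive.

    Returns True if the child has exited and been reaped, False otherwise.
    """
    proc.terminate()        # sends SIGTERM
    proc.join(timeout)      # also reaps the child if it has exited
    if proc.is_alive():
        proc.kill()         # sends SIGKILL (available since Python 3.7)
        proc.join(timeout)
    return not proc.is_alive()


if __name__ == "__main__":
    ctx = mp.get_context("fork")
    child = ctx.Process(target=fake_api_connection)
    child.start()
    print(stop_process(child))
```

With an ordinary child this returns `True`. A process in uninterruptible wait (state `U`, as PID 3594 above) does not act on SIGKILL until the blocking kernel call returns, so in that case the helper would return `False` after both timeouts.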

Any help, please?

DTake
  • Please provide [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). – Tupteq Feb 17 '20 at 12:41
  • Unfortunately there is no way to reproduce this, as I can't share the API and to use it you would need three 10k€ devices. I shared as many details as possible about the processes. – DTake Feb 17 '20 at 13:04
  • I bet the problem isn't related to these devices, so you could try to isolate the problematic code and make some synthetic (but working) example. Currently I don't even know what Python libraries are in use. – Tupteq Feb 17 '20 at 13:19
  • How can I simulate such an un-killable process? I've tried it but with no luck - I am always able to kill it – DTake Feb 17 '20 at 13:26
  • Have you considered that one of those devices is defective? Related: [what-is-an-uninterruptible-process](https://stackoverflow.com/questions/223644/what-is-an-uninterruptible-process) – Darkonaut Feb 17 '20 at 14:10
  • The devices are all OK; it is probably a bug in the API (which is no longer supported). Anyway, I don't see a potential reason why that process is unkillable. – DTake Feb 17 '20 at 14:13

0 Answers