
My question

I encountered a hang with the combination of threading, multiprocessing, and subprocess. I simplified my situation to the script below.

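import subprocess
import threading
import multiprocessing

class dummy_proc(multiprocessing.Process):
    def run(self):
        print('run')
        # Busy-loop forever; the main thread stops this process with terminate().
        while True:
            pass

class popen_thread(threading.Thread):
    def run(self):
        # Run `ls -la` with stdin, stdout and stderr all connected to pipes.
        proc = subprocess.Popen('ls -la'.split(), shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        # communicate() reads stdout and stderr until EOF, then waits for the command to exit.
        stdout_byte, stderr_byte = proc.communicate()
        rc = proc.returncode
        print(rc)

if __name__ == '__main__':
    print('start')
    t = popen_thread()
    t.start()

    # With the default 'fork' start method (Linux, Python 3.7/3.9), this forks
    # while popen_thread may still be inside Popen() or communicate().
    p = dummy_proc()
    p.start()
    t.join()
    p.terminate()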

In this script, the main thread starts one thread and one process. The thread just runs the system command ls -la, and the process just loops forever. Once the thread has obtained the return code of the command, the main thread terminates the process and the script exits.

When I run this script repeatedly, it sometimes hangs. Searching for this, I found some articles that seem to be related.
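Because the hang is intermittent, it can help to run the script many times under a timeout instead of by hand. A minimal sketch, assuming the script is saved to a file (the name `hang_test.py` in the usage note is hypothetical) and that the timeout is far longer than a normal run:

```python
import subprocess
import sys

def count_hangs(cmd, runs=20, timeout=10.0):
    """Run `cmd` `runs` times and return how many runs exceeded `timeout` seconds."""
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run() kills and reaps the direct child on timeout.
            # Grandchildren (such as a busy-looping dummy_proc) are NOT killed.
            hangs += 1
    return hangs
```

For example, `count_hangs([sys.executable, "hang_test.py"], runs=100)` returns how many of 100 runs timed out. Since only the script's own process is killed on timeout, leftover `dummy_proc` processes may need to be cleaned up afterwards.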

So my guess is that the hang happens roughly like this:

  1. The process is created (forked) between Popen() and communicate().
  2. It inherits some "blocking" state from the thread, and that state is never released.
  3. This prevents the thread from getting the result of communicate().
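One concrete kind of state a forked child can inherit from the Popen thread is an open pipe end. A reader such as communicate() sees EOF only once every copy of the pipe's write end is closed, and fork() duplicates all file descriptors, including those marked close-on-exec. Whether this is what happens in the script above is an assumption that nobody in this thread has confirmed; this sketch (Linux only, calling os.fork() directly, with a sleeping child as a hypothetical stand-in for dummy_proc) only demonstrates the mechanism:

```python
import os
import select
import time

def _eof_within(fd, wait):
    """Return True if reading `fd` reaches EOF within `wait` seconds."""
    ready, _, _ = select.select([fd], [], [], wait)
    return bool(ready) and os.read(fd, 1) == b""

def eof_before_and_after_child_exit(wait=0.5):
    r, w = os.pipe()          # non-inheritable (close-on-exec) since Python 3.4...
    pid = os.fork()           # ...but fork() copies every descriptor anyway
    if pid == 0:
        # Child, standing in for dummy_proc: it never writes to `w`,
        # but simply keeping its copy open is enough to block EOF.
        os.close(r)
        time.sleep(4 * wait)
        os._exit(0)
    os.close(w)                     # the parent closes its own write end
    before = _eof_within(r, wait)   # child still holds a write end: no EOF
    os.waitpid(pid, 0)              # child exits, so its copy of `w` is closed
    after = _eof_within(r, wait)    # no write ends remain: EOF
    os.close(r)
    return before, after
```

If the child looped forever like dummy_proc, `before` would stay False indefinitely, which is what a stuck communicate() would look like.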

But I'm not 100% confident, so it would be great if someone could explain what happens here.

My environment

I used the following environment:

$ uname -a
Linux dell-vostro5490 5.10.96-1-MANJARO #1 SMP PREEMPT Tue Feb 1 16:57:46 UTC 2022 x86_64 GNU/Linux
$ python3 --version
Python 3.9.2

I also tried the following environment and got the same result:

$ uname -a
Linux raspberrypi 5.10.17+ #2 Tue Jul 6 21:58:58 PDT 2021 armv6l GNU/Linux
$ python3 --version
Python 3.7.3

What I tried

  • Use spawn instead of fork for multiprocessing.
  • Use thread instead of process for dummy_proc.

In both cases, the issue disappeared, so I guess it is related to the behavior of fork().
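If the inherited state is a file descriptor (again an assumption), that would fit the observation that spawn avoids the hang: a child that calls exec(), as the spawn start method does, does not keep the parent's close-on-exec descriptors, whereas a child that is only forked keeps them. A minimal sketch of the exec side, with a sleeping Python interpreter as a hypothetical stand-in for a spawned child:

```python
import os
import select
import subprocess
import sys

def eof_while_exec_child_runs(wait=0.5):
    r, w = os.pipe()  # both ends are close-on-exec by default (Python 3.4+)
    # close_fds=False so that only close-on-exec, not subprocess's own
    # descriptor cleanup, decides whether the exec'd child keeps `w`.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(10)"], close_fds=False
    )
    os.close(w)
    # The child has exec'd, so its copy of `w` is gone: EOF arrives
    # even though the child is still running.
    ready, _, _ = select.select([r], [], [], wait)
    eof = bool(ready) and os.read(r, 1) == b""
    child.kill()
    child.wait()
    os.close(r)
    return eof
```

Compared with a fork-only child, which keeps the write end open for as long as it runs, the exec'd child does not stand in the way of EOF.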

hiroaki
  • This seems pretty convoluted. Which problem is solved by this design? – tripleee Feb 21 '22 at 19:10
  • This project was not originally designed by me, but I need to maintain it. I want to understand the root cause so I can decide whether I need to avoid fork() in this situation. – hiroaki Feb 22 '22 at 01:22

1 Answer


This is a bit too long for a comment and so ...

I am having a problem understanding your statement that the problem disappears when you "Use thread instead of process for dummy_proc."

The hanging problem as I understand it is "that fork() only copies the calling thread, and any mutexes held in child threads will be forever locked in the forked child." In other words, the hanging problem arises when a fork is done while one or more threads other than the main thread (i.e., the one associated with the main process) exist.
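The effect described in that quote can be reproduced directly. A minimal sketch (Linux only, using os.fork() rather than multiprocessing, with a plain threading.Lock as a hypothetical stand-in for whatever lock a library might hold): a background thread holds the lock at the moment of the fork, so in the child the lock is still locked but the thread that would release it does not exist.

```python
import os
import threading
import time

def child_can_acquire_lock_after_fork(hold_seconds=0.5):
    """Fork while another thread holds a lock; report whether the child can take it."""
    lock = threading.Lock()
    held = threading.Event()

    def holder():
        with lock:
            held.set()
            time.sleep(hold_seconds)  # keep holding the lock across the fork

    t = threading.Thread(target=holder)
    t.start()
    held.wait()
    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() was copied, so `holder`
        # does not exist here and nothing will ever release `lock`.
        # A timeout is used so the demo reports failure instead of hanging.
        got_it = lock.acquire(timeout=2 * hold_seconds)
        os._exit(0 if got_it else 1)
    t.join()
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

Without the timeout, the child's acquire() would block forever, which is the hang the quote describes.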

If you execute a subprocess.Popen call from a newly created subprocess or a newly created thread, either way there will by definition be a new thread in existence prior to the fork done to implement the Popen call, and I would think the potential for hanging exists.

import subprocess
import threading
import multiprocessing
import os


class popen_process(multiprocessing.Process):
    def run(self):
        # Report which process and thread the Popen call is made from.
        print(f'popen_process, PID = {os.getpid()}, TID={threading.current_thread().ident}')
        proc = subprocess.Popen('ls -la'.split(), shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout_byte, stderr_byte = proc.communicate()
        rc = proc.returncode

if __name__ == '__main__':
    print(f'main process, PID = {os.getpid()}, TID={threading.current_thread().ident}')
    # Start child processes as fresh interpreters instead of fork()ed copies.
    multiprocessing.set_start_method('spawn')
    p = popen_process()
    p.start()
    p.join()

Prints:

main process, PID = 14, TID=140301923051328
popen_process, PID = 16, TID=140246240732992

Note the new thread with TID=140246240732992

It seems to me that you need to use the spawn start method whenever you make the Popen call from another thread or process, if you want to be sure of not hanging. For what it's worth, on my Windows Subsystem for Linux I could not get your code to hang with fork after quite a few tries, so I am just going by what the linked answer warns against.
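If you go the spawn route, one option is a dedicated context from multiprocessing.get_context('spawn'). Only processes created from that context use spawn, and the global default stays untouched, unlike with set_start_method(). A sketch of the question's script rewritten that way; the `returncode` attribute on the thread is a hypothetical addition so that main() has something to return:

```python
import multiprocessing
import subprocess
import threading

# A spawn-only context: its processes start as fresh interpreters,
# while the global default start method is left unchanged.
SPAWN = multiprocessing.get_context("spawn")

class dummy_proc(SPAWN.Process):
    def run(self):
        while True:  # busy-loop until terminate()
            pass

class popen_thread(threading.Thread):
    def run(self):
        proc = subprocess.Popen(["ls", "-la"], stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        proc.communicate()
        self.returncode = proc.returncode  # hypothetical attribute, for reporting only

def main():
    t = popen_thread()
    t.start()
    p = dummy_proc()
    p.start()
    t.join()
    p.terminate()
    p.join()
    return t.returncode

if __name__ == "__main__":
    print(main())
```

The `if __name__ == "__main__"` guard matters here: with spawn, the child re-imports the main module, and the guard keeps it from starting processes of its own.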

In any event, there seems to be a potential race condition in your example code. Let's assume that even though your popen_thread is a new thread, its properties are such that it does not give rise to the hanging problem (no mutexes are being held). Then the problem would arise from the creation of the dummy_proc process/thread. The question then becomes whether your call to t.start() finishes starting the new process that ultimately runs the ls -la command before or after the dummy_proc process/thread has been created. This timing determines whether the new dummy_proc thread (there will be one regardless of whether dummy_proc inherits from Process or Thread, as we have seen) exists before the ls -la process is created. This race condition might explain why you only sometimes hang. I have no explanation for why you never hang when dummy_proc inherits from threading.Thread.
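Another way to remove that race is to fork dummy_proc before starting the thread. This assumes the program can create its worker processes before it starts any threads. At that point popen_thread does not exist yet and no Popen pipes are open, so the forked child cannot inherit anything from them. A sketch with an explicit 'fork' context, because the default start method is not 'fork' on every platform and Python version; the `returncode` attribute on the thread is a hypothetical addition for reporting:

```python
import multiprocessing
import subprocess
import threading

# Explicit fork context so the sketch behaves the same regardless of the default.
FORK = multiprocessing.get_context("fork")

class dummy_proc(FORK.Process):
    def run(self):
        while True:  # busy-loop until terminate()
            pass

class popen_thread(threading.Thread):
    def run(self):
        proc = subprocess.Popen(["ls", "-la"], stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        proc.communicate()
        self.returncode = proc.returncode  # hypothetical attribute, for reporting only

def main():
    # Fork first: only the main thread exists and no Popen pipes are open yet.
    p = dummy_proc()
    p.start()
    # Threads (and the pipes they create) come after the fork, so the
    # child cannot hold copies of them.
    t = popen_thread()
    t.start()
    t.join()
    p.terminate()
    p.join()
    return t.returncode
```

This trades the race for a fixed ordering, so it only fits programs whose structure allows all forking to happen up front.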

Booboo
  • Thanks for your comment. I suspect that the "blocking" state I mentioned is inherited not by the process created by `subprocess.Popen()` but by `dummy_proc()`. I mean that `subprocess` acquires a mutex or something similar, and `dummy_proc()` inherits it and never releases it. This is just my guess and not confirmed, but if it is correct, changing `dummy_proc()` from a process to a thread would resolve the problem. – hiroaki Feb 22 '22 at 01:24
  • By the way, I also tried WSL1 and WSL2. The problem could be reproduced on WSL2 but not on WSL1. If you want to reproduce it, could you try WSL2? I would rather not discuss WSL1, because its fork() mechanism is very different from normal Linux. – hiroaki Feb 22 '22 at 01:24
  • I am already running WSL2 (Debian). – Booboo Feb 22 '22 at 10:44