I have a multi-threaded program. While it is running, some of the threads occasionally hang during the call to pexpect.spawn().
The program creates a Session object in each thread, and each Session object spawns a pexpect session when it is created. Each Session object is bound to one specific thread.
import threading

import pexpect


class CustomizedThread(threading.Thread):
    def __init__(self, thread_name):
        super().__init__(name=thread_name)

    def run(self):
        # Each thread owns exactly one Session. Passing the thread name as
        # the session name is an assumption made for this illustration.
        session = Session(self.name)
        ...


class Session:
    def __init__(self, session_name):
        self.name = session_name
        print('Thread {} is spawning a shell in session {}.'.format(
            threading.current_thread().name, self.name))
        self.pexpect_session = pexpect.spawn('/bin/sh')
        print('Thread {} finished spawning a shell in session {}.'.format(
            threading.current_thread().name, self.name))
        self.pexpect_session.sendline('ssh MACHINE_NAME')
        ...

    def __del__(self):
        # Runs in whichever thread (or process) happens to collect the object.
        print('Thread {} is cleaning up session {}.'.format(
            threading.current_thread().name, self.name))
        self.pexpect_session.close(force=True)
The following is example output from a run in which thread 2 hangs. The destructor of Session object 1 is triggered during the call to pexpect.spawn() on thread 2.
...
Thread 1 is spawning a shell in session 1.
Thread 1 finished spawning a shell in session 1.
Thread 2 is spawning a shell in session 2.
Thread 2 is cleaning up session 1.
I attached gdb to the hanging process and got the following stack trace. It shows that the thread is blocked while writing the exception message to a file descriptor (fd=2 in the trace):
(gdb) where
#0 0x00007fff9628391a in write () from /usr/lib/system/libsystem_kernel.dylib
#1 0x000000010ed7aa22 in _Py_write_impl (fd=2, buf=0x10f3f1010, count=76, gil_held=1) at ../Python/fileutils.c:1269
#2 0x000000010ed7a9a1 in _Py_write (fd=2, buf=0x10f3f1010, count=76) at ../Python/fileutils.c:1327
#3 0x000000010ede8795 in _io_FileIO_write_impl (self=0x10f3875f8, b=0x7000013dd168) at ../Modules/_io/fileio.c:840
#4 0x000000010ede7957 in _io_FileIO_write (self=0x10f3875f8, arg=0x11312c148)
at ../Modules/_io/clinic/fileio.c.h:245
#5 0x000000010ebbfd72 in PyCFunction_Call (func=0x112ed7b98, args=0x112fc6330, kwds=0x0)
at ../Objects/methodobject.c:134
#6 0x000000010eb2803d in PyObject_Call (func=0x112ed7b98, arg=0x112fc6330, kw=0x0) at ../Objects/abstract.c:2165
#7 0x000000010eb290de in PyObject_CallMethodObjArgs (callable=0x112ed7b98, name=0x10f234d40)
at ../Objects/abstract.c:2394
#8 0x000000010edf0456 in _bufferedwriter_raw_write (self=0x10f25de58,
start=0x10f3f1010 "\nThread 2 is cleaning up session 1. \"terminated\" is 0, but there was no child process. Did someone else call waitpid() on our process?\n"...,
len=76) at ../Modules/_io/bufferedio.c:1847
The exception message '"terminated" is 0, but there was no child process. Did someone else call waitpid() on our process?' comes from the line where the pexpect session is closed:
self.pexpect_session.close(force=True)
Also, in pexpect's spawn() method, the process is forked to execute '/bin/sh' (the forked child is the process I attached to in gdb), and a pipe is created so that the child can report any exception message to the parent process.
It looks like the forked process garbage-collected the Session object belonging to another thread and hit an exception when it tried to close that thread's session. The process then hangs while writing the exception message, because the message is expected to be read from the other side and apparently never is.
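If that reading is right, one workaround I am considering is to make the cleanup a no-op in any process other than the one that created the object, so that a forked child never tries to close a session it merely inherited. Below is a minimal sketch of that idea, not a confirmed fix. The names PidGuardedResource, owner_pid and close_result_in_child are hypothetical and not part of pexpect; the class is a stand-in for Session and does not spawn anything, so it runs without pexpect (on a platform that supports os.fork()).

```python
import os


class PidGuardedResource:
    """Hypothetical stand-in for Session: cleans up only in its creator process."""

    def __init__(self, name):
        self.name = name
        self.owner_pid = os.getpid()  # PID of the process that created us
        self.closed = False

    def close(self):
        """Return True if cleanup actually ran, False if it was skipped."""
        if os.getpid() != self.owner_pid:
            # We are in a forked child that only inherited a copy of this
            # object; the real resource belongs to the parent process.
            return False
        if self.closed:
            return False
        self.closed = True  # the real close(force=True) call would go here
        return True

    def __del__(self):
        self.close()


def close_result_in_child(resource):
    """Fork, call resource.close() in the child, and report what it returned."""
    pid = os.fork()
    if pid == 0:
        # Child: encode the result in the exit status and leave immediately,
        # bypassing normal interpreter shutdown.
        os._exit(1 if resource.close() else 0)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 1


if __name__ == '__main__':
    res = PidGuardedResource('session 1')
    print('closed in child: ', close_result_in_child(res))
    print('closed in parent:', res.close())
```

The guard only compares PIDs, so it does not change which thread runs the destructor inside the parent process; it only keeps a forked child from touching sessions it did not create.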