
To start, I'm aware this looks like a duplicate. I've been reading:

Python subprocess readlines() hangs

Python Subprocess readline hangs() after reading all input

subprocess readline hangs waiting for EOF

But the solutions there either don't work for me or I can't use them.

The Problem

import os
import subprocess

# Obviously, swap HOSTNAME1 and HOSTNAME2 with something real
cmd = "ssh -N -f -L 1111:<HOSTNAME1>:80 <HOSTNAME2>"

p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=os.environ)
while True:
    out = p.stdout.readline()
    # Hangs here ^^^^^^^ forever

    out = out.decode('utf-8')
    if out:
        print(out)
    if p.poll() is not None:
        break
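The hang is likely not in Popen itself. As far as I understand it (an assumption about this exact ssh setup, not something the question confirms), readline() only returns b'' at end of file, and EOF arrives only once every process holding the pipe's write end has closed it. With -f, ssh forks into the background after authenticating, and that background copy probably keeps the inherited stdout open, so p.poll() can report that the shell has exited while readline() waits forever. The sketch below reproduces the same effect without ssh, using a backgrounded sleep as a stand-in (a POSIX shell is assumed):

```python
import subprocess
import time

# Stand-in for `ssh -f`: the shell prints a line and exits immediately,
# but the backgrounded `sleep` inherits stdout and holds the pipe open.
p = subprocess.Popen("echo done; sleep 2 &", shell=True, stdout=subprocess.PIPE)

first = p.stdout.readline()   # b'done\n' is available right away
p.wait()                      # the shell itself has already exited

start = time.monotonic()
rest = p.stdout.readline()    # blocks until `sleep` exits and closes its copy
blocked_for = time.monotonic() - start
p.stdout.close()

print(first, p.returncode, rest, round(blocked_for, 1))
```

The second readline() returns b'' only after roughly two seconds, when sleep exits, even though the shell finished immediately.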

My dilemma is that the function calling subprocess.Popen() is a library function for running bash commands, so it needs to be very generic and has the following restrictions:

  • Must display output as it comes in; not block and then spam the screen all at once
  • Can't use multiprocessing, in case the parent caller is already running the library function under multiprocessing (daemonic processes, such as multiprocessing.Pool workers, aren't allowed to have child processes)
  • Can't use signal.SIGALRM for a similar reason: the parent caller may be using it to set their own timeout
  • Can't use third-party modules; standard library only
  • Threading doesn't work either. When the readline() call is in a thread, thread.join(timeout=1) lets the program continue, but Ctrl+C doesn't work on it at all, and calling sys.exit() doesn't exit the program, since the thread is still running. And as you know, Python deliberately provides no way to kill a thread.
  • No combination of bufsize or other subprocess arguments seems to make a difference, nor does wrapping readline() in an iterator.

I would have a workable solution if I could kill a thread, but that's super taboo, even though this is definitely a legitimate use case.

I'm open to any ideas.

Locane

2 Answers


One option is to use a thread to publish to a queue. Then you can block on the queue with a timeout. You can make the reader thread a daemon so it won't prevent system exit. Here's a sketch:

import subprocess
from threading import Thread
from queue import Queue, Empty

def reader(stream, queue):
    # Forward every line; readline() returns b'' only at end of stream
    while True:
        line = stream.readline()
        queue.put(line)
        if not line:
            break

# cmd as in the question
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
queue = Queue()
thread = Thread(target=reader, args=(p.stdout, queue))
thread.daemon = True  # a daemon thread won't keep the program alive
thread.start()
while True:
    try:
        out = queue.get(timeout=1)  # timeout is optional
    except Empty:
        continue  # nothing yet; check a deadline or stop condition here
    if not out:  # Reached end of stream
        break
    ...  # Do whatever with output

# Output stream was closed but process may still be running
p.wait()

Note that you should adapt this answer to your particular use case. For example, you may want to add a way to signal to the reader thread to stop running before reaching the end of stream.
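A minimal sketch of such a stop signal, assuming a threading.Event called stop (a hypothetical addition, not part of the answer above) and a placeholder child that prints 1000 lines. The reader only checks the flag between lines, so it cannot interrupt a readline() that is already blocked; that is why the thread remains a daemon.

```python
import subprocess
import sys
import threading
from queue import Queue

def reader(stream, queue, stop):
    # Check the stop flag between lines. A readline() that is already
    # blocked cannot be interrupted, so the flag is only seen after it returns.
    for line in iter(stream.readline, b""):
        queue.put(line)
        if stop.is_set():
            break
    queue.put(b"")  # sentinel: no more lines will arrive

# Placeholder command: a child that prints 1000 numbered lines
cmd = [sys.executable, "-c", "for i in range(1000): print(i)"]
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
queue = Queue()
stop = threading.Event()
thread = threading.Thread(target=reader, args=(p.stdout, queue, stop))
thread.daemon = True
thread.start()

lines = []
while True:
    line = queue.get()
    if not line:
        break
    lines.append(line)
    if len(lines) == 10:
        stop.set()  # seen enough: ask the reader to stop after its next line

thread.join()
if p.poll() is None:
    p.kill()  # the child may still be writing; we no longer care
p.wait()
p.stdout.close()
```

Depending on timing, the reader may already have queued more than ten lines before it notices the flag, so the consumer keeps draining until it sees the sentinel.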

Another option would be to poll the input stream, like in this question: timeout on subprocess readline in python
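For the polling approach, one way to sketch it on POSIX is to select() on the raw file descriptor and read with os.read() instead of p.stdout.readline(), because the buffered file object can hold data that select() cannot see. The name read_lines_with_timeout and the 4096-byte chunk size are arbitrary choices for illustration, and this is not necessarily what the linked question does. On Windows, select() only accepts sockets, so this will not work there.

```python
import os
import select
import subprocess
import time

def read_lines_with_timeout(p, timeout):
    """Yield lines from p.stdout until EOF, or until `timeout` seconds pass
    with no new data. POSIX only: select() does not accept pipes on Windows."""
    fd = p.stdout.fileno()
    buf = b""
    while True:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            break  # timed out: the pipe is open but silent
        chunk = os.read(fd, 4096)  # raw read; returns at once after select
        if not chunk:
            break  # EOF: every writer closed the pipe
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            yield line + b"\n"
    if buf:
        yield buf  # trailing text without a newline

# The backgrounded `sleep 5` holds the pipe open after the shell exits,
# much like `ssh -f` in the question.
p = subprocess.Popen("echo lol; sleep 1; echo bbq; sleep 5 &",
                     shell=True, stdout=subprocess.PIPE)
start = time.monotonic()
lines = list(read_lines_with_timeout(p, timeout=2))
elapsed = time.monotonic() - start
p.wait()
p.stdout.close()
```

Because it is a generator, a caller can print each line as soon as it is yielded instead of collecting them with list().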

augurar
  • Thanks very much for your answer; the piece I was missing was knowing about the `daemon` flag. For some reason, you can't specify it in the Thread() initialization, you have to do `thread.daemon = True` after the fact. – Locane Oct 21 '19 at 06:37
  • @Locane `daemon` is a keyword-only argument to `Thread()` in Python 3; in Python 2 you have to set it via the property before calling `thread.start()`. – augurar Oct 21 '19 at 22:55
  • Thanks @augurar, it seems to be working with `thread.daemon = True` in either version for me – Locane Oct 22 '19 at 23:44
  • Yes, if you need to support Python 2 then that's the way to go. – augurar Oct 23 '19 at 00:02
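Both spellings side by side, as an illustration of the comments above (time.sleep is just a placeholder target). The keyword form needs Python 3.3 or later; the attribute form must be used before start().

```python
import threading
import time

# Python 3.3+: daemon can be passed as a keyword-only argument...
t1 = threading.Thread(target=time.sleep, args=(0.1,), daemon=True)

# ...or set as an attribute before start(), which also works on Python 2
t2 = threading.Thread(target=time.sleep, args=(0.1,))
t2.daemon = True

t1.start()
t2.start()
print(t1.daemon, t2.daemon)  # True True
```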

I finally got a working solution; the key piece of information I was missing was thread.daemon = True, which @augurar pointed out in their answer.

Setting thread.daemon = True means the thread is stopped abruptly when the main program exits, so a sub-thread stuck in readline() no longer keeps the program alive. That is what makes it usable for monitoring readline().
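The effect can be checked in isolation with a throwaway child interpreter whose only extra thread is stuck in a blocking read of a pipe nobody writes to. This is a sketch; exits_within is a hypothetical helper, and the 3-second limit is arbitrary:

```python
import subprocess
import sys

# Child program: start one thread that blocks reading a pipe (fd 0) that the
# parent never writes to, then let the main thread finish.
CHILD = """
import os, threading
t = threading.Thread(target=os.read, args=(0, 1))
t.daemon = {daemon}
t.start()
"""

def exits_within(daemon, limit=3.0):
    # Hypothetical helper: True if the child interpreter exits within `limit` seconds
    p = subprocess.Popen([sys.executable, "-c", CHILD.format(daemon=daemon)],
                         stdin=subprocess.PIPE)
    try:
        p.wait(timeout=limit)
        return True
    except subprocess.TimeoutExpired:
        return False
    finally:
        if p.poll() is None:
            p.kill()  # the non-daemon child would otherwise wait forever
        p.wait()
        p.stdin.close()

print(exits_within(True), exits_within(False))  # True False
```

With daemon=False the interpreter waits at shutdown for the blocked thread, which is the behaviour described in the question.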

Here is a sample implementation of my solution. I used a Queue() object to pass strings to the main thread, and I implemented a 3-second grace period for cases like the original problem, where the subprocess has finished and exited but readline() is still hung for some reason.

It also avoids a race between the process exiting and the reader thread finishing reading the remaining output.
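The same grace period could also be expressed with Thread.join(timeout=...) instead of a hand-rolled timer. This is a sketch of that alternative, not the author's implementation; run() and its grace parameter are hypothetical, and it assumes Python 3. Joining before the final drain means everything the reader queued before reaching EOF is collected, while a reader stuck in readline() is simply abandoned after grace seconds:

```python
import subprocess
import sys
import threading
from queue import Queue, Empty

def reader(stream, q):
    # Forward lines until EOF (readline() returns b"")
    for line in iter(stream.readline, b""):
        q.put(line)

def run(cmd, grace=3.0):
    """Hypothetical helper: stream cmd's output, return (returncode, lines)."""
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT)
    q = Queue()
    t = threading.Thread(target=reader, args=(p.stdout, q))
    t.daemon = True  # may stay stuck in readline(); must not block exit
    t.start()
    lines = []

    def show(line):
        lines.append(line)
        sys.stdout.write(line.decode("utf-8", "replace"))
        sys.stdout.flush()

    while p.poll() is None:          # print output while the process runs
        try:
            show(q.get(timeout=0.1))
        except Empty:
            pass
    t.join(timeout=grace)            # give the reader `grace` seconds to hit EOF
    while True:                      # then drain what it queued; if it is still
        try:                         # stuck (pipe held open elsewhere), stop here
            show(q.get_nowait())
        except Empty:
            break
    return p.returncode, lines
```

For example, run("echo lol; sleep 2; echo bbq") prints both lines as they arrive and returns once the reader reaches EOF.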

The implementation below works with both Python 2 and 3.

import sys
import time
import threading
import subprocess
from datetime import datetime

try:
    import queue
except ImportError:
    import Queue as queue  # Python 2 compatibility


def _monitor_readline(process, q):
    # Forward each line to the main thread. readline() returns an empty
    # string only at EOF, i.e. once every writer has closed the pipe.
    while True:
        out = process.stdout.readline()
        if sys.version_info[0] >= 3:
            out = out.decode('utf-8')
        if not out:
            break
        q.put(out)

def bash(cmd):
    # Kick off the command
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)

    # Create the queue instance
    q = queue.Queue()
    # Kick off the monitoring thread
    thread = threading.Thread(target=_monitor_readline, args=(process, q))
    thread.daemon = True
    thread.start()
    start = datetime.now()
    while True:
        if process.poll() is None:
            bail = False
            # Re-set the thread timer
            start = datetime.now()
        else:
            # The process has exited. Stop once the reader thread is done,
            # or after a 3 second grace period if it is still hung in
            # readline(). Checking this before draining the queue means
            # nothing the thread queued before finishing can be missed.
            bail = (not thread.is_alive()
                    or (datetime.now() - start).total_seconds() >= 3)

        out = ""
        while not q.empty():
            out += q.get()
        if out:
            # Lines already end in a newline, so write them as-is
            sys.stdout.write(out)
            sys.stdout.flush()

        if bail:
            break
        time.sleep(0.1)  # avoid a busy loop while waiting for output

# To demonstrate output in realtime, sleep is called in between these echoes
bash("echo lol;sleep 2;echo bbq")

Locane
  • Can you explain how this is different from @augurar's answer above? – augurar Oct 29 '19 at 18:54
  • Yes, primarily it actually runs as opposed to your code sketch, and it solves a race condition problem present in your sketch that causes output to be cut off randomly when the subprocess is done executing but the output isn't finished being read. Try your implementation locally with some ssh commands or a curl GET call. Also, I'm not sure why you need to tag yourself in your own comment? – Locane Oct 30 '19 at 16:38
  • I want to help future readers of the question find the correct answer rather than this answer which is overly specific to your use case. – augurar Oct 30 '19 at 19:54