
I am trying to get an exception from a subprocess. I can get it if I use .communicate(), but I would like to avoid that since I am streaming output from the subprocess as it occurs and don't want to wait until the whole subprocess is complete. Also assume the subprocess can take a very long time. I was wondering how I can catch an exception thrown while streaming the stdout from the subprocess.

Consider the example below. I would like to get version #1 working; version #2 kind of works, but I don't want to do it that way.

In main.py

import subprocess


class ExtProcess():
    def __init__(self, *args):
        self.proc = subprocess.Popen(['python', *args], stdout=subprocess.PIPE)

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            line = self.proc.stdout.readline()
            if self.proc.returncode:
                raise Exception("error here")
            if not line:
                raise StopIteration
            return line


def run():
    ## version #1
    reader = ExtProcess("sample_extproc.py")
    for d in reader:
        print(f"got: {d}")

    ## version #2
    # proc = subprocess.Popen(['python', "sample_extproc.py"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # output, error = proc.communicate()
    # print("got:", output)
    # if proc.returncode:
    #     raise Exception(error)

def main():
    try:
        print("start...")
        run()
        print("complete...")
    except Exception as e:
        print(f"Package midstream error here: {str(e)}")
    finally:
        print("clean up...")


if __name__ == "__main__":
    main()

In sample_extproc.py

for x in range(10):
    print(x)
    if x == 3:
        raise RuntimeError("catch me")

From version #1, I would like to get output like the below:

start...
got: b'0\r\n'
got: b'1\r\n'
got: b'2\r\n'
got: b'3\r\n'
Package midstream error here: b'Traceback (most recent call last):\r\n  File "sample_extproc.py", line 4, in <module>\r\n    raise RuntimeError("catch me")\r\nRuntimeError: catch me\r\n'
clean up...

Basically it iterates through the stdout from the subprocess, prints the exception when it occurs, and then continues with cleanup.

  • Programs don't return an exit status until they _exit_. Does your subprocess exit when it throws an exception? – Charles Duffy Oct 14 '20 at 17:37
  • ...so there's very little point to trying to check `returncode` while you're still reading data. Sure, the process _could_ close its stdout before it exits, or a few tens of bytes could still be in the FIFO at exit time, but it's very unusual for that to be more than a matter of milliseconds, and even then the parent process won't know the exit status until it invokes the `wait()` syscall to retrieve it (when reaping the PID of the exited process from the zombie entry it leaves in the process table for the time between the exit call and its parent process reading that status). – Charles Duffy Oct 14 '20 at 17:38
  • I do feel like asking 'what is your larger goal?' might be appropriate here. I'm a fan of 'answer the question that was asked' but I can't help but feel there is another way to approach the problem (depending on what it is). – Marcel Wilson Oct 14 '20 at 17:40
  • @CharlesDuffy doesn't it exit automatically when an exception is thrown in the subprocess? – user1179317 Oct 14 '20 at 17:42
  • @user1179317, usually, if not overridden (f/e, by an exception handler that catches the exception, prints it, but then continues execution). But the fact that you thought it was necessary and appropriate to write this code implies that something _was_ overridden. – Charles Duffy Oct 14 '20 at 17:43
  • @MarcelWilson the main idea is to stream the results (stdout) as it occurs to the user. Then if an exception is thrown in the middle, to provide that error/info to the user as well – user1179317 Oct 14 '20 at 17:45
  • _Usually_, common practice is to read stdout all the way to the end, and then once you reach the end, call `p.wait()` to set `p.returncode`. – Charles Duffy Oct 14 '20 at 17:45
  • ...if your child process exits as soon as it sees an exception, that exception won't be "in the middle", but will be at the end of the stream, so you can go straight from your `for line in proc.self.stdout:` loop exiting to the `p.wait()`, and checking `p.returncode` as soon as `wait` returns. – Charles Duffy Oct 14 '20 at 17:46
  • Yeah, like I mentioned, I don't really want to wait until the end for the user to get anything, and I don't want to buffer everything in stdout, especially if we are writing tons of data into stdout. Basically I would like to flush it out to another application or user as we write to stdout, mainly to minimize memory use within that environment. I don't have to use p.wait if I use p.communicate(), but I don't want to do that – user1179317 Oct 14 '20 at 17:51
  • @CharlesDuffy Yes, I guess you can say its not in the middle, its at the end – user1179317 Oct 14 '20 at 17:52
  • @user1179317 My "spidey sense" is tingling on this one. I could be completely off-base, but running something in subprocess that has a chance of raising an exception but runs anyway feels awkward. I usually think of an exception that is raised as being something the script should catch and deal with or exit. It sounds like you catch it, print the stack trace, but then continue anyway. At that point you've technically 'dealt with it'. What is the parent process going to be doing with the re-catching of that exception? – Marcel Wilson Oct 14 '20 at 18:02
  • @MarcelWilson Ok, so maybe I am not clear. But if the exception is thrown in the subprocess, I don't want to continue; I want it to exit the subprocess and raise the exception. What I do want is to print the stdout from the subprocess as it happens, then once the exception is thrown in the subprocess, exit the subprocess, have the parent process print the exception out as well, go to the clean up / finally block, and then execution is complete. – user1179317 Oct 14 '20 at 18:16
  • @CharlesDuffy Sorry, just understood what you mentioned. So yeah, it actually works if, after I loop through the stdout, I call p.wait() and then check p.returncode – user1179317 Oct 14 '20 at 18:46
  • ...given which, is there still a question here? – Charles Duffy Oct 14 '20 at 20:18
  • @CharlesDuffy no more question. Should I update my question with your answer then? – user1179317 Oct 14 '20 at 20:20
  • No -- answers belong in answers, not questions. I'd write one up, but I'm not sure I understand why there was a question in the first place well enough to do so; feel free to add your own answer (via the "Post Your Answer" button) using anything my comments helped you learn; after a 3-day timeout you'll be able to accept it, and thus mark your question solved. – Charles Duffy Oct 14 '20 at 20:24

1 Answer

Here's the answer to my question below, based on @CharlesDuffy's comments:

In short, make sure to pass stderr=subprocess.PIPE in the ExtProcess class; the answer is then version #3. After iterating through stdout, use .wait() and returncode to check whether there was an error, and if so raise an exception with the error grabbed from stderr.read(), to be caught in the parent/main.

import subprocess

class ExtProcess():
    def __init__(self, *args):
        self.proc = subprocess.Popen(['python', *args], stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    def __iter__(self):
        return self

    def __next__(self):
        # readline() returns b'' once the child closes stdout, i.e. the stream is done
        line = self.proc.stdout.readline()
        if not line:
            raise StopIteration
        return line


def run():
    ## version #1
    # reader = ExtProcess("sample_extproc.py")
    # for d in reader:
    #     print(f"got: {d}")

    ## version #2
    # proc = subprocess.Popen(['python', "sample_extproc.py"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # output, error = proc.communicate()
    # print("got:", output)
    # if proc.returncode:
    #     raise Exception(error)

    ## version #3
    reader = ExtProcess("sample_extproc.py")
    for d in reader:
        print(f"got: {d}")
    # stdout is exhausted, so the child has finished writing;
    # wait() reaps it and sets returncode
    reader.proc.wait()
    if reader.proc.returncode:
        raise Exception(reader.proc.stderr.read())

def main():
    try:
        print("start...")
        run()
        print("complete...")
    except Exception as e:
        print(f"Package midstream error here: {str(e)}")
    finally:
        print("clean up...")


if __name__ == "__main__":
    main()
  • So, the only thing that worries me reading this code is if your program tries to write more content than will fit in the pipeline to stderr before it's closed stdout, that write could block because nothing's reading it (stderr isn't just for errors -- it's also where "diagnostic content" like logs belong); so you could end up in a deadlock if the program doesn't finish writing to stdout (or doesn't attempt to close stdout) until that write to stderr completes. – Charles Duffy Oct 14 '20 at 21:20
  • @CharlesDuffy Hmm, not sure if I fully understand, but is there a timeout or something I could use? I am assuming that whenever I get to p.wait(), stdout has reached the StopIteration, so I shouldn't really wait too long at p.wait() – user1179317 Oct 14 '20 at 21:52
  • The potential problem is the _child process_ deadlocking. Try using a child process that writes a few kb of data to stderr partway through its execution -- its attempt to write that data will hang unless something in the parent process is actively reading from stderr _at that time_; when you don't read from stderr at all until after stdout exited, that means that you can't safely run child processes that write more than a FIFO buffer's worth of data to stderr while they still have stdout open. – Charles Duffy Oct 14 '20 at 22:04
  • ...if your child process is stuck unable to finish a write to stderr, that means it never finishes writing content to (and then closing) stdout, so the parent process never exits its loop. – Charles Duffy Oct 14 '20 at 22:05
  • One way to avoid that is to have a separate thread reading content from stderr into a buffer as it comes in. – Charles Duffy Oct 14 '20 at 22:06
  • Another way is to use [`selectors`](https://docs.python.org/3/library/selectors.html); some of the answers in https://stackoverflow.com/questions/31833897/python-read-from-subprocess-stdout-and-stderr-separately-while-preserving-order go into it. Just reading from stderr into your parent process and storing the data for later reference is good enough. – Charles Duffy Oct 14 '20 at 22:09
  • ...the point of using `selectors` to read from both stdout and stderr at the same time, or a thread reading from `proc.stderr` in the background, or so forth is to make sure that stderr gets consumed when the child process writes it, so the child process can't hang trying to wait for a full FIFO buffer attached to its stderr to have room for more data to be written (while the parent makes no effort to read from and thus empty that FIFO at all). – Charles Duffy Oct 14 '20 at 22:11
  • @CharlesDuffy makes sense, thanks for the detailed explanation. Will take a look at selectors or at having a separate thread reading the stderr. Definitely appreciate you going the extra mile beyond just my answer, and looking into potential/further issues – user1179317 Oct 15 '20 at 13:58
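
Following up on @CharlesDuffy's deadlock warning above, here is a minimal sketch of ExtProcess with a background thread draining stderr while the parent iterates stdout, so the child can never block on a full stderr pipe. This is my own illustration of the threading approach he describes, not code from the comments; the check() helper name and the buffer handling are mine:

import subprocess
import threading


class ExtProcess():
    def __init__(self, *args):
        self.proc = subprocess.Popen(['python', *args], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        self._stderr_chunks = []
        # daemon thread keeps emptying stderr so its FIFO buffer never fills up
        self._stderr_thread = threading.Thread(target=self._drain_stderr, daemon=True)
        self._stderr_thread.start()

    def _drain_stderr(self):
        # runs in the background: collect stderr lines as the child writes them
        for line in self.proc.stderr:
            self._stderr_chunks.append(line)

    def __iter__(self):
        return self

    def __next__(self):
        line = self.proc.stdout.readline()
        if not line:
            raise StopIteration
        return line

    def check(self):
        # call after the stdout loop: reap the child and raise if it failed
        self.proc.wait()
        self._stderr_thread.join()
        if self.proc.returncode:
            raise Exception(b"".join(self._stderr_chunks))

run() then becomes the same stdout loop as version #3, followed by reader.check() instead of the explicit wait()/returncode/stderr.read() lines.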
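
Alternatively, the selectors approach from the linked thread multiplexes stdout and stderr in one loop. The sketch below is my own rough illustration (the stream_with_stderr name is hypothetical), and note that selector-based reads on pipes work on POSIX systems but not on Windows:

import os
import selectors
import subprocess


def stream_with_stderr(*args):
    proc = subprocess.Popen(['python', *args], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ, data='stdout')
    sel.register(proc.stderr, selectors.EVENT_READ, data='stderr')
    stderr_chunks = []

    open_streams = 2
    while open_streams:
        for key, _ in sel.select():
            chunk = os.read(key.fd, 4096)  # read whatever is ready on this pipe
            if not chunk:
                # EOF on this stream: stop watching it
                sel.unregister(key.fileobj)
                open_streams -= 1
            elif key.data == 'stdout':
                yield chunk
            else:
                stderr_chunks.append(chunk)

    sel.close()
    proc.wait()
    if proc.returncode:
        raise Exception(b"".join(stderr_chunks))

It is consumed the same way as version #1's loop, e.g. for d in stream_with_stderr("sample_extproc.py"): print(f"got: {d}"), with the exception surfacing once the loop ends.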