
The Problem

I've been trying to write a program that logs uncaught exceptions and syntax errors for a subprocess. Easy, right? Just pipe stderr to the right place.

However, the subprocess is another Python program (I'll call it test.py) that needs to run as if its output/errors were not being captured. That is, running the logger program needs to seem like the user has just run `python test.py` as normal.

Further complicating the issue is the fact that the raw_input prompt actually gets sent to stderr if readline is not loaded. Unfortunately, I can't just import readline, since I don't have control over the files that are being run using my error logger.
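
To make the readline quirk concrete, here is a minimal sketch of the behaviour (the file name check_prompt.py is my own, and this assumes Python 2 run from a terminal):

# check_prompt.py - hypothetical demo of the prompt-to-stderr behaviour.
# Run as:  python check_prompt.py 2>err.txt
# Without readline loaded, the "type something: " prompt typically ends up
# in err.txt instead of on the terminal; uncommenting the import below
# routes the prompt through readline instead.
# import readline
text = raw_input('type something: ')
print 'you typed:', text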

Notes:

  • I am fairly restricted on the machines that this code will run on. I can't install pexpect or edit the *customize.py files (since the program will be run by a lot of different users). I really feel like there should be a stdlib solution anyway though...
  • This only has to work on macs.
  • The motivation for this is that I'm part of a team researching the errors that new programmers get.

What I've Tried

I've tried the following methods, without success:

  • just using tee as in the question How do I write stderr to a file while using "tee" with a pipe? (failed to produce raw_input prompts); python implementations of tee that I found in several SO questions had similar issues
  • overwriting sys.excepthook (failed to make it work for a subprocess; a minimal sketch of this approach appears after this list)
  • this question's top answer seemed promising, but it failed to display raw_input prompts correctly.
  • the logging module seems useful for actually writing to a log file, but doesn't seem to get at the crux of the issue
  • custom stderr readers
  • endless googling
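
For reference, here is a minimal sketch of the sys.excepthook approach mentioned above (the 'log' file name is arbitrary); the hook only fires for uncaught exceptions in the interpreter that installs it, which is why it never sees exceptions raised inside a separately spawned python test.py:

import sys
import traceback

def log_hook(etype, value, tb):
    # Append the formatted traceback to a log file, then display it as usual.
    with open('log', 'a') as f:
        f.write(''.join(traceback.format_exception(etype, value, tb)))
    sys.__excepthook__(etype, value, tb)

sys.excepthook = log_hook  # only affects this interpreter, not subprocesses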
  • how are you required to distribute your program? Are you allowed to use [PyInstaller](http://www.pyinstaller.org/), a [zip archive](http://stackoverflow.com/a/5356563/4279) to pack your program? `pexpect` is pure Python, you could pack it with your program if you are not allowed to install anything. – jfs Sep 23 '12 at 21:47 (a sketch of the zip-archive idea appears after these comments)
  • Does the subprocess actually have to be a subprocess? Can you run the user-provided code with `execfile` instead? – nneonneo Sep 27 '12 at 00:34
  • @nneonneo Yes it has to be subprocess, because the users' code must have stuff like `sys.argv` behave as expected. – Matthew Adams Sep 27 '12 at 01:46
  • `sys.argv` is no problem at all. You could replace `sys.argv`. How are you currently running your logger around the user's code? Do they explicitly invoke it? If not, what's your hook? – nneonneo Sep 27 '12 at 01:56
  • @nneonneo Yes, they explicitly invoke it. I know I could just replace it, but I can't think of every similar case when something like that would be a problem. – Matthew Adams Sep 27 '12 at 04:08
  • I've done this sort of thing before, believe it or not -- I used to work on Python autograding systems. You can have scripts execute in sanitized environments without jumping through hoops, and their code executes as if it was invoked in a separate interpreter. Give it a try -- the flexibility of putting your own code in the target environment lets you do a lot more sophisticated things than you would otherwise be able to do (like introspecting stacktraces and capturing code and execution environment). – nneonneo Sep 27 '12 at 04:21
  • @nneonneo You did all of that using `execfile` !? That sounds very similar to what I'll be doing... – Matthew Adams Sep 27 '12 at 04:27
  • Technically, `exec`, but it's the same thing (basically). We autograded submissions for a class of over 750 non-CS students (most new to Python, many new to programming) using the first version of the software (which I was hired to develop) and graded everything from interactive programs (fed artificial input from the test harness) to regular functions and classes (which we could call with any arguments to check their responses). We logged all exceptions, warnings, output, etc. and could provide detailed feedback on test failures and exceptions to the students. – nneonneo Sep 27 '12 at 05:01
  • a simplified version of `python -mpdb test.py` might be more robust than unsophisticated `execfile()`- or `subprocess`- based solutions – jfs Oct 01 '12 at 19:52
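
As a reference for the zip-archive packing idea in the first comment above, here is a hedged sketch (file and directory names are my own; it assumes Python 2.7+ and that a build/ directory holds __main__.py plus a copy of pexpect.py):

# make_bundle.py - hypothetical packing step; Python 2.6+ can run a zip
# archive directly if it contains a __main__.py at its top level.
import os
import zipfile

with zipfile.ZipFile('pylog.zip', 'w') as zf:  # "with" on ZipFile needs Python 2.7+
    for name in os.listdir('build'):
        zf.write(os.path.join('build', name), name)

# Users would then invoke the logger as:  python pylog.zip test.py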

2 Answers


The tee-based answer that you've linked is not very suitable for your task, though you could fix the raw_input() prompts issue by using the -u option to disable buffering:

import sys
# call() here is the tee-based helper from the linked answer (essentially the
# same helper that is defined in full further down in this answer).
errf = open('err.txt', 'wb')  # any object with a .write() method
rc = call([sys.executable, '-u', 'test.py'], stderr=errf,
          bufsize=0, close_fds=True)
errf.close()

A more suitable solution might be based on pexpect or pty. You wrote that "running the logger program needs to seem like the user has just run python test.py as normal"; a pexpect-based example:

#!/usr/bin/env python
import sys
import pexpect

with open('log', 'ab') as fout:
    p = pexpect.spawn("python test.py")
    p.logfile = fout
    p.interact()

You don't need to install pexpect; it is pure Python, so you could put it alongside your code.
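
If you have to ship the logger as a plain directory, one possibility (a sketch; the vendor directory name is my own choice, not anything pexpect requires) is to drop a copy of pexpect.py next to the script and extend sys.path before importing it:

#!/usr/bin/env python
import os
import sys

# Assumed layout: pexpect.py copied into a "vendor" folder beside this script.
vendor_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'vendor')
sys.path.insert(0, vendor_dir)

import pexpect  # resolved from the bundled copy if it isn't installed system-wide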

Here's a tee-based analog (test.py is run non-interactively):

#!/usr/bin/env python
import sys
from subprocess import Popen, PIPE, STDOUT
from threading  import Thread

def tee(infile, *files):
    """Print `infile` to `files` in a separate thread."""
    def fanout(infile, *files):
        flushable = [f for f in files if hasattr(f, 'flush')]
        for c in iter(lambda: infile.read(1), ''):
            for f in files:
                f.write(c)
            for f in flushable:
                f.flush()
        infile.close()
    t = Thread(target=fanout, args=(infile,)+files)
    t.daemon = True
    t.start()
    return t

def call(cmd_args, **kwargs):
    stdout, stderr = [kwargs.pop(s, None) for s in 'stdout', 'stderr']
    p = Popen(cmd_args,
              stdout=None if stdout is None else PIPE,
              stderr=None if stderr is None else (
                   STDOUT if stderr is STDOUT else PIPE),
              **kwargs)
    threads = []
    if stdout is not None: 
        threads.append(tee(p.stdout, stdout, sys.stdout))
    if stderr is not None and stderr is not STDOUT: 
        threads.append(tee(p.stderr, stderr, sys.stderr))
    for t in threads: t.join() # wait for IO completion
    return p.wait()

with open('log','ab') as file:
    rc = call([sys.executable, '-u', 'test.py'], stdout=file, stderr=STDOUT,
              bufsize=0, close_fds=True)

It is necessary to merge stdout/stderr because it is unclear where raw_input() and getpass.getpass() might print their prompts.

In this case the threads are not necessary either:

#!/usr/bin/env python
import sys
from subprocess import Popen, PIPE, STDOUT

with open('log','ab') as file:
    p = Popen([sys.executable, '-u', 'test.py'],
              stdout=PIPE, stderr=STDOUT,
              close_fds=True,
              bufsize=0)
    for c in iter(lambda: p.stdout.read(1), ''):
        for f in [sys.stdout, file]:
            f.write(c)
            f.flush()
    p.stdout.close()
    rc = p.wait()

Note: the last example and the tee-based solution don't capture the getpass.getpass() prompt, but the pexpect- and pty-based solutions do:

#!/usr/bin/env python
import os
import pty
import sys

with open('log', 'ab') as file:
    def read(fd):
        data = os.read(fd, 1024)
        file.write(data)
        file.flush()
        return data

    pty.spawn([sys.executable, "test.py"], read)

I don't know whether pty.spawn() works on macs.

jfs
  • What makes it not suitable? Wouldn't those other modules be doing something similar behind the scenes? (I'm reading up on them right now.) – Matthew Adams Sep 20 '12 at 09:07
  • Also, the `-u` flag doesn't seem to fix the issue. (I'm using your `call` function from the other answer and the code in this one.) – Matthew Adams Sep 20 '12 at 09:14
  • @MatthewAdams: a program may change other aspects of its behavior in addition to I/O buffering behavior when it detects that it runs interactively (using tty). I've tested it with `test.py`: `print raw_input('input string to reverse')[::-1]` and it works. btw, `raw_input()` prints its prompt to stdout (as documented), not stderr. – jfs Sep 20 '12 at 09:29
  • Hmm, when I run that exact `test.py`, the prompt is not printed and instead gets logged to `err.txt`... I see that the main docs says that it prints to stdout, but I definitely read something official that said otherwise yesterday; I'll go hunt around for it. – Matthew Adams Sep 20 '12 at 10:01
  • Turns out I'm running into [this issue](http://bugs.python.org/issue1927). So if I add `import readline` to `test.py`, it works as expected. I obviously don't want to have to do that though, because I don't write the input files. (I'm still confused, but you get my upvote back.) – Matthew Adams Sep 20 '12 at 10:09
  • @MatthewAdams: [`builtin_input()`](http://hg.python.org/cpython/file/default/Python/bltinmodule.c#l1607) (Python 3.2) and [`builtin_raw_input()`](http://hg.python.org/cpython/file/2.7/Python/bltinmodule.c#l2016) (Python 2.7) use `PyOS_Readline()` at C level if there is a tty and print to *stdout* in non-interactive case. If `readline` is not enabled then indeed `PyOS_Readline()` prints its [prompt to stderr](http://hg.python.org/cpython/file/786aa16aeda5/Parser/myreadline.c#l105). You could add `import readline` to `sitecustomize.py`, `usercustomize.py` if it resolves the issue. – jfs Sep 20 '12 at 11:09
  • Unfortunately I can't edit `sitecustomize.py` or `usercustomize.py` on the machines I have to run this on. (multiuser college computers...) – Matthew Adams Sep 23 '12 at 19:28
  • I've been working on a `pexpect` solution, but I just learned that it won't be on the machines and... I am not allowed to install it. – Matthew Adams Sep 23 '12 at 19:35
  • @MatthewAdams: `pip install --user pexpect`. usercustomize is intended for a user site-package directory (inside your home directory), read http://docs.python.org/library/site.html – jfs Sep 23 '12 at 21:03
  • @MatthewAdams: have you tried the `pty` solution? Have you tried to modify `tee()` to read/write one byte at a time instead of one line at a time? – jfs Sep 23 '12 at 21:52
  • Reading one byte at a time worked great- thanks! Throw that in your answer for an accept. – Matthew Adams Sep 26 '12 at 20:38
  • @MatthewAdams: I've added the code. It demonstrates how `pexpect` could be used for the task. btw, `pexpect` has liberal license `MIT style -- Free, open source, and all that good stuff`. – jfs Sep 26 '12 at 23:56
  • @MatthewAdams: I've added `pty.spawn()`-based solution. It works on Linux. I don't know how well it works on macs. – jfs Sep 27 '12 at 00:31
  • That `pexpect` solution captures everything- not just stderr. – Matthew Adams Sep 27 '12 at 04:26
  • @MatthewAdams: As far as I know there is no way to separate stdout/stderr in `pexpect` (`pty`-based solutions). (remove the accepted mark if the answer is not accepted. `exec`-based seems like the way to go in your case) – jfs Sep 27 '12 at 08:15
  • I accepted my answer as the best approach for my specific problem and gave you the bounty as thanks for all your help and for the ton of useful info in your answer. – Matthew Adams Sep 28 '12 at 00:15

Based on @nneonneo's advice in the question comments, I made this program, which seems to get the job done. (Note that currently, the name of the logger file has to be "pylog" for the errors to be printed correctly to the end user.)

#!/usr/bin/python

'''
This module logs python errors.
'''

import socket, os, sys, traceback

def sendError(err):
    # log the error (in my actual implementation, this sends the error to a database)
    with open('log','w') as f:
        f.write(err)


def exceptHandler(etype, value, tb):
    """An additional wrapper around our custom exception handler, to prevent errors in
       this program from being seen by end users."""
    try:
        subProgExceptHandler(etype, value, tb)
    except:
        sys.stderr.write('Sorry, but there seems to have been an error in pylog itself. Please run your program using regular python.\n')

def subProgExceptHandler(etype, value, tb):
    """A custom exception handler that both prints error and traceback information in the standard
       Python format, as well as logs it."""
    import linecache

    errorVerbatim = ''

    # The following code mimics a traceback.print_exception(etype, value, tb) call.
    if tb:
        msg = "Traceback (most recent call last):\n"
        sys.stderr.write(msg)
        errorVerbatim += msg

        # The following code is a modified version of the traceback.print_tb implementation from
        # CPython 2.7.3
        while tb is not None:
            f = tb.tb_frame                                                      
            lineno = tb.tb_lineno                                                  
            co = f.f_code                                                        
            filename = co.co_filename                                              
            name = co.co_name
            # Filter out frames from pylog itself (e.g. the execfile call).
            if "pylog" not in filename:
                msg = '  File "%s", line %d, in %s\n' % (filename, lineno, name)
                sys.stderr.write(msg)       
                errorVerbatim += msg
                linecache.checkcache(filename)                                         
                line = linecache.getline(filename, lineno, f.f_globals)                
                if line: 
                    msg = '    ' + line.strip() + '\n'
                    sys.stderr.write(msg)
                    errorVerbatim += msg
            tb = tb.tb_next                                           

    lines = traceback.format_exception_only(etype, value)
    for line in lines:
        sys.stderr.write(line)
        errorVerbatim += line

    # Send the error data to our database handler via sendError.
    sendError(errorVerbatim)

def main():
    """Executes the program specified by the user in its own sandbox, then sends
       the error to our database for logging and analysis."""
    # Get the user's (sub)program to run.
    try:
        subProgName = sys.argv[1]
        subProgArgs = sys.argv[3:]
    except:
        print 'USAGE: ./pylog FILENAME.py *ARGS'
        sys.exit()

    # Catch exceptions by overriding the system excepthook.
    sys.excepthook = exceptHandler
    # Sandbox user code execution in its own global namespace to prevent malicious code injection.
    execfile(subProgName, {'__builtins__': __builtins__, '__name__': '__main__', '__file__': subProgName, '__doc__': None, '__package__': None})

if __name__ == '__main__':
    main()
Matthew Adams
  • why do you catch `SystemExit`? it is how `sys.exit()` works. You could use `traceback.print_exception` for `1.` It is easy to mess with the exec-based solutions from inside the subprogram on purpose or accidentally e.g., `import __main__; del __main__.traceback` or `sys.stderr = open(os.devnull)`. – jfs Sep 28 '12 at 01:26
  • @J.F.Sebastian Honestly, habit- I guess it doesn't really make sense here. I plan to use `print_exception`. Yeah, that is a really good point; do you think a bunch of resets after the code is run will resolve most of those types of exploits? – Matthew Adams Sep 28 '12 at 01:36
  • Accidental breakage could be minimized if you save necessary functions as locals e.g., as default parameters to your function, provide the 2nd arg to execfile() e.g., `{"__file__": ...}` (to avoid polluting test.py namespace) and (maybe) use `try/except:` in case sys.excepthook is overwritten. – jfs Sep 28 '12 at 10:06 (a sketch of this idea appears after these comments)
  • @nneonneo How did you handle the types of problems we're talking about in the comments here? – Matthew Adams Oct 02 '12 at 18:33
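
Building on the hardening suggestion in the comment above, here is a minimal sketch of the "save what you need as default arguments" idea (the helper name makeExceptHandler and the 'log' path are my own; this is an illustration, not pylog's actual code):

#!/usr/bin/python
import sys
import traceback

def makeExceptHandler(write=sys.stderr.write,
                      format_exception=traceback.format_exception):
    # The objects we rely on are captured as default arguments here, so the
    # user's program cannot break the handler later by rebinding sys.stderr
    # or deleting the traceback module.
    def handler(etype, value, tb):
        text = ''.join(format_exception(etype, value, tb))
        write(text)
        with open('log', 'a') as f:  # 'log' is a placeholder path
            f.write(text)
    return handler

sys.excepthook = makeExceptHandler()
# Passing an explicit globals dict keeps the logger's own names out of test.py's namespace.
execfile(sys.argv[1], {'__name__': '__main__', '__file__': sys.argv[1]})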