
Say I'm running an exe from a Python script using:

subprocess.call(cmdArgs,stdout=outf, stderr=errf)

where outf and errf are file descriptors of text files.

Is there any way to additionally generate a merged, synchronized text file containing both stdout and stderr? Each line should be tagged with a timestamp and its source (out/err).

Thanks

miku
user515766
  • Possibly related question: http://stackoverflow.com/questions/4713932/decorate-delegate-a-file-object-to-add-functionality/4838875#4838875 – unutbu Feb 13 '11 at 14:11

2 Answers


It is a bit tricky, since you need to poll the stdout and stderr file descriptors of the subprocess while it's running, to get accurate timestamps. You also need to chop up the output into a list of lines so the final results can be merged and sorted easily. You could easily merge the two streams as they're read, but that wasn't part of the question.

I wrote it quickly but it could be made cleaner and more compact:

import datetime
import os
import select
import subprocess

class Stream(object):

    def __init__(self, name, impl):
        self._name = name
        self._impl = impl
        self._buf = ''
        self._rows = []

    def fileno(self):
        "Pass-through for file descriptor."
        return self._impl.fileno()

    def read(self, drain=0):
        "Read from the file descriptor. If 'drain' set, read until EOF."
        while self._read() is not None:
            if not drain:
                break

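    def _read(self):
        "Read one chunk; return the new rows, or None at EOF."
        fd = self.fileno()
        # decode so the same code also works on Python 3, where os.read()
        # returns bytes
        buf = os.read(fd, 4096).decode('utf-8', 'replace')
        now = datetime.datetime.now().isoformat()
        if not buf:
            # EOF: flush a trailing line that never got its newline
            if self._buf:
                self._rows.append((now, '%s %s: %s' % (self._name, now, self._buf)))
                self._buf = ''
            return None
        if '\n' not in buf:
            self._buf += buf
            return []

        # prepend any data previously read, then split into lines and format
        buf = self._buf + buf
        tmp, rest = buf.rsplit('\n', 1)
        self._buf = rest
        rows = [(now, '%s %s: %s' % (self._name, now, r)) for r in tmp.split('\n')]
        self._rows += rows
        # return the rows (not None) so that read(drain=1) keeps going until EOF
        return rows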

def run(cmd, timeout=0.1):
    """
    Run a command, read stdout and stderr, prefix with timestamp, and
    return a dict containing stdout, stderr and merged.
    """
    PIPE = subprocess.PIPE
    proc = subprocess.Popen(cmd, stdout=PIPE, stderr=PIPE)
    streams = [
        Stream('stdout', proc.stdout),
        Stream('stderr', proc.stderr)
        ]
    def _process(drain=0):
        res = select.select(streams, [], [], timeout)
        for stream in res[0]:
            stream.read(drain)

    while proc.returncode is None:
        proc.poll()
        _process()
    _process(drain=1)

    # collect results, merge and return
    result = {}
    temp = []
    for stream in streams:
        rows = stream._rows
        temp += rows
        result[stream._name] = [r[1] for r in rows]
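    # sort on the timestamp only; the sort is stable, so lines that share a
    # timestamp keep the order in which they were read
    temp.sort(key=lambda r: r[0])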
    result['merged'] = [r[1] for r in temp]
    return result

res = run(['ls', '-l', '.', 'xyzabc'])
for key in ('stdout', 'stderr', 'merged'):
    print('')
    print('\n'.join(res[key]))
    print('-' * 40)

Example output:

stdout 2011-03-03T19:30:44.838145: .:
stdout 2011-03-03T19:30:44.838145: total 0
stdout 2011-03-03T19:30:44.838338: -rw-r--r-- 1 pat pat 0 2011-03-03 19:30 bar
stdout 2011-03-03T19:30:44.838518: -rw-r--r-- 1 pat pat 0 2011-03-03 19:30 foo
----------------------------------------

stderr 2011-03-03T19:30:44.837189: ls: cannot access xyzabc: No such file or directory
----------------------------------------

stderr 2011-03-03T19:30:44.837189: ls: cannot access xyzabc: No such file or directory
stdout 2011-03-03T19:30:44.838145: .:
stdout 2011-03-03T19:30:44.838145: total 0
stdout 2011-03-03T19:30:44.838338: -rw-r--r-- 1 pat pat 0 2011-03-03 19:30 bar
stdout 2011-03-03T19:30:44.838518: -rw-r--r-- 1 pat pat 0 2011-03-03 19:30 foo
----------------------------------------
samplebias
  • To merge properly, you need a sequence number in addition to the timestamp. The reason is that, when multiple lines are packed into a buffer, they all end up with the same timestamp. Python sorting is stable, however, so if you limit the sort key to just the timestamp, you don't need to introduce a sequence number. – George Nov 20 '11 at 05:59
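The point about stable sorting can be illustrated with a small sketch. The rows below are made up for the illustration (they are not output from the answer's code); two of them share a timestamp, as lines read from one buffer would.

```python
# Hypothetical rows: (timestamp, formatted line). The two stdout lines share
# a timestamp because they arrived in the same read.
rows = [
    ('2011-03-03T19:30:44.838145', 'stdout: zebra'),
    ('2011-03-03T19:30:44.838145', 'stdout: apple'),
    ('2011-03-03T19:30:44.837189', 'stderr: oops'),
]

# Sorting whole tuples breaks timestamp ties by comparing the text,
# which swaps 'zebra' and 'apple'.
by_tuple = [text for _, text in sorted(rows)]

# Sorting on the timestamp alone is stable, so ties keep their read order.
by_time = [text for _, text in sorted(rows, key=lambda r: r[0])]

print(by_tuple)
print(by_time)
```

With the key restricted to the timestamp, 'zebra' stays ahead of 'apple', which is the order in which the lines were read.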

You can merge them by passing subprocess.STDOUT as the stderr argument to subprocess.Popen, but I don't know whether the output can then be formatted with time and source.
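A minimal sketch of this suggestion follows. The child command is a hypothetical Python one-liner chosen only so the example is self-contained. Because both streams share one pipe, the parent can no longer tell which stream a line came from.

```python
import subprocess
import sys

# Hypothetical child: a Python one-liner that writes to both streams.
child = [sys.executable, '-c',
         "import sys; sys.stdout.write('out\\n'); sys.stdout.flush(); "
         "sys.stderr.write('err\\n')"]

# stderr=subprocess.STDOUT points the child's stderr at the same pipe as stdout.
proc = subprocess.Popen(child, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
merged, _ = proc.communicate()  # the second item is None: stderr has no pipe of its own
lines = merged.decode().splitlines()
print(lines)
```

The lines arrive in the order they were written, but without any source label, so this alone does not produce the per-source formatting the question asks for.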

Artur Gaspar
  • I would like it to be in addition to the stdout.txt and stderr.txt files, that is, 3 outputs: 1. stdout, 2. stderr, 3. merged – user515766 Feb 13 '11 at 14:14
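One possible way to get all three outputs is sketched below, using one reader thread per pipe instead of select (select does not work on pipes under Windows). The function name run_and_log, its parameters, and the line format are assumptions made for this illustration and do not come from either answer. The timestamps record when the parent read each line, so the child's own buffering can still shift them.

```python
import datetime
import subprocess
import threading

def run_and_log(cmd, out_path, err_path, merged_path):
    """Run cmd; write stdout, stderr, and a timestamped merge to three files."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    lock = threading.Lock()  # serializes writes to the shared merged file

    def pump(pipe, name, path, merged):
        # Copy one pipe line by line into its own file and into the merged file.
        with open(path, 'w') as own:
            for raw in iter(pipe.readline, b''):
                line = raw.decode('utf-8', 'replace').rstrip('\r\n')
                stamp = datetime.datetime.now().isoformat()
                own.write(line + '\n')
                with lock:
                    merged.write('%s %s: %s\n' % (stamp, name, line))
        pipe.close()

    with open(merged_path, 'w') as merged:
        threads = [
            threading.Thread(target=pump, args=(proc.stdout, 'out', out_path, merged)),
            threading.Thread(target=pump, args=(proc.stderr, 'err', err_path, merged)),
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # both pipes reach EOF once the child has closed them
    return proc.wait()
```

A call such as run_and_log(cmdArgs, 'stdout.txt', 'stderr.txt', 'merged.txt') would then leave the raw streams in the first two files and the tagged, interleaved lines in the third. Since the merged file is written as lines arrive, no sorting step is needed.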