I use Python to chain multiple processing tools for NLP tasks together, and I capture the output of each tool and write it to a log in case something fails.
Some tools run for many hours and report their current status as a progress percentage, using carriage returns (\r) to overwrite the previous value. They perform many steps, so they mix normal messages with progress messages. This sometimes results in really large log files that are hard to view with less.
My log looks like this (for a tool that progresses quickly):
[DEBUG ] [FILE] [OUT] ^M4% done^M8% done^M12% done^M15% done^M19% done^M23% done^M27% done^M31% done^M35% done^M38% done^M42% done^M46% done^M50% done^M54% done^M58% done^M62% done^M65% done^M69% done^M73% done^M77% done^M81% done^M85% done^M88% done^M92% done^M96% done^M100% doneFinished
What I want is an easy way to collapse those strings in Python. (I guess it is also possible to do this after the pipeline has finished and replace the progress messages with e.g. sed ...)
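For the after-the-fact cleanup of an existing log, here is a minimal sketch in Python rather than sed. It assumes every log line looks like the example above: a prefix, then progress updates separated by \r. The names collapse_progress and clean_log are made up for illustration. It keeps the text before the first \r and the text after the last one.

```python
def collapse_progress(line):
    # Keep the prefix before the first carriage return and
    # the text after the last one; drop everything in between.
    head, sep, tail = line.partition('\r')
    if not sep:
        return line  # no carriage return, nothing to collapse
    return head + tail.rsplit('\r', 1)[-1]


def clean_log(src_path, dst_path):
    # newline='\n' keeps '\r' inside each line; the default
    # universal-newlines mode would treat every '\r' as a line break.
    with open(src_path, encoding='utf-8', errors='replace', newline='\n') as src, \
         open(dst_path, 'w', encoding='utf-8', newline='\n') as dst:
        for line in src:
            dst.write(collapse_progress(line))
```

One side effect of this sketch: a line ending in \r\n comes out ending in \n only, because the trailing \r is treated like any other carriage return.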
My code for running and capturing the output looks like this:
import os
import subprocess
from tempfile import NamedTemporaryFile

# Note: debug() and error() are logging helpers that are not shown here.


def run_command_of(command):
    try:
        out_file = NamedTemporaryFile(mode='w+b', delete=False, suffix='out')
        err_file = NamedTemporaryFile(mode='w+b', delete=False, suffix='err')
        debug('Redirecting command output to temp files ...',
              'out =', out_file.name, ', err =', err_file.name)

        p = subprocess.Popen(command, shell=True,
                             stdout=out_file, stderr=err_file)
        p.communicate()
        status = p.returncode

        def fr_gen(file):
            debug('Reading from %s ...' % file.name)
            file.seek(0)
            for line in file:
                # TODO: UnicodeDecodeError?
                # reload(sys)
                # sys.setdefaultencoding('utf-8')
                # unicode(line, 'utf-8')
                # no decoding ...
                yield line.decode('utf-8', errors='replace').rstrip()
            debug('Closing temp file %s' % file.name)
            file.close()
            os.unlink(file.name)

        return (fr_gen(out_file), fr_gen(err_file), status)
    except Exception:
        from sys import exc_info
        error('Error while running command', command,
              exc_info()[0], exc_info()[1])
        return (None, None, 1)


def execute(command, check_retcode_null=False):
    debug('run command:', command)
    out, err, status = run_command_of(command)
    debug('-> exit status:', status)

    if out is not None:
        is_empty = True
        for line in out:
            is_empty = False
            debug('[FILE]', '[OUT]', line.encode('utf-8', errors='replace'))
        if is_empty:
            debug('execute: no output')
    else:
        debug('execute: no output?')

    if err is not None:
        is_empty = True
        for line in err:
            is_empty = False
            debug('[FILE]', '[ERR]', line.encode('utf-8', errors='replace'))
        if is_empty:
            debug('execute: no error-output')
    else:
        debug('execute: no error-output?')

    if check_retcode_null:
        return status == 0
    return True
This is older Python 2 code (fun times with unicode strings) that I want to port to Python 3 and improve. (I'm also open to suggestions on how to process the output in real time rather than after everything has finished. Update: that is too broad and not exactly part of my problem.)
I can think of many approaches, but I could not find a ready-to-use function or library for this. (My google-fu needs work.) The only things I found were ways to remove the CR/LF characters, not the portion of the string that gets visually replaced. So I'm open to suggestions and improvements before I invest my time in reinventing the wheel. ;-)
My approach would be to use a regex to find the sections of a string/line that are followed by a \r and remove them. Optionally, I would keep a single percentage value for really long processes. Something like \r([^\r]*\r).
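To make the idea concrete, here is a rough sketch of two variants, meant to run on the raw tool output before any log prefix is added. Both function names are hypothetical. The first uses a slightly different regex from the one above and simply drops everything up to the last \r. The second imitates what a terminal shows: each segment overwrites the line from column 0, so leftovers of a longer earlier segment stay visible.

```python
import re


def strip_overwritten(line):
    # Regex variant: remove every run of text that ends in '\r',
    # which leaves only the text after the last carriage return.
    return re.sub(r'[^\r]*\r', '', line)


def emulate_cr(line):
    # Terminal-like variant: each segment overwrites the buffer
    # starting at column 0; characters beyond the segment survive.
    buf = []
    for segment in line.split('\r'):
        buf[:len(segment)] = segment
    return ''.join(buf)
```

For progress output like the example, where each update is at least as long as the previous one, both variants give the same result. They differ only when a shorter segment follows a longer one.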
Note: A possible duplicate of: How to pull the output of the most recent terminal command?
It may require a wrapper script. It can still be used to convert my old log files with script2log. I found/got a suggestion for a plain Python way that fulfills my needs.