1

I expect this is really simple but I can't work this out.

I am trying to write to a log file in real time the output from a DD imaging subprocess - I'm using DD v 8.25 from which you can get regular progress updates using the 'status=progress' option which writes to stderr.

I can get it to log the full output real time by passing the file object to the stderr i.e

log_file = open('mylog.log', 'a')
p = subprocess.Popen['dd command...'], stdout=None, stderr=log_file)

...but I would prefer to intercept the string from stderr first so I can parse it before writing to file.

I have tried threading but I can't seem to get it to write, or if it does, it only does it at the end of the process and not during.

I'm a python noob so example code would be appreciated. Thanks!

UPDATE - NOW WORKING (ISH)

I had a look at the link J.F. Sebastian suggested and found posts about using threads, so after that I used the "kill -USR1" trick to get DD to post progress to stderr which I could then pick up:

#! /usr/bin/env python
from subprocess import PIPE, Popen
from threading import Thread
from queue import Queue, Empty
import time

q = Queue()

def parsestring(mystr):
    newstring = mystr[0:mystr.find('bytes')]
    return newstring

def enqueue(out, q):
    for line in proc1.stderr:
        q.put(line)
    out.close()

def getstatus():
    while proc1.poll() == None:
        proc2 = Popen(["kill -USR1 $(pgrep ^dd)"], bufsize=1, shell=True)
        time.sleep(2)

with open("log_file.log", mode="a") as log_fh:
    start_time = time.time()

    #start the imaging
    proc1 = Popen(["dd if=/dev/sda1 of=image.dd bs=524288 count=3000"], bufsize=1, stderr=PIPE, shell=True)

    #define and start the queue function thread
    t = Thread(target=enqueue, args=(proc1.stderr, q))
    t.daemon = True
    t.start()

    #define and start the getstatus function thread
    t_getstatus = Thread(target=getstatus, args=())
    t_getstatus.daemon
    t_getstatus.start()

    #get the string from the queue

    while proc1.poll() == None:
        try: nline = q.get_nowait()
        except Empty:
            continue
        else:
            mystr = nline.decode('utf-8')           
            if mystr.find('bytes') > 0:
                log_fh.write(str(time.time()) + ' - ' + parsestring(mystr))
                log_fh.flush()

        #put in a delay
        #time.sleep(2)

    #print duration
    end_time=time.time()
    duration=end_time-start_time
    print('Took ' + str(duration) + ' seconds')     

The only issue is I can't work out how to improve performance. I only need it to report status every 2 seconds or so but increasing the time delay increases the time of the imaging, which I don't want. That's a question for another post though...

Thanks to both J.F. Sebastian and Ali.

fishboy25uk
  • 13
  • 1
  • 4
  • Take a look http://stackoverflow.com/questions/2804543/read-subprocess-stdout-line-by-line. in your case store the output or errout to a variable before to write it to a file at the send of your process. – Ali SAID OMAR Jan 25 '16 at 10:08
  • that example doesn't appear to work - again, I don't get any logging until the process has completed. – fishboy25uk Jan 25 '16 at 10:35

2 Answers2

0

With this example it's possible (with python 3) to stream from stderr to console:

#! /usr/bin/env python
from subprocess import Popen, PIPE

# emulate a program that write on stderr
proc = Popen(["/usr/bin/yes 1>&2 "],  bufsize=512, stdout=PIPE, stderr=PIPE, shell=True)
r = b""
for line in proc.stderr:
    r += line
    print("current line", line, flush=True)

To stream to a file:

#! /usr/bin/env python
from subprocess import Popen, PIPE

with open("log_file.log", mode="b",  encoding="utf8") as log_fh:
        proc = Popen(["/usr/bin/yes 1>&2 "],  bufsize=512, stdout=PIPE, stderr=PIPE, shell=True)
        r = b""
        # proc.stderr is an io.TextIOWrapper file-like obj
    # iter over line
        for line in proc.stderr:
                r += line
                # print("current line", line, flush=True)
                log_fh.write(line) # file open in binary mode
                # log_fh.write(line.decode("utf8")) # for text mode
                log_fh.flush() # flush the content
Ali SAID OMAR
  • 6,404
  • 8
  • 39
  • 56
  • Thanks, but didn't work. It worked fine with your example but using my DD command "dd if=/dev/sdb1 of=image.dd status=progress", I still get nothing until the process has finished. This may be a down to DD and how it does output. – fishboy25uk Jan 25 '16 at 14:15
  • (1) don't use `stdout=PIPE` unless you read from the `proc.stdout` otherwise a deadlock is possible in the general case (2) Don't use `str(bytes_object)`; instead, open the file in binary mode or decode the bytes first (using `universal_newlines=True` or `io.TextIOWrapper(proc.stderr, encoding='utf-8')` or `line.decode('utf-8')` (if `b'\n'`-separated lines are enough in your case)). – jfs Jan 26 '16 at 10:15
  • Thanks for the advices. I was sure that str on bytes making python calling decode on bytes. – Ali SAID OMAR Jan 26 '16 at 10:26
  • note: `print(bytestring)` calls `str()` implicitly. You should decode bytes into a Unicode text before printing them instead e.g., you could pass `universal_newlines=True` to `Popen()` to enable the text mode. Note: your code won't work with `dd status=progress` because it doesn't produce `b'\n'` until the very end (it uses `b'\r'` instead, to type the info in the same place). – jfs Feb 02 '16 at 16:56
0

To display dd's progress report in a terminal and to save (parsed) output to a log file:

#!/usr/bin/env python3
import io
from subprocess import PIPE, Popen
from time import monotonic as timer

cmd = "dd if=/dev/sda1 of=image.dd bs=524288 count=3000 status=progress".split()
with Popen(cmd, stderr=PIPE) as process, \
        open("log_file.log", "a") as log_file:
    start_time = timer()
    for line in io.TextIOWrapper(process.stderr, newline=''):
        print(line, flush=True, end='')  # no newline ('\n')
        if 'bytes' in line:
            # XXX parse line here, add flush=True if necessary
            print(line, file=log_file)
    # print duration
    print('Took {duration} seconds'.format(duration=timer() - start_time))

Note

  • no shell=True: you don't need the shell here. Popen() can run dd directly
  • no threads, queues: you don't need them here
  • please, please DO NOT USE while proc1.poll() == None You don't need it here (you'll see EOF on proc1.stderr if proc1.poll() is not None). You may lose data (there could be a buffered content even if the process has exited already). Unrelated: if you need to compare with None; use is None instead of == None
  • io.TextIOWrapper(newline='') enables text mode (it uses locale.getpreferredencoding(False)) and it treats '\r' as a newline too
  • use the default bufsize=-1 (see io.DEFAULT_BUFFER_SIZE)
jfs
  • 399,953
  • 195
  • 994
  • 1,670