As part of an evaluation, I want to measure and compare the user+system runtime of different diff tools. As a first approach, I thought about calling each tool with time -f (GNU time). Since the rest of the evaluation is done by a set of Python scripts, I want to implement this in Python as well.

The time output is formatted as follows:

<some error message>
user 0.4
sys 0.2

The output of the diff tool is piped to sed to strip unneeded lines, and the output of sed is then processed further. (I no longer use sed in my example; see Edit 2.)

A call from within a shell would look like this (removes lines starting with "Binary"):

$ time -f "user %U\nsys %S\n" diff -r -u0 dirA dirB | sed -e '/^Binary.*/d'

Here is my approach so far:

import subprocess

diffcommand=["time","-f","user %U\nsys %S\n","diff","-r","-u0","testrepo_1/A/rev","testrepo_1/B/rev"]
sedcommand = ["sed","-e","/^Binary.*/d"]

# Execute command as subprocess
diff = subprocess.Popen(diffcommand, stderr=subprocess.PIPE, stdout=subprocess.PIPE)

# Calculate runtime
runtime = 0.0
for line in diff.stderr.readlines():
    current = line.split()
    if current:
        if current[0] == "user" or current[0] == "sys":
            runtime = runtime + float(current[1])
print "Runtime: "+str(runtime)

# Pipe to "sed"
sedresult = subprocess.check_output(sedcommand, stdin=diff.stdout)

# Wait for the subprocesses to terminate
diff.wait()

However, this does not feel clean (especially from an OS point of view). It also causes the script to get stuck in the readlines part under certain circumstances that I have not been able to figure out yet.

Is there a cleaner (or better) way to achieve what I want?

Edit 1 Changed head line and gave a more detailed explanation

Edit 2 Thanks to J.F. Sebastian, I had a look at os.wait4(...) (information taken from his answer). But since I AM interested in the output, I had to implement it a bit differently.

My code now looks like this:

diffprocess = subprocess.Popen(diffcommand,stdout=subprocess.PIPE)
runtimes = os.wait4(diffprocess.pid,0)[2]
runtime = runtimes.ru_utime + runtimes.ru_stime
diffresult = diffprocess.communicate()[0]

Note that I no longer pipe the result to sed (I decided to trim within Python).
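For illustration, trimming within Python could look roughly like the sketch below. It mirrors the sed expression `/^Binary.*/d` with a multi-line regular expression. The helper name `drop_binary_lines` and the sample text are made up for this example, and the sketch assumes Python 3 with the diff output already decoded to `str`.

```python
import re

# Matches a whole line that starts with "Binary", including its newline.
# re.MULTILINE makes "^" match at the start of every line.
_BINARY_LINE = re.compile(r"^Binary.*\n?", re.MULTILINE)


def drop_binary_lines(diff_output):
    """Return diff_output without the lines that start with 'Binary'."""
    return _BINARY_LINE.sub("", diff_output)


# Made-up sample resembling `diff -r -u0` output.
sample = (
    "Binary files dirA/a.png and dirB/a.png differ\n"
    "--- dirA/x.txt\n"
    "+++ dirB/x.txt\n"
)
print(drop_binary_lines(sample), end="")
```

If the output is still `bytes` (as returned by `communicate()` on Python 3), it would need to be decoded first, or the pattern written as a bytes pattern.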

The runtime measurement works fine for some test cases, but the execution sometimes gets stuck. Removing the runtime measurement lets the program terminate, and so does sending stdout to DEVNULL (as suggested here). Could I have a deadlock? (valgrind --tool=helgrind did not find anything.) Is there something fundamentally wrong with my approach?

Paddre
  • Try [`timeit`](https://docs.python.org/3/library/timeit.html). –  Mar 10 '15 at 08:03
  • Are you aware of [`difflib`](https://docs.python.org/2/library/difflib.html) and [`filecmp`](https://docs.python.org/2/library/filecmp.html#module-filecmp)? – Peter Wood Mar 10 '15 at 08:23
  • @LutzHorn: Since I am interested in the "real" runtime (without waiting for time slots etc.) I think this might not be accurate enough. – Paddre Mar 10 '15 at 08:41
  • @PeterWood: Yes, but I am measuring the runtime of different diff-tools under specific circumstances. The above is part of a whole evaluation framework I have written – Paddre Mar 10 '15 at 08:42
  • Your evaluation framework is using GNU time called from Python? –  Mar 10 '15 at 09:00
  • The runtime measurement is just a small part of the evaluation. That's why I want to stick to Python. However, as I said before, I don't know a better way to get the most accurate runtime. If `timeit` or any other library helps me with it, I'm glad to use it ;-) – Paddre Mar 10 '15 at 10:00
  • see how [`os.wait4()` is used to get sys,user time](http://stackoverflow.com/a/28521323/4279) – jfs Mar 10 '15 at 14:30
  • Thanks for the hint. I applied it to my example and the runtime output is similar to that of `time`. However, in some cases it gets stuck (maybe a deadlock?). See my updated question. – Paddre Mar 10 '15 at 22:06

1 Answer

but the execution gets stuck sometimes.

If you use stdout=PIPE, something must read the output while the process is still running; otherwise the child process will hang once its stdout OS pipe buffer fills up (~65K on my machine).
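The hang can be reproduced with a small experiment. This is only a sketch, assuming Python 3.3+ and a pipe buffer smaller than 4 MiB. The child command is a hypothetical stand-in for a diff tool that produces a lot of output.

```python
import subprocess
import sys

# Hypothetical child: writes 4 MiB to stdout, far more than a typical pipe buffer.
CHILD_CODE = "import sys; sys.stdout.write('x' * (4 * 1024 * 1024))"

p = subprocess.Popen([sys.executable, "-c", CHILD_CODE], stdout=subprocess.PIPE)

# Nobody is reading the pipe yet, so the child blocks while writing and cannot exit.
try:
    p.wait(timeout=1)
    blocked = False
except subprocess.TimeoutExpired:
    blocked = True

# Draining the pipe lets the child finish.
with p.stdout:
    output = p.stdout.read()
returncode = p.wait()

print("child was blocked:", blocked)
print("bytes read:", len(output))
```

Waiting first and reading afterwards is the same ordering as calling `os.wait4()` before `communicate()`. Draining the pipe before waiting, as in the snippet that follows, avoids the hang.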

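import os
from subprocess import Popen, PIPE

p = Popen(diffcommand, stdout=PIPE, bufsize=-1)
with p.stdout:
    output = p.stdout.read()  # read until EOF so the child never blocks on a full pipe
ru = os.wait4(p.pid, 0)[2]  # reap the child; ru.ru_utime + ru.ru_stime is its user+sys time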
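Wrapped into a reusable function, the same idea might look like the sketch below. The name `run_and_time` and the child command in the usage lines are hypothetical, chosen only for this example. It assumes Python 3 on a Unix-like system, since `os.wait4()` is not available on Windows.

```python
import os
import sys
from subprocess import Popen, PIPE


def run_and_time(command):
    """Run command; return (stdout_bytes, user_plus_system_cpu_seconds)."""
    p = Popen(command, stdout=PIPE)
    with p.stdout:
        output = p.stdout.read()        # read to EOF first, so the pipe never fills up
    _, status, ru = os.wait4(p.pid, 0)  # then reap the child and fetch its resource usage
    # Tell the Popen object that the child has already been reaped.
    if os.WIFEXITED(status):
        p.returncode = os.WEXITSTATUS(status)
    else:
        p.returncode = -os.WTERMSIG(status)
    return output, ru.ru_utime + ru.ru_stime


# Usage with a stand-in child process instead of a real diff tool.
output, cpu_seconds = run_and_time(
    [sys.executable, "-c", "print(sum(range(10**6)))"]
)
print(output.decode().strip(), cpu_seconds)
```

Setting `p.returncode` by hand is a design choice of this sketch: because `os.wait4()` has already collected the exit status, the `Popen` object would otherwise not know that the child has terminated.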
jfs
  • I thought about the same problem but didn't know how to solve it. Works perfectly for me :-) But could you please explain what the `bufsize` parameter does? – Paddre Mar 12 '15 at 00:12
  • @Paddre: `bufsize` is passed as [the `buffering` parameter while creating file objects](https://docs.python.org/3/library/functions.html#open) for the pipes. On Python 2, the default is `bufsize=0`, which may negatively affect performance. On Python 3 (recent versions), the default is equivalent to `bufsize=-1`. There were intermediate Python 3 versions with `bufsize=0` by default, which may lead to *short reads* (wrong results). – jfs Mar 12 '15 at 00:23