Time an external program whose output is being processed by Python

Question

I want to measure the time of execution of an external program whose output is used by my Python script.

Calling extprogram the program that produced the output, at the moment I do something like:

import time
import subprocess

def process_output(line):
   ...
   ...
   return processed_data

all_processed_data = []

ts = time.time()
p = subprocess.Popen("extprogram", stdout=subprocess.PIPE)

for line in p.stdout:
    all_processed_data.append(process_output(line))
te = time.time()
elapsed_time = te - ts

This doesn't work as intended because what I am measuring is the time of execution of extprogram plus the time required to process its output.

extprogram produces a large amount of data, therefore I would like to "stream" its output in my Python program using a cycle as I am doing now. How can I evaluate te when extprogram terminates rather than waiting for all the output to be processed?

Well, the normal usage of time is `time extprogram args`, and time outputs to stderr. I'm guessing you could do something like `p = subprocess.Popen(["time","extprogram"],stdout = subprocess.PIPE,stderr=subprocess.PIPE)` — Samizdis, Apr 03 '13 at 13:22
On using time: http://stackoverflow.com/questions/9006596/is-the-unix-time-command-accurate-enough-for-benchmarks — Samizdis, Apr 03 '13 at 13:30

StuGrey · Answer 1 · 2013-04-03T16:37:50.127

The following still uses 'wall clock' time but may be an alternative to the use of host system time commands. The execution and the timing are split into separate threads and the timer can be stopped before any processing is carried out.

from multiprocessing import Event
import threading
import time
import subprocess

def timing(event):
    print "timer starts"
    ts = time.time()
    event.wait()
    te = time.time()
    elapsed_time = te - ts
    print "Elapsed Time " + str(elapsed_time)

def execution(event): 
    for i in range(0,1000):
        p = subprocess.Popen("ls", stdout=subprocess.PIPE)
    event.set()

if __name__ == '__main__':  
    event = Event()
    e = threading.Thread(target=execution, args=(event,))
    t = threading.Thread(target=timing, args=(event,))
    t.start()  
    e.start() 
    while not event.is_set():
        print "running..."
        time.sleep(1)

This gives me the following output:

timer starts
running...
running...
Elapsed Time 1.66236400604

Or you could split receiving the output of 'extprogram' from the processing of the output.

For example:

ts = time.time()
p = subprocess.Popen("extprogram", stdout=subprocess.PIPE)

for line in p.stdout:
    tempdata.append(line)

te = time.time()
elapsed_time = te - ts

for line in tempdata:
    all_processed_data.append(process_output(line))

Thanks @StuGrey. Actually I can't. As I said, the external program generates a fair large amount of data and storing all its content in memory is not an option. Even if I could though, the way you propose measures the time of execution of the external program plus the time used to store its content in a list (ok it is probably fast, but I would like an unbiased measurement). — lucacerone, Apr 03 '13 at 13:08
Measuring execution time cannot generally be done with the "wall clock" time: other running processes can make the measured time much longer than the real running time. — Eric O. Lebigot, Apr 03 '13 at 13:27
@EOL thanks but for the moment wall clock is fine. You are right though, it would be good in python to have a function that reports the cpu time in a way similar to the time command in linux — lucacerone, Apr 04 '13 at 08:19

Eric O. Lebigot · Answer 2 · 2013-04-03T13:44:00.730

1

Since you are under Unix, you can use the time command. Here is the principle:

import sys
import subprocess

p = subprocess.Popen(["time", "ls"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)

for line in p.stdout:  # ls output
    sys.stdout.write(line)

time_output = p.stderr.readlines()

print "Stderr:", ''.join(time_output)

On my machine, this gives:

Stderr:         0.01 real         0.00 user         0.00 sys

The total processor time is the user + sys time (real is the wall clock time, which does not generally represent how much processor time the program used: for instance, with sleep 5, the real time is 5 seconds, while the user and sys times are 0).

This works because time outputs a detailed accounting of the real execution time (not simply the wall time, which depends on what other processes are running, etc.), and does so to the standard error output. You can parse the standard error and get the timing information.

This method might not be practical if you program outputs data to the standard error that might interfere with the parsing of the time command.

Also, I haven't checked that no deadlock can happen with the above code (I'm not sure what would happen if the program called prints a lot to the standard error: could the program block until the standard error buffer is read, which may not happen if the Python program is reading the standard output?). That said, if you know that the timed program has no or little data on its standard error, I believe that the code above will not deadlock.

edited Apr 03 '13 at 13:44

answered Apr 03 '13 at 13:26

Eric O. Lebigot

91,433
48
218
260

I see two complications. 1. In reality `extprogram` already takes as input the output of an other external application. 2. the output of times is redirected to stdout that interferes with the output of extprogram – lucacerone Apr 03 '13 at 13:33
The first point is not really a problem: you simply pipe to `time extprogram` instead of `extprogram` and the result is the same! As for your second point, in reality, the output of `time` is *not* redirected to the standard output, so there will be no interference (you can see this in the code: `time_output = p.stderr.readlines()`—not `p.stdout`). – Eric O. Lebigot Apr 03 '13 at 14:01
thanks, this solution might work in this case. But generally speaking isn't there any other way? I mean what if I want to use the program in Windows? Maybe using multithreading I could check p.poll() and only run the time function when it has finished? – lucacerone Apr 03 '13 at 14:13
I've added a multithreaded answer that should work cross platform. It is still wall clock though but much more flexible then my initial answer. – StuGrey Apr 03 '13 at 16:28
@LucaCerone: About Windows: there is only one simple way of measuring execution time with the standard Python modules (I am setting `cProfile` aside): through the `timeit` module. However, this module reports the total time spent in a Python function; the problem is that this time is not necessarily the processor time (for instance, for the `sleep 5` command, `timeit` reports 5 seconds of execution time). – Eric O. Lebigot Apr 04 '13 at 02:16

Time an external program whose output is being processed by Python

2 Answers2

Linked