
I have a script (a Hadoop Pig script, to be precise) whose execution time I want to measure. I plan to run the tests several times and take the average / median time as the execution time.

As starting the script several times by hand might get quite cumbersome, I wanted to write a script for running these tests.

Is it a good idea to use Python's Popen to start a new process that executes the script and measure the time that process is running, say:

import subprocess
import time

start = time.time()                # start timer
p = subprocess.Popen(...)          # e.g. a Popen(["pig", "script.pig"]) call
stdout, stderr = p.communicate()   # wait for the process to finish
elapsed = time.time() - start      # end timer

Does creating a new process skew the time measurements, or is this approach fine? Any other suggestions?

Best, Will

  • Instead of the while-loop, simply use `p.communicate()` to wait for the process to finish. (Or `p.wait()` if you do not redirect the output.) – Sven Marnach Apr 28 '11 at 14:25
  • Thanks, and what if my script produces a lot of output? Does `p.communicate()` buffer the whole output until the process exits? Is there a way to print partial output / flush the buffer regularly? – Will Apr 28 '11 at 14:33
  • Alternatively, on Linux you can use the `time` command to benchmark how long a process runs. For example, `time myprog` will run myprog and tell you at the end how much time it took. – sashoalm Apr 28 '11 at 14:35
  • Check [my answer here for how to incrementally read stdout/stderr](http://stackoverflow.com/questions/4984549/merge-and-sync-stdout-and-stderr/5188359#5188359); it might be useful if you need to do this in Python. – samplebias Apr 28 '11 at 14:39
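
Putting the comments' suggestions together, here is a minimal sketch of reading the child's output incrementally while still timing the whole run. The `["pig", "script.pig"]` command is a placeholder for the actual invocation:

import subprocess
import sys
import time

start = time.time()
# Placeholder command; substitute your actual Pig invocation.
p = subprocess.Popen(["pig", "script.pig"],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
for line in p.stdout:        # consume output as it arrives instead of buffering it all
    sys.stdout.write(line.decode())
p.wait()                     # make sure the process has exited before stopping the timer
print("elapsed: %.2f s" % (time.time() - start))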

1 Answer


I'd also just use the `time` utility:

time python foo.py

You can then make a bash script to run this multiple times, recording the time taken for each run. Then, just average them with a shell utility or a Python script.
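
If you prefer to stay in Python, the repeated runs and the averaging could look roughly like this (a minimal sketch; the `["pig", "script.pig"]` command and the run count are placeholders):

import statistics
import subprocess
import time

cmd = ["pig", "script.pig"]   # placeholder; substitute the real invocation
runs = 5

times = []
for _ in range(runs):
    start = time.time()
    subprocess.call(cmd)      # blocks until the process exits
    times.append(time.time() - start)

print("mean:   %.2f s" % statistics.mean(times))
print("median: %.2f s" % statistics.median(times))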

Blender