8

I'm profiling a program on Linux using the `time` command. The problem is that its output is not very statistically relevant, as it runs the program only once. Is there a tool or a way to get an average over several `time` runs, possibly together with statistical information such as the standard deviation?

Rubens
MaiaVictor
  • You can get extra information from `time` by running `time -v ${program}`. Still, it does not produce *statistically relevant* data, but you may run it in a `for` loop and parse the results. Other than this, you may use profilers like `gprof` and [others](http://stackoverflow.com/questions/1777556/alternatives-to-gprof). – Rubens Jul 11 '13 at 19:21
  • Do you have the source code? If you do, take the body of `main` and wrap it in a loop to execute it 1000 times. Then divide the time by 1000. – Mike Dunlavey Jul 11 '13 at 23:40

4 Answers

9

Here is a script I wrote to do something similar to what you are looking for. It runs the provided command 10 times, logging the real, user CPU, and system CPU times to a file and echoing them after each command's output. It then uses awk to average each of the three columns in the file, but does not (yet) include the standard deviation (a sketch of that extension follows the script).

#!/bin/bash

# Per-invocation log file; $$ expands to this shell's PID
rm -f /tmp/mtime.$$

for x in {1..10}
do
  # GNU time writes "real <sec> user <sec> sys <sec>"; -a appends to the log
  /usr/bin/time -f "real %e user %U sys %S" -a -o /tmp/mtime.$$ "$@"
  tail -1 /tmp/mtime.$$
done

# Average each of the three timing columns
awk '{ et += $2; ut += $4; st += $6; count++ } END {  printf "Average:\nreal %.3f user %.3f sys %.3f\n", et/count, ut/count, st/count }' /tmp/mtime.$$
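
If you also want the standard deviation, here is a minimal sketch of an extended awk step (not part of the original script; it assumes the same /tmp/mtime.$$ log format and computes the population standard deviation of the real times):

awk '{
       et += $2; et2 += $2 * $2;   # sum and sum of squares of real times
       ut += $4; st += $6; count++
     }
     END {
       mean = et / count
       printf "Average:\nreal %.3f user %.3f sys %.3f\n", mean, ut/count, st/count
       # population standard deviation: sqrt(E[x^2] - E[x]^2)
       printf "real std dev: %.3f\n", sqrt(et2/count - mean * mean)
     }' /tmp/mtime.$$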
bobmcn
  • Thanks, this was nice -- just needed it. I just added `NSAMPLES=${NSAMPLES-3}` and changed the looping part to `for x in $(seq 1 $NSAMPLES)` so that I can pass the NSAMPLES env var to control the number of runs. – EdvardM Sep 21 '17 at 13:04
  • Beware your locale settings. awk will parse numbers based on your `LC_NUMERIC` environment variable. – mbarkhau Aug 27 '20 at 19:42
  • I just found the [hyperfine](https://github.com/sharkdp/hyperfine) command, which has many more features than my simple script. I was able to install it with my package manager, but there is complete documentation, including install instructions, at the link. – bobmcn Nov 22 '20 at 17:34
  • A recent video from CppCon 2022 on Observability Tools briefly mentioned another option, [multitime](https://github.com/ltratt/multitime). All 3 options (perf, hyperfine, and multitime) are preferable to my script, for the reasons pointed out by the commenters above. – bobmcn Apr 06 '23 at 20:18

6

Use [hyperfine](https://github.com/sharkdp/hyperfine).

For example:

hyperfine 'sleep 0.3'

This will run the command `sleep 0.3` multiple times, then output something like this:

$ hyperfine 'sleep 0.3'
Benchmark #1: sleep 0.3
  Time (mean ± σ):     306.7 ms ±   3.0 ms    [User: 2.8 ms, System: 3.5 ms]
  Range (min … max):   301.0 ms … 310.9 ms    10 runs
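
Hyperfine can also benchmark several commands in one invocation and export the raw measurements; here is a hedged sketch using its documented --warmup and --export-json options (the two sleep commands are just placeholders for whatever you want to compare):

# Compare two commands, discarding 3 warmup runs each and
# writing per-run statistics to results.json
hyperfine --warmup 3 --export-json results.json 'sleep 0.3' 'sleep 0.2'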
Jay Sullivan
2

perf stat does this for you with the -r (--repeat=<n>) option, reporting the average and variance.

For example, here is a short awk loop to simulate some work, short enough that CPU frequency ramp-up and other startup overhead might be a factor (see "Idiomatic way of performance evaluation?"), although it seems my CPU ramped up to 3.9 GHz pretty quickly, averaging 3.82 GHz:

$ perf stat -r5 awk 'BEGIN{for(i=0;i<1000000;i++){}}'

 Performance counter stats for 'awk BEGIN{for(i=0;i<1000000;i++){}}' (5 runs):

             37.90 msec task-clock                #    0.968 CPUs utilized            ( +-  2.18% )
                 1      context-switches          #   31.662 /sec                     ( +-100.00% )
                 0      cpu-migrations            #    0.000 /sec                   
               181      page-faults               #    4.776 K/sec                    ( +-  0.39% )
       144,802,875      cycles                    #    3.821 GHz                      ( +-  0.23% )
       343,697,186      instructions              #    2.37  insn per cycle           ( +-  0.05% )
        93,854,279      branches                  #    2.476 G/sec                    ( +-  0.04% )
            29,245      branch-misses             #    0.03% of all branches          ( +- 12.79% )

           0.03917 +- 0.00182 seconds time elapsed  ( +-  4.63% )

(Scroll to the right for variance.)

You can use taskset -c3 perf stat ... to pin the task to a specific core (core #3 in this case), if you have a single-threaded task and want to minimize context switches.

By default, perf stat uses hardware perf counters to profile things like instructions, core clock cycles (not the same thing as time on modern CPUs), and branch misses. This has pretty low overhead, especially with the counters in "counting" mode, as opposed to perf record, which takes interrupts to statistically sample hot spots for events.

You could use -e task-clock to count only that software event, without using HW perf counters at all. (If your system is in a VM, or you didn't change the default /proc/sys/kernel/perf_event_paranoid, perf might not be able to ask the kernel to program any HW counters anyway.)
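
For instance, combining both of the above (a sketch; ./myprog is a placeholder for your own binary):

# Pin to core 3 and time 5 runs using only the task-clock software event,
# which works even where HW counters are unavailable (e.g. in many VMs)
taskset -c3 perf stat -r5 -e task-clock ./myprog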

For programs that print output, it looks like this:

$ perf stat -r5 echo hello
hello
hello
hello
hello
hello

 Performance counter stats for 'echo hello' (5 runs):

              0.27 msec task-clock                #    0.302 CPUs utilized            ( +-  4.51% )
...
          0.000890 +- 0.000411 seconds time elapsed  ( +- 46.21% )

For a single run (the default, with no -r), perf stat will show elapsed time and user/sys time. But -r doesn't average those, for some reason.

Peter Cordes
-1

As the commenters above mentioned, it sounds like you may want to use a loop to run your program multiple times, to get more data points. You can use the time command with the -o option to write its results to a text file; note that -o is an option of GNU time (/usr/bin/time), not of the bash builtin, so invoke it like so: /usr/bin/time -o output.txt myprog
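
A hedged sketch of such a loop (myprog and output.txt are placeholders; GNU time's -a flag appends to the file instead of overwriting it):

# Run the program 20 times, appending each measurement to output.txt
rm -f output.txt
for i in $(seq 1 20); do
  /usr/bin/time -a -o output.txt ./myprog
done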

mti2935