7

Suppose I have a program that I execute on Unix this way:

$ ./mycode

My question is: is there a way I can time the running time of my code executed K times? The value of K = 1000, for example.

I am aware of the Unix "time" command, but that only times 1 instance.

Lance Roberts
neversaint

8 Answers

15

To improve/clarify Charlie's answer:

time (for i in $(seq 10000); do ./mycode; done)
Evan Teran
13

Try

$ time ( your commands )

Write a loop to go in the parens to repeat your command as needed, as in the sketch below.
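
For example, assuming the program from the question is ./mycode and K = 1000 (a minimal sketch; seq-based loops can run into the command-line-length issue addressed in the update below for very large K):

$ time ( for i in $(seq 1000); do ./mycode; done )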

Update

Okay, we can solve the "command line too long" issue. This is bash syntax; if you're using another shell, you may have to use expr(1).

$ time (
> while ((n++ < 100)); do echo "n = $n"; done
> )

real    0m0.001s
user    0m0.000s
sys     0m0.000s
Charlie Martin
5

Just a word of advice: make sure this "benchmark" comes close to your real usage of the executed program. If it's a short-lived process, there could be significant overhead caused by the process creation alone. Don't assume that it's the same as implementing this as a loop within your program.
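
For example, a rough way to estimate the process-creation overhead by itself is to time the same kind of loop around a no-op command (a sketch; /bin/true does nothing and exits successfully):

$ time ( for i in $(seq 1000); do /bin/true; done )

Whatever that reports is roughly the fork/exec cost included in your measurement, on top of the work your program actually does.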

paprika
  • +1. At the very least, please time a loop that runs a no-op program 1000 times just to get an idea of the overhead. This will still be an underestimate of the overhead because the real program will take longer for the dynamic linker to load and fix up. – j_random_hacker Feb 19 '09 at 12:07
4

To enhance some of the other responses a little: those based on seq may produce a "command line too long" error if you decide to test, say, one million times. The following does not have this limitation:

time ( a=0 ; while test $a -lt 10000 ; do echo $a ; a=`expr $a + 1` ; done)
Diego Sevilla
2

Another solution to the "command line too long" problem is to use a C-style for loop within bash:

 $ for ((i=0;i<10;i++)); do echo $i; done

This works in zsh as well (though I bet zsh has some niftier way of doing it; I'm still new to zsh). I can't speak for other shells, as I've never used any others. See the timed sketch below.
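
Combined with time and the question's ./mycode, that might look like this (a sketch, assuming K = 1000):

$ time ( for ((i=0; i<1000; i++)); do ./mycode; done )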

fow
2

Forget time; hyperfine will do exactly what you are looking for: https://github.com/sharkdp/hyperfine

% hyperfine 'sleep 0.3'
Benchmark 1: sleep 0.3
  Time (mean ± σ):     310.2 ms ±   3.4 ms    [User: 1.7 ms, System: 2.5 ms]
  Range (min … max):   305.6 ms … 315.2 ms    10 runs
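
For the question's use case you can pin the number of runs (a sketch; --runs is hyperfine's repeat-count option, and ./mycode is the program from the question):

% hyperfine --runs 1000 './mycode'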
minusf
1

Linux perf stat has a -r repeat_count option. Its output only gives you the mean and standard deviation for each HW/software event, not min/max as well.

It doesn't discard the first run as a warm-up or anything either, but it's somewhat useful in a lot of cases.
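
Applied directly to the question, the invocation would be something like this (a sketch; -r takes the repeat count, and ./mycode is the program from the question):

$ perf stat -r 1000 ./mycode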

Scroll to the right for the stddev results like ( +- 0.13% ) for cycles. There's less variance in that than in task-clock, probably because CPU frequency was not fixed. (I intentionally picked quite a short run time, although with Skylake hardware P-states and EPP=performance it should ramp up to max turbo quite quickly, even compared to a 34 ms run time. For a CPU-bound task that's not memory-bound at all, awk's interpreter loop runs at a constant number of clock cycles per iteration, modulo only branch misprediction and interrupts. --all-user counts CPU events like instructions and cycles only for user-space, not inside interrupt handlers and system calls / page-faults.)

$ perf stat --all-user -r5   awk 'BEGIN{for(i=0;i<1000000;i++){}}'

 Performance counter stats for 'awk BEGIN{for(i=0;i<1000000;i++){}}' (5 runs):

             34.10 msec task-clock                #    0.984 CPUs utilized            ( +-  0.40% )
                 0      context-switches          #    0.000 /sec                   
                 0      cpu-migrations            #    0.000 /sec                   
               178      page-faults               #    5.180 K/sec                    ( +-  0.42% )
       139,277,791      cycles                    #    4.053 GHz                      ( +-  0.13% )
       360,590,762      instructions              #    2.58  insn per cycle           ( +-  0.00% )
        97,439,689      branches                  #    2.835 G/sec                    ( +-  0.00% )
            16,416      branch-misses             #    0.02% of all branches          ( +-  8.14% )

          0.034664 +- 0.000143 seconds time elapsed  ( +-  0.41% )

awk here is just a busy-loop to give us something to measure. If you're using this to microbenchmark a loop or function, construct it to have minimal startup overhead as a fraction of total run time, so perf stat event counts for the whole run mostly reflect the code you wanted to time. Often this means building a repeat-loop into your own program, to loop over the initialized data multiple times.

See also Idiomatic way of performance evaluation? - timing very short things is hard due to measurement overhead. Carefully constructing a repeat loop that tells you something interesting about the throughput or latency of your code under test is important.


Run-to-run variation is often a thing, but back-to-back runs like this will often have less variation within the group than between runs separated by the half second it takes to press up-arrow/return. Perhaps it's something to do with transparent-hugepage availability, or choice of alignment? These are usually small microbenchmarks, so it's not a matter of the file getting evicted from the pagecache.

(The +- range printed by perf is, I think, just one standard deviation based on the small sample size, not the full range it saw.)

Peter Cordes
0

If you're worried about the overhead of constantly loading and unloading the executable into process space, I suggest you set up a RAM disk and time your app from there.
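
On Linux, a rough sketch of that setup using tmpfs, which is RAM-backed (the mount point and size here are arbitrary, and ./mycode is the program from the question):

$ sudo mkdir -p /mnt/ramdisk
$ sudo mount -t tmpfs -o size=64m tmpfs /mnt/ramdisk
$ cp ./mycode /mnt/ramdisk/
$ time ( for i in $(seq 1000); do /mnt/ramdisk/mycode; done )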

Back in the '70s we used to be able to set a "sticky" bit on the executable and have it remain in memory. I don't know of a single Unix which still supports this behaviour, as it made updating applications a nightmare... :o)

ScaryAardvark
  • Once run, the program will be cached, so the repeated overhead will be relocation, which won't be helped by using a RAM disk – Ronny Vindenes Feb 18 '09 at 08:01
  • Mmm... I'm afraid that the RAM disk will only help a little bit. The OS still has to load the program and set up all the complex structures that support it. – Diego Sevilla Feb 18 '09 at 08:02
  • Caching the program is only half the battle. If you don't have pre-linking enabled it needs to be linked at runtime and it may incur significant startup overhead compared to runtime if it needs to initialise its environment or load from a config file. – Adam Hawes Feb 18 '09 at 08:05
  • +1 Diego. Process startup time seems fast but it's an eternity compared to the 1 or 2 machine instructions it takes to perform the loop *inside* a single instance of the application. That applies even if the OS doesn't look at the disk. – j_random_hacker Feb 19 '09 at 12:12
  • The sticky bit, incidentally, is no longer supported because modern OSes are smart enough to cache libraries and applications automatically, without needing any hints like that. – bdonlan May 15 '09 at 15:46