
I have been trying to take some microbenchmarks of Lua code, but have run into an incredibly annoying problem: I can't seem to get consistent results.

Example: here's a dead simple Lua program that's supposed to time a naive Fibonacci function:

function pfibs(n)
  if n ~= math.floor(n) then return 0
  elseif n < 0 then return pfibs(n + 2) - pfibs(n + 1)
  elseif n < 2 then return n
  else return pfibs(n - 1) + pfibs(n - 2)
  end
end
t = os.clock()
pfibs(30)
t = os.clock() - t
print("time: "..t)

When I try to run it several times in a row, something like this happens:

$ lua fib.lua
time: 1.265
$ lua fib.lua
time: 1.281
$ lua fib.lua
time: 1.343
$ lua fib.lua
time: 1.437
$ lua fib.lua
time: 1.562
$ lua fib.lua
time: 1.578
$ lua fib.lua
time: 1.64
$ lua fib.lua
time: 1.703
$ lua fib.lua
time: 1.75
$ lua fib.lua
time: 1.797
$ lua fib.lua
time: 1.796
$ lua fib.lua
time: 1.812
$ lua fib.lua
time: 1.89

(this is just an example, snipped for brevity, but representative of the sort of slowdown curve I'm seeing)

Times that start around 1.1 seconds end up well above two seconds. All I've been doing is sitting here pressing up-enter repeatedly. The same sort of thing happens if I wrap the test in a call to `time` instead of using the Lua clock, or if I make it loop a few times so it takes a couple more seconds; it seems to slow down proportionally. If I leave it for a while, sometimes the times go back down. Sometimes not (probably because I don't know how long to leave it).
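
For concreteness, the looping variant I mean is roughly the following sketch, reusing the pfibs definition from above:

-- rough sketch: repeat the same measurement several times in one process
for i = 1, 5 do
  local t = os.clock()
  pfibs(30)
  print("run "..i..": "..(os.clock() - t))
end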

This is on Windows+MSYS. Trying this on Ubuntu (same machine) led to a different pattern, but still with wildly inconsistent and unusable results (e.g. a test would take 2 seconds, then 3.5 seconds, then 4 seconds, then 2.5 seconds...). Task Manager/top indicate nothing chewing up CPU in the background in either case. CPU speed switching is disabled.

What am I doing wrong? My machine is old, but it can't be that broken (surely I'd notice it being unusable if it were the machine's fault and every program got that much slower every second...).


What I Am Actually Trying To Do:

What I wanted to do was learn about interpreter implementation, by starting with vanilla Lua and tweaking it to see what effect changes had on interpreter performance. As you can see I have yet to get past "establishing a control", so I haven't actually done any of that bit yet - with benchmark variance as high as the above, any changes I make are going to be utterly lost in the noise. I chose Lua because although it's a real-world program, it's also tiny and easy to read and modify. If there is a better base interpreter to do this with, or an established best way to benchmark interpreter performance, please feel free to add suggestions on that to answers.
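
For what it's worth, the sort of control harness I have in mind is roughly the sketch below (it assumes the pfibs definition from the first listing): run the workload once as a warm-up, repeat the timed call several times in one process, and report the minimum alongside the individual times. Taking the minimum over repetitions is a common way to damp out noise, though obviously it can't help if the machine itself keeps slowing down.

-- rough sketch of a control harness: warm-up, repeat, report the minimum
local function bench(f, reps)
  f()                          -- warm-up call, not timed
  local best = math.huge
  for i = 1, reps do
    local t = os.clock()
    f()
    t = os.clock() - t
    print(("rep %d: %.3f s"):format(i, t))
    if t < best then best = t end
  end
  return best
end

print(("best: %.3f s"):format(bench(function() return pfibs(30) end, 10)))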


Edit: Adding the C tag because the same thing is occurring in C programs using conventional C timing utilities as well, e.g.:

#include <time.h>
#include <stdio.h>

int fib(int n) {
    return n > 2 ? fib(n - 1) + fib(n - 2) : n;
}

int main(void) {
    clock_t t1, t2; const int ITNS = 30;
    for (int i = 0; i < ITNS; i ++) {
        t1 = clock();
        fib(38);
        t2 = clock();
        printf("time: %d\n", (int)((t2 - t1) / (CLOCKS_PER_SEC / 1000)));
    }
    return 0;
}

...prints the following:

time: 687
time: 688
time: 687
time: 688
time: 671
time: 688
time: 687
time: 688
time: 672
time: 687
time: 688
time: 687
time: 672
time: 688
time: 687
time: 688
time: 672
time: 796
time: 766
time: 719
time: 969
time: 1000
time: 1015
time: 1000
time: 1016
time: 1000
time: 1000
time: 1015
time: 1000
time: 1000

This indicates the effect is not limited to separate runs. I guess this means there's a problem with the machine or OS.

Alex Celeste
  • I think it has nothing to do with the OS, since the whole computation happens in user space. – Naruil May 28 '13 at 01:28
  • I've tested your C program on my Intel and AMD servers and it's always stable. Can you post your machine/OS specs here? – Naruil May 28 '13 at 01:38
  • One of the things you should do is change the governor setting from on-demand or powersave to performance. Also see [`governor.sh`](https://github.com/weidai11/cryptopp/blob/master/TestScripts/governor.sh) from the Crypto++ project, which borrowed it from Andy Polyakov of the OpenSSL project. – jww Oct 29 '17 at 20:33

3 Answers

3

Your program seems to be very stable on my machine with a Xeon E7-4850:

time: 0.97
time: 0.98  
time: 1
time: 0.98
time: 1.01
time: 1
time: 0.98
time: 0.98
time: 1.02
time: 0.98

However, I suggest you check whether CPU frequency scaling or something like Turbo Boost is enabled when you run benchmarks. We have met similar problems before, but the benchmarks became stable once we turned off CPU frequency scaling. On Linux you can use cpufrequtils to turn it off.

Moreover, if you are using an AMD machine, I suggest simply switching to an Intel one, since even with the CPU frequency fixed, Lua programs' performance is still not stable on our Opteron 8431. So this problem may depend on the hardware platform rather than on the Lua interpreter itself.

Edit:

I think it's better to read and print the current CPU frequency (from /proc/cpuinfo or /dev/cpu/#/msr) after each iteration to make sure the frequency is stable.
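
A rough sketch of what that could look like around the benchmark loop (this assumes Linux, where a "cpu MHz" field appears in /proc/cpuinfo, and reuses the pfibs function from the question):

-- sketch: report the current CPU frequency next to each timing (Linux only)
local function cpu_mhz()
  local f = io.open("/proc/cpuinfo", "r")
  if not f then return nil end
  for line in f:lines() do
    local mhz = line:match("cpu MHz%s*:%s*([%d%.]+)")
    if mhz then f:close(); return tonumber(mhz) end
  end
  f:close()
  return nil
end

for i = 1, 30 do
  local t = os.clock()
  pfibs(30)
  t = os.clock() - t
  print(("run %d: %.3f s at %s MHz"):format(i, t, tostring(cpu_mhz())))
end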

Your results with the C program show two clear stable stages. It looks like something happened after run 19 and the CPU frequency scaled down.

(plot of the per-run C timings, showing the two stable stages)

Naruil
  • I disabled frequency scaling before posting the question; although I only thought to do it after I noticed the problem, it did not appear to have any effect at all on the timings. – Alex Celeste May 27 '13 at 22:39
0

Not sure to what extent this is technically a "solution", but:

I was able to make the problem stop manifesting visibly by turning off frequency scaling (as suggested), but locking the CPU speed to "always low" instead of "always high". Timings are now consistently within 1% of each other, just as on everyone else's machines.

Using CPU-Z, I'd noticed that the temperature also seemed to spike at the same time as the sudden result slowdown. It only jumped from 60 to 72 degrees (on a CPU that supposedly goes up to 100 degrees), but the correlation existed. With the frequency locked to low, this no longer happens and neither does the slowdown.

I'm thinking this problem can be put down to "my computer is old and unreliable", and the question should be closed because it's evidently not really a general programming problem at all.

Thanks for your help though.

Alex Celeste
  • No, this behaviour is expected. The CPU will automatically throttle down at high temperatures regardless of whether you have disabled frequency scaling. Sometimes you can even tune the threshold in the BIOS settings. 100 degrees is the highest allowed temperature of your CPU, and it will shut down directly in that case. I suggest you clean the dust inside your machine and try again ^_^ – Naruil May 29 '13 at 10:09
-1

rdtsc-based time measurement is unaffected by CPU frequency changes.
You can try your benchmark in LuaJIT with a redefined os.clock function in the following way
(just set your CPU_speed constant so that plausible time values are displayed):

-- New os.clock() implementation based on rdtsc instruction, LuaJIT required
do 
   local CPU_speed = 3.0 -- Your CPU speed in GHz
   local rdtsc = require'ffi'.cast(
      '__cdecl uint64_t(*)()',   
      '\x0F\x31\xC3'  -- rdtsc, ret 
   )                  -- This trick may not work on modern 64-bit OS
   local rdtsc0 = rdtsc()
   os.clock = function()
      return tonumber(rdtsc() - rdtsc0)/(CPU_speed * 10^9)
   end
end
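
For example, with the block above loaded first under LuaJIT, the original benchmark from the question (assuming pfibs is defined) runs unchanged:

-- os.clock now reads the TSC, so the original timing code needs no changes
local t = os.clock()
pfibs(30)
print("time: "..(os.clock() - t))
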
Egor Skriptunoff
  • Tried with C rather than LuaJIT, but unfortunately `rdtsc` seems to show the same pattern. Wherever the problem is, it's somewhere else. – Alex Celeste May 27 '13 at 22:24
  • Problem seems to be at your OS side. Inspect CPU usage (if you have 2 or more CPU cores) and memory usage while testing. Probably previous process was not terminated prior to new process started? – Egor Skriptunoff May 27 '13 at 23:37
  • I have updated the question with an example that shows the effect also happens within the same program... and that I guess this means it's not a programmatic problem after all; the computer is probably overheating or something. – Alex Celeste May 28 '13 at 00:06
  • @AlexCeleste: CPUs since the mid 2000s have had the `constant_tsc` feature, where the TSC frequency is fixed, independent of the core clock frequency. This answer is wrong except for ancient CPUs like maybe Pentium III or maybe AMD K8. See [How to get the CPU cycle count in x86\_64 from C++?](https://stackoverflow.com/q/13772567) for more about TSC. – Peter Cordes Jun 02 '23 at 00:24