
I'm trying to measure a function's performance by timing each iteration. In the process, I found that even when the timed region does almost nothing, the results still vary quite a bit.

e.g.

volatile long count = 0;
for (int i = 0; i < N; ++i) {
    measure.begin();
    ++count;
    measure.end();
}

In measure.end(), I compute the time difference and record it in an unordered_map that maps each elapsed time to the number of times it occurred. I've tried both clock_gettime and rdtsc, but about 1% of the data points always lie far from the mean, some by a factor of about 1000.
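The Measure class itself was not posted, so the following is only a minimal sketch of what such a helper might look like, assuming clock_gettime with CLOCK_MONOTONIC as the time source. The class layout, the member names and the now_ns helper are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>
#include <unordered_map>

// Hypothetical stand-in for the unposted Measure class.
class Measure {
public:
    // Take the first timestamp.
    void begin() { start_ = now_ns(); }

    // Take the second timestamp first, then update the histogram,
    // so the hash table work is not part of the recorded delta.
    void end() {
        const std::int64_t stop = now_ns();
        assert(stop >= start_);  // CLOCK_MONOTONIC does not go backwards
        ++histogram_[stop - start_];
    }

    // Elapsed nanoseconds -> number of iterations that took that long.
    const std::unordered_map<std::int64_t, std::int64_t>& histogram() const {
        return histogram_;
    }

private:
    static std::int64_t now_ns() {
        timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
    }

    std::int64_t start_ = 0;
    std::unordered_map<std::int64_t, std::int64_t> histogram_;
};
```

Even with the histogram update outside the timed region, it still runs between one end() and the next begin(), so it may disturb the caches seen by the next sample.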

Here's what the above loop generates:

T      count   percentile
18     117563  11.7563%
19     111821  22.9384%
21     201605  43.0989%
22     541095  97.2084%
23     2136    97.422%
24     2783    97.7003%
...
406    1       99.9994%
3678   1       99.9995%
6662   1       99.9996%
17945  1       99.9997%
18148  1       99.9998%
18181  1       99.9999%
22800  1       100%

mean:21
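The percentile column above is cumulative: the first row's 11.7563% corresponds to 117563 samples out of 1,000,000, and the second row adds 111821 to reach 22.9384%. A sketch of how such a table and the mean could be derived from the time-count map, assuming the map holds elapsed time as key and occurrences as value (the Row struct and both function names are hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// One output line: elapsed time, occurrences, cumulative share in percent.
struct Row {
    long value;
    long count;
    double cumulative_percent;
};

// Sort the buckets by elapsed time and accumulate their counts.
std::vector<Row> cumulative_table(const std::unordered_map<long, long>& histogram) {
    assert(!histogram.empty());
    std::vector<Row> rows;
    long total = 0;
    for (const auto& kv : histogram) {
        rows.push_back({kv.first, kv.second, 0.0});
        total += kv.second;
    }
    std::sort(rows.begin(), rows.end(),
              [](const Row& a, const Row& b) { return a.value < b.value; });
    long running = 0;
    for (Row& row : rows) {
        running += row.count;
        row.cumulative_percent = 100.0 * running / total;
    }
    return rows;
}

// Mean elapsed time, weighting each bucket by its count.
double weighted_mean(const std::unordered_map<long, long>& histogram) {
    assert(!histogram.empty());
    double sum = 0.0;
    long total = 0;
    for (const auto& kv : histogram) {
        sum += static_cast<double>(kv.first) * kv.second;
        total += kv.second;
    }
    return sum / total;
}
```

Because a handful of samples in the thousands barely move a mean taken over a million points, the mean can stay near 21 while the tail reaches 22800.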

So whether the unit is ticks or ns, the worst case of 22800 is about 1000 times larger than the mean.

I set isolcpus in GRUB and ran this under taskset. The loop does almost nothing, and the hash table update for the time-count statistics happens outside the timed region.

What am I missing?

I'm running this on a laptop with Ubuntu installed; the CPU is an Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz.

Ming
  • I missed the part of the question where you mentioned isolcpus and taskset on my first read-through of the question, so I removed my answer. I'm not sure what would be causing variations in your measured times, other than your process getting blocked, waiting on another process running on another processor. And that other process runs at the mercy of the scheduler, even if yours doesn't. – Eric Finn Nov 20 '13 at 03:46
  • Yeah, and this app is only incrementing a volatile variable, I don't see why it'll wait on other process. – Ming Nov 20 '13 at 03:51
  • No, it's also calling `clock_gettime` and `rdtsc`. This may be one of those situations where observing something changes it. I'd still suggest moving the timing out of the loop and finding the average time per loop iteration, to see how much the means differ. – Eric Finn Nov 20 '13 at 03:54
  • How do I see jitter outside of the loop? I'll just get one mean value and not individual sample points, right? – Ming Nov 20 '13 at 04:06
  • It is very hard to isolate a CPU completely for just one process on Linux. See http://stackoverflow.com/questions/13583146/. (Bottom line: Kernel tasks and interrupt handlers can still execute on "your" CPU.) – Nemo Nov 20 '13 at 04:28
  • I looked at the interrupts and it seems the core I'm running on only gets local timer interrupts. I saw there's a call "local_irq_disable" which would disable interrupt temporarily. I might try that. But I don't seem to have asm/switch_to.h or asm/system.h on my Ubuntu 12.04. I'll dig some more... – Ming Nov 20 '13 at 04:55
  • It's pretty easy to screw up use of RDTSC (e.g. not using CPUID to flush the execution pipeline) - you should be showing us your code. – Tony Delroy Nov 20 '13 at 07:21
  • @Ming Correct. However, you wouldn't be looking for jitter. If the mean time is significantly less for the version where you time the entire loop rather than each iteration, then the code that does the timing must (at least on average) take a significant amount of time. Of course, the meaning of "significant" in this context is up to you. – Eric Finn Nov 20 '13 at 12:52
  • One more possibly relevant document: https://docs.google.com/file/d/0B6HTUUWSPdd-Zl93MVhlMnRJRjg/edit Supposed to be "coming soon" to a kernel release near you. – Nemo Nov 26 '13 at 00:30
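Eric Finn's suggestion in the comments, comparing the mean cost with the timer calls inside the loop against the mean cost with only the whole loop timed, could be sketched roughly as follows. This assumes clock_gettime with CLOCK_MONOTONIC as the clock; the function names are hypothetical and not taken from the question.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Current CLOCK_MONOTONIC time in nanoseconds.
static std::int64_t now_ns() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Mean nanoseconds per iteration with the timer calls inside the loop.
double mean_ns_timed_per_iteration(long n) {
    assert(n > 0);
    volatile long count = 0;
    std::int64_t total = 0;
    for (long i = 0; i < n; ++i) {
        const std::int64_t start = now_ns();
        count = count + 1;
        total += now_ns() - start;
    }
    return static_cast<double>(total) / n;
}

// Mean nanoseconds per iteration with only the whole loop timed.
double mean_ns_timed_whole_loop(long n) {
    assert(n > 0);
    volatile long count = 0;
    const std::int64_t start = now_ns();
    for (long i = 0; i < n; ++i) {
        count = count + 1;
    }
    return static_cast<double>(now_ns() - start) / n;
}
```

If the first mean is much larger than the second, most of each per-iteration sample is the cost of the timing calls rather than of the increment. As noted in the comments, this gives only one mean value per run and no per-sample jitter.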

1 Answer


Thank you for all the answers. The main interrupt that I couldn't stop is the local timer interrupt. It seems the new 3.10 kernel supports tickless operation, so I'll try that.
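One way to check how many local timer interrupts hit the isolated core during a run is to read the "LOC:" row of /proc/interrupts before and after the loop and subtract. A rough sketch, assuming the usual layout of that row (label, one counter per CPU, then a text description); the function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Parse one line of /proc/interrupts. Returns the per-CPU counters if
// the line is the "LOC:" row, otherwise an empty vector.
std::vector<unsigned long long> parse_loc_line(const std::string& line) {
    std::istringstream in(line);
    std::string label;
    std::vector<unsigned long long> counts;
    if (!(in >> label) || label != "LOC:") {
        return counts;
    }
    unsigned long long value;
    while (in >> value) {  // stops at the trailing text description
        counts.push_back(value);
    }
    return counts;
}

// Read the current local timer interrupt counters, one entry per CPU.
// Returns an empty vector if the file or the row is not available.
std::vector<unsigned long long> read_loc_counts() {
    std::ifstream file("/proc/interrupts");
    std::string line;
    while (std::getline(file, line)) {
        std::vector<unsigned long long> counts = parse_loc_line(line);
        if (!counts.empty()) {
            return counts;
        }
    }
    return {};
}

// Timer interrupts delivered to one CPU between two snapshots.
unsigned long long ticks_on_cpu(const std::vector<unsigned long long>& before,
                                const std::vector<unsigned long long>& after,
                                std::size_t cpu) {
    assert(cpu < before.size() && cpu < after.size());
    return after[cpu] - before[cpu];
}
```

Calling read_loc_counts() before and after the measurement loop and passing both snapshots to ticks_on_cpu() with the pinned core's index would show whether the number of outliers roughly tracks the number of timer ticks.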

Ming
  • You can also try running under a realtime scheduler, `chrt -r 1 ./yourprogram` which normally would reduce the jitter a bit. If you really need low jitter, you might need a realtime OS though. – nos Nov 21 '13 at 22:40
  • That didn't help unfortunately. I tried to run "chrt -r 1 taskset -c 3 ./myapp". It shows up as priority -2 as expected and running 100% cpu, but the jitter is still the same factor and for some reason the app runs 4 times slower. – Ming Nov 21 '13 at 23:18