2

I can't seem to get a simple program (with lots of memory access) to achieve consistent timing in Linux. I'm using a 2.6 kernel, and the program is being run on a dual-core processor with realtime priority. I'm trying to disable cache effects by declaring the memory arrays as volatile. Below are the results and the program. What are some possible sources of the outliers?

Results:

Number of trials: 100
Range: 0.021732s to 0.085596s
Average Time: 0.058094s
Standard Deviation: 0.006944s
Extreme Outliers (2 SDs away from mean): 7
Average Time, excluding extreme outliers: 0.059273s

Program:

#include <stdio.h>
#include <stdlib.h> 
#include <math.h>

#include <sched.h>
#include <sys/time.h>

#define NUM_POINTS 5000000
#define REPS 100

unsigned long long getTimestamp() {
  unsigned long long usecCount;
  struct timeval timeVal;
  gettimeofday(&timeVal, 0);
  usecCount = timeVal.tv_sec * (unsigned long long) 1000000;
  usecCount += timeVal.tv_usec;
  return (usecCount);
}

double convertTimestampToSecs(unsigned long long timestamp) {
  return (timestamp / (double) 1000000);
}

int main(int argc, char* argv[]) {
  unsigned long long start, stop;
  double times[REPS];
  double sum = 0;
  double scale, avg, newavg, median;
  double stddev = 0;
  double maxval = -1.0, minval = 1000000.0;
  int i, j, freq, count;
  int outliers = 0;
  struct sched_param sparam;

  sched_getparam(getpid(), &sparam);
  sparam.sched_priority = sched_get_priority_max(SCHED_FIFO);
  sched_setscheduler(getpid(), SCHED_FIFO, &sparam);

  volatile float* data;
  volatile float* results;

  data = calloc(NUM_POINTS, sizeof(float)); 
  results = calloc(NUM_POINTS, sizeof(float)); 

  for (i = 0; i < REPS; ++i) {
    start = getTimestamp();
    for (j = 0; j < NUM_POINTS; ++j) {
      results[j] = data[j];
    }
    stop = getTimestamp();
    times[i] = convertTimestampToSecs(stop-start);
  }

  free(data);
  free(results);

  for (i = 0; i < REPS; i++) {
    sum += times[i];

    if (times[i] > maxval)
      maxval = times[i];

    if (times[i] < minval)
      minval = times[i];
  }
  avg = sum/REPS;

  for (i = 0; i < REPS; i++)
    stddev += (times[i] - avg)*(times[i] - avg);
  stddev /= REPS;
  stddev = sqrt(stddev);

  for (i = 0; i < REPS; i++) {
    if (times[i] > avg + 2*stddev || times[i] < avg - 2*stddev) {
      sum -= times[i];
      outliers++;
    }
  }
  newavg = sum/(REPS-outliers);

  printf("Number of trials: %d\n", REPS);
  printf("Range: %fs to %fs\n", minval, maxval);
  printf("Average Time: %fs\n", avg);
  printf("Standard Deviation: %fs\n", stddev);
  printf("Extreme Outliers (2 SDs away from mean): %d\n", outliers);
  printf("Average Time, excluding extreme outliers: %fs\n", newavg);

  return 0;
}
Jim Hunziker
  • 14,111
  • 8
  • 58
  • 64
  • Are you running as root `(otherwise you won't be able to use SCHED_FIFO). Also, volatile doesn't go around CPU cache effects. See also http://stackoverflow.com/questions/88/ – nos Apr 05 '10 at 18:31
  • Yup - I'm running this as root. Also, I don't see what part of that article talks about the cache. – Jim Hunziker Apr 05 '10 at 18:52
  • It talks about gettimeofday resolution, not the caches :) – nos Apr 06 '10 at 17:54

1 Answers1

3

Make sure you have no other processes taking CPU time. Watch out in particular for screen savers and anything which regularly does GUI updates (e.g. a clock or similar). Try setting CPU affinity for your benchmark process to lock it onto one core (e.g. taskset from the command line). Make your benchmark process if not paging - typically you want to have an outer loop which runs N times and then time the last N-1 executions.

Paul R
  • 208,748
  • 37
  • 389
  • 560
  • I'm pretty sure that setting the process to realtime priority protects me from all those GUI things. Audio is sometimes also RT, but I made sure there wasn't anything of that type running. I did restrict it to one core as you said. As you can see from the results, there are 7 outliers, so excluding the first won't do the trick. The outliers are evenly distributed across the trials. – Jim Hunziker Apr 05 '10 at 19:24
  • @Jim: I wouldn't be too sure about the realtime priority - try getting rid of the the GUI completely and just run from a console (`sudo init 1`) and use `taskset` to ensure CPU affinity. – Paul R Apr 05 '10 at 19:48
  • `init 1` did the trick. It's good to know that there's GUI stuff that can mess up an RT process. I don't quite understand what's going on there, though. The only RT processes on my system are `migration/0`, `migration/1`, `watchdog/0`, and `watchdog/1`. Thanks! – Jim Hunziker Apr 05 '10 at 20:14
  • @Jim: Cool ! I wish everything were that easy to fix. ;-) It's possible that when you have your GUI enabled that there is some kind of 60 Hz interrupt driven process going on which perhaps doesn't show up in something like `top`, but which is significant enough to perturb your timing. Usually all interrupts are handled by the first core on the system (0) so using `taskset` to run your code on the second core (1) might also be a solution. – Paul R Apr 05 '10 at 20:37
  • Linux is not a hard realtime system, meaning there's still things that can preempt your process even when it has realtime priority,in particular irq handlers. – nos Apr 06 '10 at 17:56
  • Isn't there also something that has to be done in BIOS settings? If I understand correctly, the modern Intel CPUs adjust their frequency on-the-fly. It is reasonable to assume that these adjustments may affect the timings. I would very much appreciate if someone can shed light on this side of the issue. – AlwaysLearning Jan 12 '15 at 20:45