I often see delays in code execution that I cannot explain. By delays I mean that executing a piece of code which should take constant time sometimes takes much longer.
I attached a small C program which does some "dummy" calculations on CPU core 1. The thread is pinned to this core. I executed it on an Ubuntu 18.04 machine with 192 GiB RAM and 96 CPU cores. The machine does nothing else.
The tool runs only one worker thread (the main thread is sleeping), and at least the perf tool shows no context switches, so this should not be the problem.
The output of the tool looks like this (a block is printed roughly once per second):
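As a cross-check independent of perf, the context-switch counters could also be read from inside the program. This is a minimal sketch, assuming Linux/glibc, where `struct rusage` carries `ru_nvcsw` and `ru_nivcsw`; `struct ctx_switches` and `read_ctx_switches` are hypothetical names and not part of main.c. It reads process-wide counters, which should be close to the worker's own numbers as long as the main thread stays blocked.

```c
#include <sys/resource.h>

/* Hypothetical helper, not part of main.c. */
struct ctx_switches {
    long voluntary;    /* the process blocked or yielded the CPU */
    long involuntary;  /* the scheduler preempted the process */
};

/* Reads the process-wide context-switch counters.
 * Returns 0 on success and -1 if getrusage() fails.
 * On Linux, RUSAGE_THREAD (needs _GNU_SOURCE) would give per-thread values. */
static int read_ctx_switches(struct ctx_switches *out)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->voluntary = ru.ru_nvcsw;
    out->involuntary = ru.ru_nivcsw;
    return 0;
}
```

Calling this once before and once after each block of 1'000'000 runs and printing the difference would show whether a large maximum coincides with an involuntary switch.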
...
Stats:
Max [us]: 883
Min [us]: 0
Avg [us]: 0.022393
...
Each block of statistics covers 1'000'000 runs. My question is: why is the maximum value always that large? The 99.99% quantiles are also often huge (I left them out of the example to keep the code small; the max shows this behavior well enough). Why does this happen and how can I avoid it? In some applications this "variance" is quite a problem for me.
Given that nothing else is running, it is hard for me to understand these values.
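The quantile computation is not part of main.c. One possible way to collect such a quantile without storing every sample is a fixed-size histogram. The sketch below is an illustration only: `struct hist`, `hist_reset`, `hist_add`, `hist_quantile`, and the bucket count of 4096 are hypothetical choices, with one bucket per microsecond and the last bucket catching all larger values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of main.c.
 * Bucket i counts samples of exactly i microseconds; the last bucket
 * collects every sample >= HIST_BUCKETS - 1. */
#define HIST_BUCKETS 4096

struct hist {
    unsigned long count[HIST_BUCKETS];
    unsigned long total;
};

static void hist_reset(struct hist *h)
{
    memset(h, 0, sizeof(*h));
}

static void hist_add(struct hist *h, long runtime_us)
{
    size_t bucket;
    if (runtime_us < 0)
        bucket = 0;                      /* clamp clock anomalies */
    else if (runtime_us >= HIST_BUCKETS)
        bucket = HIST_BUCKETS - 1;       /* overflow bucket */
    else
        bucket = (size_t)runtime_us;
    h->count[bucket]++;
    h->total++;
}

/* Returns the smallest bucket value v such that at least q * total
 * samples are <= v, or -1 if the histogram is empty. */
static long hist_quantile(const struct hist *h, double q)
{
    if (h->total == 0)
        return -1;
    double exact = q * (double)h->total;
    unsigned long need = (unsigned long)exact;
    if ((double)need < exact)
        need++;                          /* round up */
    if (need < 1)
        need = 1;
    unsigned long seen = 0;
    for (size_t i = 0; i < HIST_BUCKETS; ++i) {
        seen += h->count[i];
        if (seen >= need)
            return (long)i;
    }
    return HIST_BUCKETS - 1;
}
```

In the worker loop, `hist_add(&h, runtime_us)` would sit next to the max/min update, and `hist_quantile(&h, 0.9999)` would be printed with the other statistics before calling `hist_reset` for the next block.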
Thank you very much
main.c:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdbool.h>
#include <sys/time.h>
#include <pthread.h>
#include <sys/sysinfo.h>

/* Current wall-clock time in microseconds. */
static inline unsigned long now_us(void)
{
    struct timeval tx;
    gettimeofday(&tx, NULL);
    return tx.tv_sec * 1000000 + tx.tv_usec;
}

static inline int calculate(int x)
{
    /* Do something "expensive" */
    for (int i = 0; i < 1000; ++i) {
        x = (~x * x + (1 - x)) ^ (13 * x);
        x += 2;
    }
    return x;
}

static void *worker(void *arg)
{
    (void)arg;
    const int runs_per_measurement = 1000000;
    int dummy = 0;

    while (true) {
        int max_us = -1;
        int min_us = -1;
        int sum_us = 0;

        for (int i = 0; i < runs_per_measurement; ++i) {
            /* Time a single call to calculate() */
            const long start_us = now_us();
            dummy = calculate(dummy);
            const long runtime_us = now_us() - start_us;

            /* Update stats */
            if (max_us < runtime_us) {
                max_us = runtime_us;
            }
            if (min_us < 0 || min_us > runtime_us) {
                min_us = runtime_us;
            }
            sum_us += runtime_us;
        }

        printf("Stats:\n");
        printf(" Max [us]: %d\n", max_us);
        printf(" Min [us]: %d\n", min_us);
        printf(" Avg [us]: %f\n", (double)sum_us / runs_per_measurement);
        printf("\n");
    }
    return NULL;
}

int main(void)
{
    pthread_t worker_thread;
    if (pthread_create(&worker_thread, NULL, worker, NULL) != 0) {
        printf("Cannot create thread!\n");
        return 1;
    }

    /* Use CPU number 1. The worker is already running at this point and
     * is moved to CPU 1 once the affinity is applied. */
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(1, &cpuset);
    if (pthread_setaffinity_np(worker_thread, sizeof(cpuset), &cpuset) != 0) {
        printf("Cannot set cpu core!\n");
        return 1;
    }

    /* The main thread blocks here while the worker runs forever. */
    pthread_join(worker_thread, NULL);
    return 0;
}

Makefile:
main: main.c
	gcc -o $@ $^ -Ofast -lpthread -Wall -Wextra -Werror