
I am trying to insert multiple items into a hashtable and measure the insertion times in milliseconds. Basically, it works like this (this function is a member of my hashtable class):

double benchmark(int amountOfInsertions){
    int valueToInsert;
    timeval tv_timeStart, tv_timeEnd;
    double totalTime = 0;
    double db_timeStart, db_timeEnd;

    for (int i = 0; i < amountOfInsertions; i++){

        valueToInsert = generateRandomVariable();

        gettimeofday(&tv_timeStart, NULL);
        insert(valueToInsert);
        gettimeofday(&tv_timeEnd, NULL);

        // convert both timestamps from {sec, usec} to milliseconds
        db_timeStart = tv_timeStart.tv_sec*1000 + tv_timeStart.tv_usec/1000.0;
        db_timeEnd   = tv_timeEnd.tv_sec*1000   + tv_timeEnd.tv_usec/1000.0;

        totalTime += (db_timeEnd - db_timeStart);
    }

    return totalTime;
}

The problem is that the insertion times used to look like this, showing a clear progression of times as more items were inserted: [plot: insertion time growing steadily with the number of items inserted]

But now I notice that the insertion times alternate between a handful of repeated values (around multiples of 15.625 ms), creating extremely inaccurate results: [plot: insertion times quantized to roughly multiples of 15.625 ms]

It just started happening all of a sudden, even with old versions of my code that I know used to output correct times. Is this a problem with gettimeofday() in particular? If not, what could it be?

This problem is so mysterious to me that I even wonder whether this is the right place or the right way to ask about it.

UPDATE: I've also tried with clock() and std::chrono::steady_clock, as well as measuring the time of the whole loop instead of each individual insertion (example below), and still got the same behaviour:

// requires #include <chrono>
using namespace std::chrono;

double benchmark(int amountOfInsertions){
    int valueToInsert;
    double totalTime = 0;

    steady_clock::time_point t1 = steady_clock::now();
    for (int i = 0; i < amountOfInsertions; i++){

        valueToInsert = generateRandomVariable();
        insert(valueToInsert);

    }
    steady_clock::time_point t2 = steady_clock::now();

    duration<double> time_span = duration_cast<duration<double>>(t2 - t1);
    totalTime = time_span.count()*1000;

    return totalTime;
}
  • Are those vertical scales the same? And could there be other processes hogging the CPU? – Beta May 01 '18 at 23:49
  • I was just coincidentally looking at [How to get microsecond timings in JavaScript since Spectre and Meltdown](//stackoverflow.com/q/50117537), but it's not plausible that `gettimeofday` was made less accurate as defence against timing side-channels for Spectre. Native code on x86 can use `rdtsc` directly so making `gettimeofday` worse doesn't help defend anything. What changed in the system you're running on? Kernel version? Hardware? **Are you sure the timing is wrong, rather than insert time itself getting some kind of scheduling / timing granularity?** – Peter Cordes May 02 '18 at 00:24
  • Nothing. It was just a matter of weeks between the tests I showed. – Daniel Franklin May 02 '18 at 00:28
  • What system are you timing on? Hardware / kernel version? (and is it a VM?) – Peter Cordes May 02 '18 at 02:27
  • Windows 10, not VM – Daniel Franklin May 02 '18 at 04:06
  • CPU: i5 4590, RAM: 8GB DDR3 – Daniel Franklin May 02 '18 at 06:02
  • `gettimeofday()` isn't a native Windows system call ([Equivalent of gettimeday() for Windows](//stackoverflow.com/q/10905892)). What implementation are you using, and what actual Windows system call does it use? – Peter Cordes May 02 '18 at 06:07

2 Answers


I do not know what caused this sudden change in timer resolution for gettimeofday, but I understand that it should not be used to measure elapsed time anyway; even the man page of gettimeofday says so.

Please use clock_gettime instead, or, if you can use fancy C++11 features, std::chrono::steady_clock.
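
For example, a minimal sketch of what a clock_gettime version might look like, assuming the insert() and generateRandomVariable() members from the question and timing the whole loop as in the update (note that clock_gettime is POSIX, so on Windows it is typically only available through environments like MinGW or Cygwin):

#include <time.h>

double benchmark(int amountOfInsertions){
    struct timespec ts_start, ts_end;

    // CLOCK_MONOTONIC is not affected by wall-clock adjustments (e.g. NTP).
    clock_gettime(CLOCK_MONOTONIC, &ts_start);
    for (int i = 0; i < amountOfInsertions; i++){
        insert(generateRandomVariable());
    }
    clock_gettime(CLOCK_MONOTONIC, &ts_end);

    // Difference converted to milliseconds.
    return (ts_end.tv_sec  - ts_start.tv_sec)  * 1000.0 +
           (ts_end.tv_nsec - ts_start.tv_nsec) / 1000000.0;
}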

KompjoeFriek
  • Yeah, I've been thinking of changing the function as well. But the reason for the sudden change in behaviour is still a mystery, as the inaccurate resolution was supposed to be the expected behaviour the whole time, from what I found in this [link](https://stackoverflow.com/questions/13175573/why-is-microsecond-timestamp-is-repetetive-using-a-private-gettimeoftheday-i). – Daniel Franklin May 02 '18 at 00:33
  • Update: I've tried with clock() and steady_clock, and the same thing occurs. I thought it could've been because I was measuring the times of each individual insertion and then adding them to get the total, when it would be better to measure the running time of the whole loop, but even this didn't change anything. – Daniel Franklin May 02 '18 at 05:26

If you want to truly benchmark this, you need to check which optimization flags you are using, whether something is being optimized away, whether something is running in the background, whether context switches from hyperthreading are affecting you, and so on. Then maybe use Celero or Hayai, depending on how precise you need this to be. Then perform the test at least 5 times and play around with the sample count in the test.
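
For example, a rough sketch of what repeating the test could look like, assuming it is added as another member of the question's hashtable class so it can call benchmark(); the run count of 5 and the use of the median as the summary are just illustrative choices:

#include <algorithm>
#include <vector>

double repeatedBenchmark(int amountOfInsertions, int runs = 5){
    std::vector<double> times;
    for (int r = 0; r < runs; r++){
        times.push_back(benchmark(amountOfInsertions));
    }
    // The median is less affected than the mean by a single run being
    // disturbed by background load or a context switch.
    std::sort(times.begin(), times.end());
    return times[times.size() / 2];
}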

I have found that std::chrono is not the most reliable clock if you are benchmarking and trying to define a repeatable benchmarking test.

Marko Bencik