
Consider the following code:

#include <iostream>
#include <string>
#include <chrono>
#include <cstdlib>   // for srand() and rand()

using namespace std;

int main()
{
    int iter = 1000000;
    int loops = 10;
    while (loops)
    {
        int a=0, b=0, c=0, f = 0, m = 0, q = 0;
        auto begin = chrono::high_resolution_clock::now();
        auto end = chrono::high_resolution_clock::now();
        auto deltaT = end - begin;
        auto accumT = end - begin;
        accumT = accumT - accumT;
        auto controlT = accumT;
        srand(chrono::duration_cast<chrono::nanoseconds>(begin.time_since_epoch()).count());

        for (int i = 0; i < iter; i++) {
            begin = chrono::high_resolution_clock::now();

            // No arithmetic operation

            end = chrono::high_resolution_clock::now();
            deltaT = end - begin;
            accumT += deltaT;
        }

        controlT = accumT; // Control duration
        accumT = accumT - accumT; // Reset to zero

        for (int i = 0; i < iter; i++) {
            auto n1 = rand() % 100;
            auto n2 = rand() % 100;
            begin = chrono::high_resolution_clock::now();
            c += i*2*n1*n2; // Some arbitrary arithmetic operation
            end = chrono::high_resolution_clock::now();
            deltaT = end - begin;
            accumT += deltaT;
        }
        // Print the difference in time between the loop with no arithmetic operation and the loop with one
        cout << " c = " << c << "\t\t" << " | ";
        cout << "difference between the 1st and 2nd loop: "
             << chrono::duration_cast<chrono::nanoseconds>(accumT - controlT).count()
             << endl;
        loops--;
    }
    return 0;
}

It tries to isolate the time measurement of a single operation. The first loop is a control to establish a baseline, and the second loop performs an arbitrary arithmetic operation.

Then it outputs to the console. Here's sample output:

 c =  2116663282   | difference between 1st and 2nd loop: -8620916
 c =   112424882   | difference between 1st and 2nd loop: -1197927
 c = -1569775878   | difference between 1st and 2nd loop: -5226990
 c =  1670984684   | difference between 1st and 2nd loop:  4394706
 c = -1608171014   | difference between 1st and 2nd loop:   676683
 c = -1684897180   | difference between 1st and 2nd loop:  2868093
 c =   112418158   | difference between 1st and 2nd loop:  5846887
 c =  2019014070   | difference between 1st and 2nd loop:  -951609
 c =   656490372   | difference between 1st and 2nd loop:   997815
 c =   263579698   | difference between 1st and 2nd loop:  2371088

Here's the very interesting part: sometimes the loop with the arithmetic operation finishes faster than the loop with no arithmetic operation (a negative difference), which means that the operation to record the current time is slower than the arithmetic operation, and thus not negligible.

Is there a way around this?

PS: Yes, I understand you can wrap the whole loop between begin and end.
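For reference, a minimal sketch of that whole-loop approach, assuming an averaged figure is acceptable: timing many iterations at once amortizes the overhead of the now() calls, and dividing by the iteration count gives an average cost per operation. The iteration count and the multiplication in the loop body are only placeholders.

#include <chrono>
#include <iostream>

int main()
{
    const int iterations = 1000000;             // placeholder iteration count
    volatile long long sink = 0;                // volatile so the work is not optimized away

    auto begin = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < iterations; i++) {
        sink += static_cast<long long>(i) * 2;  // the operation being measured
    }
    auto end = std::chrono::high_resolution_clock::now();

    auto totalNs = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count();
    std::cout << "total: " << totalNs << " ns, average per iteration: "
              << static_cast<double>(totalNs) / iterations << " ns" << std::endl;
    return 0;
}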

Setup machine: Core i7 architecture, Windows 10 64-bit, and Visual Studio 2015

– Adl A
  • Which compiler, and are you enabling optimizations of any kind? – Tommy Andersen Mar 02 '16 at 09:55
  • Sorry, forgot to add that. v140 for VS 2015. Standard optimization. – Adl A Mar 02 '16 at 10:01
  • This just shows that such measurements are meaningless. Running times of modern CPUs aren't deterministic, and you must average over a sufficient number of instructions. It is also likely that you are measuring the time of the call to the `now` function more than that of the targeted operation. –  Mar 02 '16 at 10:09
  • To add to what @YvesDaoust has said, try to see what happens if you comment out the line `c += i*2*n1*n2; // Some arbitrary arithmetic operation`. – vcp Mar 02 '16 at 10:15
  • @YvesDaoust Very true. And that is problematic. My interest in this question is more academic - in other words, I'm trying to understand the nature of computation with C++ - but yes, I agree, macrobenchmarking is more relevant. – Adl A Mar 02 '16 at 10:26
  • @vcp I had already done that before even asking the question. The conclusion is the same: chronometric measurements are not negligible. – Adl A Mar 02 '16 at 10:28
  • @AdlA: you can get a better understanding by looking at the generated assembly code (together with the Intel Instruction Set manuals). From my experience, sub-millisecond timings aren't reliable. –  Mar 02 '16 at 10:31
  • Microbenchmarks are hard to do on modern CPUs with modern optimizing compilers. Check out this [talk from CppCon 2015 by a clang developer](https://www.youtube.com/watch?v=nXaxk27zwlk). You did manage to do something that doesn't get optimized away (using `rand`, and printing the total), but the overhead of a `RDTSC` instruction is higher than the cost of the insns between the start and stop, so jitter will be a huge factor. (esp. for a bare `rdtsc` without a serializing insn). Timing multiple iterations of a loop is the only sane way to measure something that small. – Peter Cordes Mar 03 '16 at 04:27

2 Answers


Your problem is that you measure time, not the number of instructions processed. Time can be influenced by a lot of things that are not what you would expect or wish to measure.

Instead, you should measure the number of clock cycles. There is a library for this on Agner Fog's website, which also has a lot of useful information about optimization:

http://www.agner.org/optimize/#manuals
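As an illustration of the idea (not Agner Fog's library itself), here is a minimal sketch that reads the time-stamp counter with the __rdtsc() intrinsic, which MSVC provides via <intrin.h> (GCC and Clang expose it through <x86intrin.h>). It does not serialize the instruction stream, so out-of-order execution can still blur the boundaries of the measured region, and the loop body is only a placeholder.

#include <intrin.h>   // __rdtsc() on MSVC; use <x86intrin.h> on GCC/Clang
#include <iostream>

int main()
{
    volatile long long sink = 0;                // keeps the loop from being optimized away

    unsigned long long start = __rdtsc();       // read the time-stamp counter (reference cycles)
    for (int i = 0; i < 1000000; i++) {
        sink += static_cast<long long>(i) * 2;  // the work being measured
    }
    unsigned long long stop = __rdtsc();

    std::cout << "elapsed: " << (stop - start) << " reference cycles" << std::endl;
    return 0;
}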

Even when counting clock cycles, you can still see peculiarities in the results. This can happen because the processor uses out-of-order execution, which lets it rearrange the order in which the operations are executed.

If you have compiled your code with debugging symbols, the compiler may have injected additional code, which may impact the result. When performing tests like this, you should always compile without debugging information.
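For example, an optimized build without debug information might look like this (the file name is just a placeholder; the first line is for the MSVC developer command prompt, the second is the rough GCC equivalent):

cl /O2 /EHsc benchmark.cpp
g++ -O2 -o benchmark benchmark.cpp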

– Tommy Andersen

You should use a steady clock, std::steady_clock.

std::system_clock (and std::high_resolution_clock, when it is an alias for it) can be adjusted by the OS, so it is not guaranteed to tick monotonically.
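A quick way to check what a particular standard library implementation provides is the is_steady member of each clock; a minimal sketch:

#include <chrono>
#include <iostream>

int main()
{
    // is_steady is true for clocks that are never adjusted backwards
    std::cout << std::boolalpha
              << "steady_clock::is_steady:          " << std::chrono::steady_clock::is_steady << '\n'
              << "high_resolution_clock::is_steady: " << std::chrono::high_resolution_clock::is_steady << '\n'
              << "system_clock::is_steady:          " << std::chrono::system_clock::is_steady << std::endl;
    return 0;
}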

– jonezq
  • It appears that VS 2015 has actually typedef'd `high_resolution_clock` to `steady_clock`, as per the comment by Mark A. Ropper on this question: http://stackoverflow.com/questions/16299029/resolution-of-stdchronohigh-resolution-clock-doesnt-correspond-to-measureme?rq=1 – Tommy Andersen Mar 02 '16 at 11:43