
I've done some SO searching and found this and that, which outline timing methods.

My problem is that I need to determine the CPU time (in milliseconds) required to execute the following loop:

for (int i = 0, temp = 0; i < 10000; i++)
{
    if (i % 2 == 0)
    {
        temp = (i / 2) + 1;
    }
    else
    {
        temp = 2 * i;
    }
}

I've looked at two methods: clock() and steady_clock::now(). Per the docs, I know that clock() returns "ticks", so I can convert the difference to seconds by dividing by CLOCKS_PER_SEC. The docs also mention that steady_clock is designed for interval timing, but you have to call duration_cast&lt;milliseconds&gt; to change its unit.

To time the two (since doing both in the same run might let whichever is measured first skew the other), I ran each by itself:

clock_t t = clock();
for (int i = 0, temp = 0; i < 10000; i++)
{
    if (i % 2 == 0)
    {
        temp = (i / 2) + 1;
    }
    else
    {
        temp = 2 * i;
    }
}
t = clock() - t;
cout << (float(t)/CLOCKS_PER_SEC) * 1000 << "ms taken" << endl;

chrono::steady_clock::time_point p1 = chrono::steady_clock::now();
for (int i = 0, temp = 0; i < 10000; i++)
{
    if (i % 2 == 0)
    {
        temp = (i / 2) + 1;
    }
    else
    {
        temp = 2 * i;
    }
}
chrono::steady_clock::time_point p2 = chrono::steady_clock::now();
cout << chrono::duration_cast<chrono::milliseconds>(p2 - p1).count() << "ms taken" << endl;

Output:

0ms taken
0ms taken

Do both these methods floor the result? Surely some fraction of a millisecond elapsed?

So which is ideal (or rather, more appropriate) for determining the CPU time required to execute the loop? At first glance, I would argue for clock(), since the docs specifically say it's for determining CPU time.

For context, my CLOCKS_PER_SEC holds a value of 1000.

Edit/Update:

Tried the following:

clock_t t = clock();
for (int j = 0; j < 1000000; j++) {
    volatile int temp = 0;
    for (int i = 0; i < 10000; i++)
    {
        if (i % 2 == 0)
        {
            temp = (i / 2) + 1;
        }
        else
        {
            temp = 2 * i;
        }
    }
}
t = clock() - t;
cout << (float(t) * 1000.0f / CLOCKS_PER_SEC / 1000000.0f) << "ms taken" << endl;

Outputs: 0.019953ms taken

clock_t start = clock();
volatile int temp = 0;
for (int i = 0; i < 10000; i++)
{
    if (i % 2 == 0)
    {
        temp = (i / 2) + 1;
    }
    else
    {
        temp = 2 * i;
    }
}
clock_t end = clock();
cout << fixed << setprecision(2) << 1000.0 * (end - start) / CLOCKS_PER_SEC << "ms taken" << endl;

Outputs: 0.00ms taken

chrono::high_resolution_clock::time_point p1 = chrono::high_resolution_clock::now();
volatile int temp = 0;
for (int i = 0; i < 10000; i++)
{
    if (i % 2 == 0)
    {
        temp = (i / 2) + 1;
    }
    else
    {
        temp = 2 * i;
    }
}
chrono::high_resolution_clock::time_point p2 = chrono::high_resolution_clock::now();
cout << (chrono::duration_cast<chrono::microseconds>(p2 - p1).count()) / 1000.0 << "ms taken" << endl;

Outputs: 0.072ms taken

chrono::steady_clock::time_point p1 = chrono::steady_clock::now();
volatile int temp = 0;
for (int i = 0; i < 10000; i++)
{
    if (i % 2 == 0)
    {
        temp = (i / 2) + 1;
    }
    else
    {
        temp = 2 * i;
    }
}
chrono::steady_clock::time_point p2 = chrono::steady_clock::now();
cout << (chrono::duration_cast<chrono::microseconds>(p2 - p1).count()) / 1000.0f << "ms taken" << endl;

Outputs: 0.044ms taken

So the question becomes: which is valid? The second method seems uninformative to me because I believe the loop completes in well under a millisecond.

I understand the first method (it simply executes long enough to be measurable), but the last two methods produce drastically different results.

One thing I've noticed is that after compiling the program, the first time I run it I may get 0.073ms (for the high_resolution_clock) and 0.044ms (for the steady_clock) at first, but all subsequent runs are within the range of 0.019 - 0.025ms.

pstatix

  • Aside: since this code is silly/pointless (it keeps overwriting the value of `temp`, which is eventually thrown away), the compiler might choose to eliminate all or part of its output. Anyway, 10,000 iterations isn't very much for such a simple piece of code - try upping the loop count by several orders of magnitude. – 500 - Internal Server Error Jul 10 '18 at 14:34
  • Why do you think that it takes any time to do nothing? Your loops have no effect on the observable behaviour of your program. – Caleth Jul 10 '18 at 14:35
  • @500-InternalServerError Did that and they do in fact produce results. I had the same thought, but these are my constraints. Please see my response to Caleth. – pstatix Jul 10 '18 at 14:36
  • @Caleth The time it takes to execute this loop is later assigned as a metric of 1 UT later in the program for objects. So I need to get `1 UT = x ms` where `x` is that unknown time in milliseconds to run that loop. – pstatix Jul 10 '18 at 14:37
  • Even if the compiler doesn't optimize everything out, `0ms` might be the correct result. For example, `2 * i` is the same as `i + i`, which takes a fraction of a nanosecond on a PC. – Bo Persson Jul 10 '18 at 14:37
  • Note that you are doing integer arithmetic on (apparently millisecond) values when you `clock() - t`, so if the *count of milliseconds* hasn't gone up, 0ms **is** the correct result. – Caleth Jul 10 '18 at 14:39
  • Check this [Google benchmark](https://github.com/google/benchmark) – Victor Gubin Jul 10 '18 at 17:24
  • @VictorGubin Looking to implement without 3rd party packages, thanks though – pstatix Jul 10 '18 at 17:25

2 Answers


You can run the loop a million times and divide. You can also declare `temp` as volatile to prevent the compiler from optimizing the loop away.

clock_t t = clock();
for (int j = 0; j < 1000000; j++) {
    volatile int temp = 0;
    for (int i = 0; i < 10000; i++)
    {
        if (i % 2 == 0)
        {
            temp = (i / 2) + 1;
        }
        else
        {
            temp = 2 * i;
        }
    }
}
t = clock() - t;
cout << (float(t) * 1000.0f / CLOCKS_PER_SEC / 1000000.0f) << "ms taken" << endl;
Patrik
  • At this point my question changes to: Is there any difference between using your method over a `chrono::high_resolution_clock` to measure this? I like that yours uses the `clock()` which specifically measures CPU time, and the other method actually measures wall time. Other than that the differences (output wise) are marginal at best. – pstatix Jul 10 '18 at 17:18
  • Also, turns out that on Windows [clock](https://learn.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2012/4e2ess30(v=vs.110)) measures wall time which would explain near identical times for it and `chrono::high_resolution_clock`. So I suppose what would be the portable variant across Windows and Linux? – pstatix Jul 10 '18 at 18:06
  • Interesting comment about Windows, I didn't know that. The standard clock() is defined to return the CPU time used by the process, which can differ from wall time, so that would be the most correct (but apparently not on Windows). chrono gives wall time, which can turn out to be longer than the actual running time you want to know. You may get more accurate results by increasing your thread/process priority, but there's no guarantee the CPU won't be used by another process. – Patrik Jul 11 '18 at 18:25
  • So, in conclusion, if you want to have an "execution time unit", I would go for the walltime, and take the average over multiple loops and multiple runs. – Patrik Jul 11 '18 at 18:27

Using GetTickCount() seems to be a solution, I hope:

// GetTickCount() (from <windows.h>) already returns elapsed milliseconds,
// so no CLOCKS_PER_SEC conversion is needed.
double start_s = GetTickCount();
for (int i = 0, temp = 0; i < 10000000; i++)
{
    if (i % 2 == 0)
    {
        temp = (i / 2) + 1;
    }
    else
    {
        temp = 2 * i;
    }
}
double stop_s = GetTickCount();
cout << (stop_s - start_s) << "ms taken" << endl;

For me this returns between 16-31ms.

JonZ