I am measuring the cost of a single function call made through a function pointer; my code is below. The results show some strange behavior that I cannot explain, and I am looking for help.
The code is compiled in Release mode with VS2017, using the default configuration.
There are four test machines, all running Windows 10. Details of each machine:
- M1: CPU: i7-7700, Microarchitecture: Kaby Lake
- M2: CPU: i7-7700, Microarchitecture: Kaby Lake
- M3: CPU: i7-4790, Microarchitecture: Haswell
- M4: CPU: E5-2698 v3, Microarchitecture: Haswell
In the figures below, the legends have the form `machine parameter_order alias`. `machine` is one of the machines listed above. `parameter_order` describes the order in which the `LOOP` values were passed to the program during a single run. `alias` indicates which part of the program the time belongs to: `no-exec` is the part without the extra function call (Lines 98 to 108, the first timed loop), `exec` is the part with the extra function call (Lines 115 to 125, the second timed loop), and `per-exec` is the cost of one function call. All times are in milliseconds. `per-exec` is plotted against the left y-axis; the others are plotted against the right y-axis.
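To make the `per-exec` definition concrete, here is a minimal sketch of the arithmetic. The helper name `per_call_ns` is hypothetical (it is not part of my program), and it reports nanoseconds rather than milliseconds only because the per-call value is very small.

```cpp
#include <cassert>

// Per-call cost in nanoseconds, derived from the two loop timings.
// no_exec_ms and exec_ms play the role of d1 and d2 in the program;
// rule and iter play the role of RULE and ITER.
// (Hypothetical helper for illustration, not part of the original code.)
double per_call_ns(double no_exec_ms, double exec_ms, long long rule, long long iter)
{
    assert(rule > 0 && iter > 0);
    // Total number of extra calls made by the exec loop.
    double calls = static_cast<double>(rule) * static_cast<double>(iter);
    // 1 ms = 1e6 ns.
    return (exec_ms - no_exec_ms) * 1e6 / calls;
}
```

For example, with made-up timings of 1000 ms (no-exec) and 1600 ms (exec), `per_call_ns(1000.0, 1600.0, 10000, 60000)` works out to 1 ns per call. These numbers are illustrative only and are not taken from my measurements.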
Comparing Fig. 1 through Fig. 4, the shape of the graphs appears to depend on the CPU's microarchitecture: the graphs for M1 and M2 are similar to each other, as are those for M3 and M4.
My questions:
- Why do all machines show two stages (`LOOP < 25` and `LOOP > 100`)?
- Why does the no-exec time on every machine have a strange peak when `32 <= LOOP <= 41`?
- Why do the no-exec and exec times of the Kaby Lake machines (M1 and M2) have a discontinuity when `72 <= LOOP <= 94`?
- Why does M4 (server processor) show a larger variance than M3 (desktop processor)?
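For the last question, the spread could be quantified more precisely if each `LOOP` value were timed several times. The following is only a sketch under that assumption; the `Spread` struct and the `summarize` helper are hypothetical names and are not part of my program.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Summary of repeated timings (in milliseconds) for one LOOP value.
// Hypothetical type for illustration only.
struct Spread
{
    double min_ms;
    double median_ms;
    double max_ms;
};

// Sorts a copy of the samples and returns the minimum, median and maximum.
Spread summarize(std::vector<double> samples_ms)
{
    assert(!samples_ms.empty());
    std::sort(samples_ms.begin(), samples_ms.end());
    std::size_t n = samples_ms.size();
    // Middle element for odd n, mean of the two middle elements for even n.
    double median = (n % 2 == 1)
        ? samples_ms[n / 2]
        : (samples_ms[n / 2 - 1] + samples_ms[n / 2]) / 2.0;
    return { samples_ms.front(), median, samples_ms.back() };
}
```

Comparing `max_ms - min_ms` (or the distance of the median from the minimum) between M3 and M4 would show whether the larger variance on M4 is a few outliers or a general widening.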
Here are my test results:
For convenience, the code is also pasted here:
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <cassert>
#include <algorithm>
#include <windows.h>
using namespace std;
const int PMAX = 11000000, ITER = 60000, RULE = 10000;
//const int LOOP = 10;
int func1(int a, int b, int c, int d, int e)
{
    return 0;
}

int func2(int a, int b, int c, int d, int e)
{
    return 0;
}

int func3(int a, int b, int c, int d, int e)
{
    return 0;
}

int func4(int a, int b, int c, int d, int e)
{
    return 0;
}

int func5(int a, int b, int c, int d, int e)
{
    return 0;
}

int func6(int a, int b, int c, int d, int e)
{
    return 0;
}

int (*init[6])(int, int, int, int, int) = {
    func1,
    func2,
    func3,
    func4,
    func5,
    func6
};

int (*pool[PMAX])(int, int, int, int, int);
LARGE_INTEGER freq;

void getTime(LARGE_INTEGER *res)
{
    QueryPerformanceCounter(res);
}

// Elapsed time between two counter readings, in milliseconds.
double delta(LARGE_INTEGER begin_time, LARGE_INTEGER end_time)
{
    return (end_time.QuadPart - begin_time.QuadPart) * 1000.0 / freq.QuadPart;
}
int main()
{
    char path[100], tmp[100];
    FILE *fin, *fout;
    int cnt = 0;
    int i, j, t, r;
    int ans;
    int LOOP;
    LARGE_INTEGER begin_time, end_time;
    double d1, d2, res;

    // Fill the pool with pointers that cycle through the six functions.
    for (i = 0; i < PMAX; i += 1)
        pool[i] = init[i % 6];
    QueryPerformanceFrequency(&freq);

    printf("file path:");
    scanf("%99s", path);    // bound the read to the size of path
    fin = fopen(path, "r");
    if (fin == NULL)
    {
        printf("cannot open %s\n", path);
        return 1;
    }

start:
    // Read the next LOOP value; stop at end of file or on malformed input.
    if (fscanf(fin, "%d", &LOOP) != 1)
        goto end;

    // no-exec: baseline loop, with the pool[t] call commented out.
    ans = 0;
    getTime(&begin_time);
    for (r = 0; r < RULE; r += 1)
    {
        for (t = 0; t < ITER; t += 1)
        {
            //ans ^= (pool[t])(0, 0, 0, 0, 0);
            ans ^= pool[0](0, 0, 0, 0, 0);
            ans = 0;
            for (j = 0; j < LOOP; j += 1)
                ans ^= j;
        }
    }
    getTime(&end_time);
    printf("%.10f\n", d1 = delta(begin_time, end_time));
    printf("ans:%d\n", ans);

    // exec: the same loop plus one extra indirect call through pool[t].
    ans = 0;
    getTime(&begin_time);
    for (r = 0; r < RULE; r += 1)
    {
        for (t = 0; t < ITER; t += 1)
        {
            ans ^= (pool[t])(0, 0, 0, 0, 0);
            ans ^= pool[0](0, 0, 0, 0, 0);
            ans = 0;
            for (j = 0; j < LOOP; j += 1)
                ans ^= j;
        }
    }
    getTime(&end_time);
    printf("%.10f\n", d2 = delta(begin_time, end_time));
    printf("ans:%d\n", ans);

    // per-exec: time difference divided by the number of extra calls.
    printf("%.10f\n", res = (d2 - d1) / (1.0 * RULE * ITER));

    // Write one result file per LOOP value: LOOP, no-exec, exec, per-exec.
    sprintf(tmp, "%d.txt", cnt++);
    fout = fopen(tmp, "w");
    fprintf(fout, "%d,%.10f,%.10f,%.10f\n", LOOP, d1, d2, res);
    fclose(fout);
    goto start;

end:
    fclose(fin);
    system("pause");
    exit(0);
}
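For anyone who wants to try this outside Windows, below is a reduced, portable sketch of the same idea. It is not the program that produced my figures: it assumes `std::chrono::steady_clock` in place of `QueryPerformanceCounter`, uses a single stub function instead of the pool of six, and contains only one indirect call per iteration. The names `time_loop_ms`, `g_target` and `stub` are hypothetical.

```cpp
#include <cassert>
#include <chrono>

using Fn = int (*)(int, int, int, int, int);

// Stand-in for func1..func6 (hypothetical name).
static int stub(int, int, int, int, int) { return 0; }

// The pointer is volatile so the compiler has to reload it before every
// call and cannot replace the indirect call with a direct or inlined one.
static Fn volatile g_target = stub;

// Times rule * iter iterations of the inner XOR loop and returns the
// elapsed time in milliseconds. When with_call is true, each iteration
// also makes one indirect call through g_target. The final value of the
// XOR loop is written to *out.
double time_loop_ms(int rule, int iter, int loop, bool with_call, int *out)
{
    assert(rule > 0 && iter > 0 && loop >= 0 && out != nullptr);
    volatile int sink = 0;  // keeps the result of each iteration observable
    auto begin = std::chrono::steady_clock::now();
    for (int r = 0; r < rule; ++r)
    {
        for (int t = 0; t < iter; ++t)
        {
            int ans = 0;
            if (with_call)
                ans ^= g_target(0, 0, 0, 0, 0);
            for (int j = 0; j < loop; ++j)
                ans ^= j;
            sink = ans;
        }
    }
    auto end = std::chrono::steady_clock::now();
    *out = sink;
    return std::chrono::duration<double, std::milli>(end - begin).count();
}
```

Calling `time_loop_ms` once with `with_call == false` and once with `with_call == true` for the same `rule`, `iter` and `loop` gives the two timings whose difference, divided by `rule * iter`, estimates the per-call cost. Because the structure differs from my original loops, the absolute numbers would not be directly comparable with the figures above.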