I was experimenting with different integer types in a Visual Studio project on Windows, using the simple exchange sort shown below. The processor is Intel. The code was compiled in Release x64 with the optimization setting "Maximize Speed (/O2)". The command line corresponding to these compilation settings is:
/permissive- /GS /GL /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /sdl /Fd"x64\Release\vc141.pdb" /Zc:inline /fp:precise /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /Gd /Oi /MD /Fa"x64\Release\" /EHsc /nologo /Fo"x64\Release\" /Fp"x64\Release\SpeedTestForIntegerTypes.pch" /diagnostics:classic
The code itself:
#include <ctime>
#include <vector>
#include <iostream>
void sort(int N, int A[], int WorkArray[]) // exchange sort
{
    int i, j, index, val_min;
    for (j = 0; j < N; j++)
    {
        val_min = 500000;            // sentinel larger than any element
        for (i = j; i < N; i++)      // find the minimum of A[j..N-1]
        {
            if (A[i] < val_min)
            {
                val_min = A[i];
                index = i;
            }
        }
        WorkArray[j] = A[j];         // swap A[j] and A[index] through WorkArray[j]
        A[j] = val_min;
        A[index] = WorkArray[j];
    }
}

int main()
{
    std::vector<int> A(400000), WorkArray(400000);
    for (size_t k = 0; k < 400000; k++)
        A[k] = 400000 - (k + 1);     // fill in descending order: 399999, ..., 0

    clock_t begin = clock();
    sort(400000, &A[0], &WorkArray[0]);
    clock_t end = clock();

    double sortTime = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << "Sort time: " << sortTime << std::endl;
    return 0;
}
The WorkArray array is only needed as temporary storage for the elements of A while they are swapped.
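(A side note on the measurement itself: as far as I know, clock() on MSVC reports elapsed wall-clock time rather than CPU time. A std::chrono::steady_clock variant of the timing wrapper would look roughly like the sketch below; all numbers in this post come from the clock()-based code above, and the sort declaration here simply refers to the function defined there.)

#include <chrono>
#include <iostream>
#include <vector>

void sort(int N, int A[], int WorkArray[]);  // the exchange sort defined above

int main()
{
    std::vector<int> A(400000), WorkArray(400000);
    for (size_t k = 0; k < 400000; k++)
        A[k] = 400000 - (k + 1);

    auto begin = std::chrono::steady_clock::now();
    sort(400000, &A[0], &WorkArray[0]);
    auto end = std::chrono::steady_clock::now();

    std::chrono::duration<double> sortTime = end - begin;  // seconds as double
    std::cout << "Sort time: " << sortTime.count() << std::endl;
    return 0;
}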
The point is that this sort took 22.3 seconds to complete. The interesting part is that if I change the type from int to size_t for the arrays A and WorkArray (both in std::vector and in the argument list of the function sort), as well as for val_min, the time increases to 67.4 s, threefold slower! The new code is below:
#include <ctime>
#include <vector>
#include <iostream>
void sort(int N, size_t A[], size_t WorkArray[]) // exchange sort
{
    int i, j, index;
    size_t val_min;
    for (j = 0; j < N; j++)
    {
        val_min = 500000U;           // sentinel larger than any element
        for (i = j; i < N; i++)      // find the minimum of A[j..N-1]
        {
            if (A[i] < val_min)
            {
                val_min = A[i];
                index = i;
            }
        }
        WorkArray[j] = A[j];         // swap A[j] and A[index] through WorkArray[j]
        A[j] = val_min;
        A[index] = WorkArray[j];
    }
}

int main()
{
    std::vector<size_t> A(400000), WorkArray(400000);
    for (size_t k = 0; k < 400000; k++)
        A[k] = 400000 - (k + 1);     // fill in descending order: 399999, ..., 0

    clock_t begin = clock();
    sort(400000, &A[0], &WorkArray[0]);
    clock_t end = clock();

    double sortTime = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << "Sort time: " << sortTime << std::endl;
    return 0;
}
Note that I still keep type int for the function-local variables i, j, index and N, so the only two arithmetic operations on them, i++ and j++, should take the same amount of time in both cases. The slowdown must therefore have another cause. Is it related to memory alignment, register sizes, or something else?
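(For the size/alignment part of that question, here is a trivial check I can run; on x64 MSVC I expect it to print 4/4 for both int and unsigned int and 8/8 for size_t.)

#include <cstddef>
#include <iostream>

int main()
{
    // Sizes and alignments of the three element types under test.
    std::cout << "int:          size " << sizeof(int)          << ", alignment " << alignof(int)          << '\n';
    std::cout << "unsigned int: size " << sizeof(unsigned int) << ", alignment " << alignof(unsigned int) << '\n';
    std::cout << "size_t:       size " << sizeof(std::size_t)  << ", alignment " << alignof(std::size_t)  << '\n';
    return 0;
}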
But the most outrageous part came when I changed int to unsigned int. Both unsigned int and int occupy the same number of bytes, namely 4 (sizeof confirms that). Yet the runtime for unsigned int was 65.8 s! While the first outcome was somewhat acceptable, the second one totally confuses me. Why is there such a significant difference in the time it takes to run such a simple algorithm, one that does not even involve sign checks?
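One way I can think of to dig into this is to isolate the hot inner loop into a separate, non-inlined function per element type and compile that file with /FA to get an assembly listing (probably with /GL turned off for that file, since whole-program optimization defers code generation to the linker). A rough sketch, where the name innerMinIndex and the explicit instantiations are only for illustration:

#include <cstddef>

// Mirrors the inner loop of sort(): find the value and index of the minimum
// of A[from..N-1]. __declspec(noinline) keeps /O2 from folding the body into
// the caller, so each instantiation shows up separately in the /FA listing.
template <typename T>
__declspec(noinline) int innerMinIndex(const T* A, int from, int N, T& val_min)
{
    val_min = T(500000);
    int index = from;                // initialized here, unlike the original
    for (int i = from; i < N; i++)
    {
        if (A[i] < val_min)
        {
            val_min = A[i];
            index = i;
        }
    }
    return index;
}

// Force one copy of the loop per element type into the object file.
template int innerMinIndex<int>(const int*, int, int, int&);
template int innerMinIndex<unsigned int>(const unsigned int*, int, int, unsigned int&);
template int innerMinIndex<std::size_t>(const std::size_t*, int, int, std::size_t&);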
Thanks to everyone addressing both of these questions. Where can I start reading more about these hardware-level optimization peculiarities? I don't care about the sorting algorithm itself; it is here only to illustrate the problem.
UPDATE: once again, I stress the fact that I use ints for array indices in all three cases.
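UPDATE 2: for completeness, here is a sketch of the whole test folded into one program with a template over the element type. The timings quoted above come from the three separate builds, not from this version, and I realize the compiler might in principle treat a template instantiation slightly differently from a standalone function.

#include <ctime>
#include <iostream>
#include <vector>

template <typename T>
void sortT(int N, T A[], T WorkArray[]) // exchange sort, element type T, int indices
{
    int i, j, index = 0;             // index initialized here just to silence warnings
    T val_min;
    for (j = 0; j < N; j++)
    {
        val_min = T(500000);
        for (i = j; i < N; i++)
        {
            if (A[i] < val_min)
            {
                val_min = A[i];
                index = i;
            }
        }
        WorkArray[j] = A[j];
        A[j] = val_min;
        A[index] = WorkArray[j];
    }
}

template <typename T>
double timeSort()
{
    std::vector<T> A(400000), WorkArray(400000);
    for (size_t k = 0; k < 400000; k++)
        A[k] = T(400000 - (k + 1));
    clock_t begin = clock();
    sortT(400000, &A[0], &WorkArray[0]);
    clock_t end = clock();
    return double(end - begin) / CLOCKS_PER_SEC;
}

int main()
{
    std::cout << "int:          " << timeSort<int>()          << " s\n";
    std::cout << "unsigned int: " << timeSort<unsigned int>() << " s\n";
    std::cout << "size_t:       " << timeSort<size_t>()       << " s\n";
    return 0;
}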