I'm migrating my application from Windows 7 to Windows 10.
Everything works without any changes, but execution is slower than on Windows 7.
Object construction/destruction seems to be the slow part, so I wrote a simple
benchmark program for malloc() and free(), shown below.
for (int i = 0; i < 100; i++)
{
    QueryPerformanceCounter(&gStart);
    p = malloc(size);
    free(p);
    QueryPerformanceCounter(&gEnd);
    // QuadPart is a 64-bit LONGLONG, so print it with %lld
    printf("%d, %lld\n", i, gEnd.QuadPart - gStart.QuadPart);
    if (p == NULL)
        printf("ERROR\n");
}
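The loop above times malloc() and free() as a single interval, so it cannot show which of the two calls became slower. Below is a minimal portable sketch that times them separately. It assumes a C11 compiler and uses timespec_get() as a stand-in for QueryPerformanceCounter (TIME_UTC is a wall clock and is not guaranteed to be monotonic). The function name time_malloc_free is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Storing the pointer in a volatile object stops the compiler from
   optimizing away an otherwise unused malloc()/free() pair. */
static void *volatile sink;

/* Current time in nanoseconds (C11 timespec_get, wall clock). */
static long long now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Times malloc() and free() separately for one block of `size` bytes.
   Returns 0 on success, -1 if malloc() failed. */
int time_malloc_free(size_t size, long long *malloc_ns, long long *free_ns)
{
    assert(malloc_ns != NULL && free_ns != NULL);
    long long t0 = now_ns();
    void *p = malloc(size);
    long long t1 = now_ns();
    if (p == NULL)
        return -1;          /* nothing to free */
    sink = p;
    free(p);
    long long t2 = now_ns();
    *malloc_ns = t1 - t0;
    *free_ns = t2 - t1;
    return 0;
}
```

Calling time_malloc_free(100000000, &m, &f) in a loop and printing m and f side by side would show whether the extra time is spent in allocation or in release.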
I ran this program on both Windows 7 and Windows 10 on the same PC.
I measured malloc() and free() performance for data sizes of 1, 100, 1000, 10000, 100000, 1000000, 10000000 and 100000000 bytes.
In every case, Windows 10 is slower than Windows 7.
In particular, Windows 10 is more than ten times slower than Windows 7 for the 10000000 and 100000000 byte sizes.
When the data size is 10000000 bytes:
- Windows 7 : 0.391392 msec
- Windows 10 : 4.254411 msec
When the data size is 100000000 bytes:
- Windows 7 : 0.602178 msec
- Windows 10 : 38.713946 msec
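The slowdown factor is the Windows 10 time divided by the Windows 7 time. A small sketch of that arithmetic follows; the helper name slowdown is hypothetical. Applied to the numbers above, it gives roughly 10.9x for 10000000 bytes and roughly 64.3x for 100000000 bytes.

```c
#include <assert.h>

/* Slowdown factor: how many times longer the Windows 10 run took
   than the Windows 7 run for the same data size. */
double slowdown(double win10_ms, double win7_ms)
{
    assert(win7_ms > 0.0);   /* avoid dividing by zero */
    return win10_ms / win7_ms;
}
```

For example, slowdown(4.254411, 0.391392) is about 10.87 and slowdown(38.713946, 0.602178) is about 64.29.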
Do you have any suggestions for improving this on Windows 10?
I've tried the following on Windows 10, but unfortunately performance did not improve:
- Disabled SuperFetch
- Disabled Ndu.sys
- Ran Disk Cleanup
Here is the full source code (updated Feb 15th):
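#include "stdafx.h"
// Headers the code below relies on (stdafx.h may already include them)
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <tchar.h>
#define START_TIME QueryPerformanceCounter(&gStart);
#define END_TIME QueryPerformanceCounter(&gEnd);
// MSVC drops the trailing comma when __VA_ARGS__ is empty, e.g. PRT_TITLE("02_Empty")
#define PRT_FMT(fmt, ...) printf(fmt, __VA_ARGS__);
#define PRT_TITLE(fmt, ...) printf(fmt, __VA_ARGS__); gTotal.QuadPart = 0;
// QuadPart is a 64-bit LONGLONG, so %lld (not %d) is the matching format
#define PRT_RESULT printf(",%lld", gEnd.QuadPart-gStart.QuadPart); gTotal.QuadPart+=(gEnd.QuadPart-gStart.QuadPart);
#define PRT_END printf("\n");
//#define PRT_END printf(",total,%lld,%lld\n", gTotal.QuadPart, gTotal.QuadPart*1000000/gFreq.QuadPart);
LARGE_INTEGER gStart;
LARGE_INTEGER gEnd;
LARGE_INTEGER gTotal;
LARGE_INTEGER gFreq;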
void
t_Empty()
{
    PRT_TITLE("02_Empty");
    START_TIME
    END_TIME; PRT_RESULT
    PRT_END
}
void
t_Sleep1234()
{
    PRT_TITLE("01_Sleep1234");
    START_TIME
    Sleep(1234);
    END_TIME; PRT_RESULT
    PRT_END
}
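void*
t_Malloc_Free(size_t size)
{
    void* pVoid;
    // size_t does not match %d; cast to unsigned long long and print with %llu
    PRT_TITLE("Malloc_Free_%llu", (unsigned long long)size);
    for(int i=0; i<100; i++)
    {
        START_TIME
        pVoid = malloc(size);
        free(pVoid);
        END_TIME; PRT_RESULT
        if(pVoid == NULL)
        {
            PRT_FMT("ERROR size(%llu)", (unsigned long long)size);
        }
    }
    PRT_END
    // This pointer has already been freed; the caller must not dereference it
    return pVoid;
}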
int _tmain(int argc, _TCHAR* argv[])
{
    int i;
    QueryPerformanceFrequency(&gFreq);
    PRT_FMT("00_QueryPerformanceFrequency, %lld\n", gFreq.QuadPart);
    t_Empty();
    t_Sleep1234();
    for(i=0; i<10; i++)
    {
        t_Malloc_Free(1);
        t_Malloc_Free(100);
        t_Malloc_Free(1000); //1KB
        t_Malloc_Free(10000);
        t_Malloc_Free(100000);
        t_Malloc_Free(1000000); //1MB
        t_Malloc_Free(10000000); //10MB
        t_Malloc_Free(100000000); //100MB
    }
    return 0;
}
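t_Malloc_Free() prints all 100 raw QPC deltas for each size. When comparing two systems, it can help to reduce them to a minimum and a median. Both are less sensitive than a sum or mean to a single slow iteration, for example a first call that pays one-time setup costs. Below is a minimal sketch, assuming the deltas have first been collected into an array. The names summarize_samples and cmp_ll are hypothetical and not part of the original program.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for long long values (avoids the overflow of a - b). */
static int cmp_ll(const void *a, const void *b)
{
    long long x = *(const long long *)a;
    long long y = *(const long long *)b;
    return (x > y) - (x < y);
}

/* Sorts samples[0..n-1] in place and reports the minimum and median.
   For even n, the median is the lower of the two middle values. */
void summarize_samples(long long *samples, size_t n,
                       long long *min_out, long long *median_out)
{
    assert(samples != NULL && n > 0);
    qsort(samples, n, sizeof samples[0], cmp_ll);
    *min_out = samples[0];
    *median_out = samples[(n - 1) / 2];
}
```

Reporting the minimum alongside the median shows whether Windows 10 is slower on every call or only on some of them.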
Results in my environment (built with VS2010 on Windows 7), 100 MB case:
QPC count on Windows 7: 11.52 (4.03 usec)
QPC count on Windows 10: 973.28 (341 usec)
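The counts above are raw QueryPerformanceCounter deltas. Converting them to time means dividing by the value from QueryPerformanceFrequency, which the program prints as 00_QueryPerformanceFrequency. Below is a minimal sketch of an integer conversion that assumes non-negative deltas; ticks_to_usec is a hypothetical helper. It splits the delta into whole seconds and a remainder so that multiplying by 1000000 cannot overflow for large deltas. The direct gTotal.QuadPart*1000000/gFreq.QuadPart form in the commented-out PRT_END could overflow, but only for very long totals.

```c
#include <assert.h>

/* Converts a QueryPerformanceCounter delta to microseconds, given the
   QueryPerformanceFrequency value (ticks per second). */
long long ticks_to_usec(long long ticks, long long freq)
{
    assert(freq > 0 && ticks >= 0);
    long long whole = ticks / freq;   /* whole seconds */
    long long rem = ticks % freq;     /* leftover ticks, always < freq */
    return whole * 1000000LL + rem * 1000000LL / freq;
}
```

As a sanity check, the 01_Sleep1234 delta should convert to roughly 1234000 usec, give or take the system timer resolution. That result confirms the frequency is being applied correctly.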