The example below allocates some memory with calloc in a simple, but not entirely trivial, data structure. It then either sets some values explicitly to zero or leaves them alone, depending on whether EXPLICIT_RESET is 1 or 0. Finally it adds up the values. The sum always comes out to zero, as expected.
The interesting part is the timing. The code looks longer than it really is because of the #ifdefs needed to get timing with both Microsoft's CL compiler v17.00 and gcc 4.8.3 on Cygwin.
cl /O2 /DEXPLICIT_RESET=0 test.c /Fetest.exe
./test.exe
Typical timing values on my system are 20000-25000.
cl /O2 /DEXPLICIT_RESET=1 test.c /Fetest.exe
./test.exe
Typical timing values are 2500-3000.
gcc -O3 -DEXPLICIT_RESET=0 test.c -o test
./test.exe
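Typical timing values are 8000-10000. (These are not directly comparable with the CL numbers: the gcc build on Cygwin uses gettimeofday and reports microseconds, while the Win32 build reports raw QueryPerformanceCounter ticks, which I did not scale by the counter frequency.)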
gcc -O3 -DEXPLICIT_RESET=1 test.c -o test
./test.exe
Typical timing value is about 1000.
You can play around with simplifying the data structures further, but they are already close to the simplest form that still provokes the described behavior.
Can I get fast access without the explicit initialization?
In my real program, resetting the memory explicitly takes much too long, so I want to rely on calloc. But of course I also want access to that memory to be fast. So at the moment I'm stuck: either slow initialization on top of calloc, or slow access.
#include <stdio.h>
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#include <time.h>
LARGE_INTEGER tu0, tu1;
#else
#include <sys/time.h>
struct timeval tu0, tu1;
#endif
struct dataType
{
    int dummy1[4];
    int dummy2;
};

struct blockType
{
    int value;
    struct dataType list[16];
};

struct pageType
{
    struct blockType * list;
};

#define BLOCKS_PER_PAGE 100000
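int main(void)
{
    int timing, j, sum = 0;
    struct pageType * pagep;

    /* calloc returns zero-filled memory, so every value starts at 0. */
    pagep = (struct pageType *) calloc(1, sizeof(struct pageType));
    if (pagep == NULL)
        return EXIT_FAILURE;
    pagep->list = (struct blockType *)
        calloc(BLOCKS_PER_PAGE, sizeof(struct blockType));
    if (pagep->list == NULL) {
        free(pagep);
        return EXIT_FAILURE;
    }

#if EXPLICIT_RESET
    /* Redundant write of zeros to the value field of every block,
       done before the timer starts. */
    for (j = 0; j < BLOCKS_PER_PAGE; j++)
        pagep->list[j].value = 0;
#endif

    /* Start the timer. */
#ifdef _WIN32
    QueryPerformanceCounter(&tu0);
#else
    gettimeofday(&tu0, NULL);
#endif

    /* Timed loop: read the value field of every block. */
    for (j = 0; j < BLOCKS_PER_PAGE; j++)
        sum += pagep->list[j].value;

#ifdef _WIN32
    QueryPerformanceCounter(&tu1);
    /* Raw counter ticks, not scaled by QueryPerformanceFrequency. */
    timing = (int)(tu1.QuadPart - tu0.QuadPart);
#else
    gettimeofday(&tu1, NULL);
    /* Elapsed wall-clock time in microseconds. */
    timing = 1000000 * (tu1.tv_sec - tu0.tv_sec)
           + (tu1.tv_usec - tu0.tv_usec);
#endif

    printf("sum is %d, timing is %d\n", sum, timing);
    free(pagep->list);
    free(pagep);
    return 0;
}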