
The example below uses calloc to allocate a simple, but not entirely trivial, data structure. Depending on whether EXPLICIT_RESET is 1 or 0, it then either explicitly sets some of the values to zero or leaves them untouched. Finally it adds up the values. The sum always comes out as zero, as expected.

The interesting part is the timing. The code looks longer than it really is because of the #ifdefs needed to get timings with both Microsoft's CL compiler v17.00 and gcc 4.8.3 on Cygwin.

cl /O2 /DEXPLICIT_RESET=0 test.c /Fetest.exe
./test.exe

Typical timing values on my system are 20000-25000.

cl /O2 /DEXPLICIT_RESET=1 test.c /Fetest.exe
./test.exe

Typical timing values 2500-3000.

gcc -O3 -DEXPLICIT_RESET=0 test.c -o test
./test.exe

Typical timing values 8000-10000. (These are not comparable with the CL numbers: the Win32 branch reports raw QueryPerformanceCounter ticks, and I skipped the scaling factor that should be applied to them.)

gcc -O3 -DEXPLICIT_RESET=1 test.c -o test
./test.exe

Typical timing value 1000.
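For reference, the scaling factor skipped in the Win32 timings is the counter frequency reported by QueryPerformanceFrequency. The following is only a sketch of how the raw tick difference could be converted to microseconds so that the CL and gcc numbers share a unit. The helper names ticks_to_us and qpc_elapsed_us are hypothetical and not part of the test program.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Convert a tick difference to microseconds, given the counter
   frequency in ticks per second. (Hypothetical helper.) */
static long long ticks_to_us(long long ticks, long long ticks_per_second)
{
  /* Handle whole seconds and the remainder separately so that the
     multiplication by 1000000 cannot overflow for long intervals. */
  long long whole = ticks / ticks_per_second;
  long long rest  = ticks % ticks_per_second;
  return whole * 1000000LL + rest * 1000000LL / ticks_per_second;
}

#ifdef _WIN32
/* Win32 only: microseconds elapsed between two
   QueryPerformanceCounter readings. (Hypothetical helper.) */
static long long qpc_elapsed_us(LARGE_INTEGER t0, LARGE_INTEGER t1)
{
  LARGE_INTEGER freq;
  QueryPerformanceFrequency(&freq);   /* ticks per second */
  return ticks_to_us(t1.QuadPart - t0.QuadPart, freq.QuadPart);
}
#endif
```

With this in place, the Win32 branch of the program could assign qpc_elapsed_us(tu0, tu1) to timing instead of the raw difference. The relative gap between EXPLICIT_RESET=0 and EXPLICIT_RESET=1 is unaffected by the scaling.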

You can play around with simplifying the data structures further, but they are already close to the limit that still provokes the described behavior.

Can I get the fast access without the explicit initialization?

In my real program, it takes much too long to reset the memory explicitly, so I want to rely on calloc. But of course I also want access to that memory to be fast. So at the moment I'm stuck: either slow initialization on top of calloc, or slow access.

#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
  #include <windows.h>
  #include <time.h>
  LARGE_INTEGER tu0, tu1;
#else
  #include <sys/time.h>
  struct timeval tu0, tu1;
#endif

struct dataType
{
  int dummy1[4];
  int dummy2;
};

struct blockType
{
  int value;
  struct dataType list[16];
};

struct pageType
{
  struct blockType * list;
};


#define BLOCKS_PER_PAGE 100000

int main(int argc, char * argv[])
{
  int timing, j, sum = 0;

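  struct pageType * pagep = (struct pageType *)
    calloc(1, sizeof(struct pageType));
  if (pagep == NULL)
    exit(1);  /* allocation failed */

  pagep->list = (struct blockType *)
    calloc(BLOCKS_PER_PAGE, sizeof(struct blockType));
  if (pagep->list == NULL)
    exit(1);  /* allocation failed */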

#if EXPLICIT_RESET
  for (j = 0; j < BLOCKS_PER_PAGE; j++)
    pagep->list[j].value = 0;
#endif

#ifdef _WIN32
  QueryPerformanceCounter(&tu0);
#else
  gettimeofday(&tu0, NULL);
#endif

  for (j = 0; j < BLOCKS_PER_PAGE; j++)
    sum += pagep->list[j].value;

#ifdef _WIN32
  QueryPerformanceCounter(&tu1);
  timing = (int) (tu1.QuadPart - tu0.QuadPart);  /* raw ticks, not scaled */
#else
  gettimeofday(&tu1, NULL);
  timing = 1000000 * (tu1.tv_sec  - tu0.tv_sec )
         +           (tu1.tv_usec - tu0.tv_usec);
#endif

  printf("sum is %d, timing is %d\n", sum, timing);
  exit(0);
}
  • Your question can use some clarification. Are you asking why using `calloc()` alone results in slow access times, but using `calloc()` + manual zeroing results in slow setup, but fast access? – Mysticial Aug 10 '14 at 10:33
  • You should inspect the assembly, but it is for example possible that the compiler eliminates the second loop with EXPLICIT_RESET true, because it figured out the sum is 0. Also, are you sure you want to avoid this? The output is after all correct, and it's faster :] – stijn Aug 10 '14 at 10:34
  • You are making the standard mistake of ignoring that your code runs on a demand-page virtual memory operating system. Same answer as [this one](http://stackoverflow.com/a/24058364/17034). – Hans Passant Aug 10 '14 at 10:34
  • Related: http://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc – Mysticial Aug 10 '14 at 10:36
