I have simple C code that does this (pseudo code):
#define N 100000000   /* buffer size in bytes */
int *DataSrc = (int *) malloc(N);
int *DataDest = (int *) malloc(N);
memset(DataSrc, 0, N);
for (int i = 0; i < 4; i++) {
    StartTimer();
    memcpy(DataDest, DataSrc, N);
    StopTimer();
}
/* RandomInteger must be below N / sizeof(int) */
printf("%d\n", DataDest[RandomInteger]);
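For anyone who wants to reproduce this, a runnable sketch of the pseudo code might look roughly like the following. It assumes a POSIX system with clock_gettime(); now_seconds() and run_copies() are hypothetical names used for illustration, not the exact code that produced my numbers.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for StartTimer()/StopTimer(): monotonic time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/*
 * Copies n bytes `copies` times and stores the bandwidth of each copy (GB/s)
 * in rates[]. If use_memset is nonzero the source is zeroed first, as in the
 * pseudo code. Returns 0 on success, -1 if an allocation fails.
 */
static int run_copies(size_t n, int use_memset, int copies, double *rates)
{
    unsigned char *src = malloc(n);
    unsigned char *dst = malloc(n);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }
    if (use_memset)
        memset(src, 0, n);

    for (int i = 0; i < copies; i++) {
        double t0 = now_seconds();
        memcpy(dst, src, n);
        double t1 = now_seconds();
        rates[i] = (double)n / (t1 - t0) / 1e9;
    }

    /* Print one byte so the compiler cannot discard the copies. */
    printf("%d\n", dst[n / 2]);
    free(src);
    free(dst);
    return 0;
}
```

Calling run_copies(100000000, 1, 4, rates) from main() and printing the four rates corresponds to the loop above; passing 0 as the second argument skips the memset().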
My PC: Intel Core i7-3930, with 4x4GB DDR3 1600 memory running RedHat 6.1 64-bit.
The first memcpy() runs at 1.9 GB/sec, while the next three run at 6.2 GB/sec. The buffer size (N) is too big for this to be caused by cache effects. So, my first question:
- Why is the first memcpy() so much slower? Maybe malloc() doesn't fully allocate the memory until you use it?
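One way I could test that guess is to write to every page of the destination buffer before the timed loop, so that any on-demand allocation happens outside the measurement. The sketch below is only an illustration: touch_pages() is a hypothetical helper, and the 4096-byte page size is an assumption (on Linux the real value can be queried with sysconf(_SC_PAGESIZE)).

```c
#include <stddef.h>

/* Assumed page size; not queried from the system in this sketch. */
#define ASSUMED_PAGE_SIZE 4096

/*
 * Writes one byte into every page of buf so that each page has been
 * written to before any timed code runs.
 * Returns the number of pages touched.
 */
static size_t touch_pages(unsigned char *buf, size_t n)
{
    volatile unsigned char *p = buf;  /* volatile keeps the stores from being optimized away */
    size_t pages = 0;
    for (size_t off = 0; off < n; off += ASSUMED_PAGE_SIZE) {
        p[off] = 0;
        pages++;
    }
    return pages;
}
```

If the first memcpy() matches the other three after calling touch_pages((unsigned char *) DataDest, N), that would point to the allocation guess; if it stays slow, something else is going on.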
If I eliminate the memset(), then the first memcpy() runs at about 1.5 GB/sec, but the next three run at 11.8 GB/sec. That is almost a 2x speedup. My second question:
- Why is memcpy() 2x faster if I don't call memset()?