I tested memcpy behavior on my system after seeing the question "Why does the speed of memcpy() drop dramatically every 4KB?"
Details of my system:
arun@arun-OptiPlex-9010:~/mem_copy_test$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 58
Stepping: 9
CPU MHz: 1600.000
BogoMIPS: 6784.45
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
arun@arun-OptiPlex-9010:~/mem_copy_test$ cat /proc/cpuinfo | grep 'model name'| head -1
model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
arun@arun-OptiPlex-9010:~/mem_copy_test$ uname -a
Linux arun-OptiPlex-9010 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Test program:
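#include <stdio.h>
#include <sys/time.h>
#include <stdlib.h>
#include <string.h>

/* Copy buf_size bytes iters times and print the rate in MiB/s. */
void memcpy_speed(unsigned long buf_size, unsigned long iters)
{
    struct timeval start, end;
    unsigned char *pbuff_1;
    unsigned char *pbuff_2;
    unsigned long i;    /* same type as iters, avoids a signed/unsigned compare */

    /* The buffers are not initialized before timing. */
    pbuff_1 = malloc(buf_size);
    pbuff_2 = malloc(buf_size);
    if (pbuff_1 == NULL || pbuff_2 == NULL) {
        fprintf(stderr, "malloc failed\n");
        exit(EXIT_FAILURE);
    }

    gettimeofday(&start, NULL);
    for (i = 0; i < iters; ++i) {
        memcpy(pbuff_2, pbuff_1, buf_size);
    }
    gettimeofday(&end, NULL);

    /* Bytes per microsecond is MB/s; dividing by 1.024*1.024 gives MiB/s. */
    printf("%5.3f\n", ((buf_size * iters) / (1.024 * 1.024)) /
           ((end.tv_sec - start.tv_sec) * 1000 * 1000 +
            (end.tv_usec - start.tv_usec)));

    free(pbuff_1);
    free(pbuff_2);
}

int main(void)
{
    unsigned long buf_size;
    unsigned int i;

    /* Sweep buffer sizes from 1 KB to 16384 KB in 1 KB steps. */
    for (i = 1; i < 16385; i++) {
        printf("bufsize in kb=%u speed=", i);
        buf_size = i * 1024;
        memcpy_speed(buf_size, 10000);
        printf("\n");
    }
    return 0;
}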
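To rule out measurement artifacts, the timing could also be cross-checked with a variant like the sketch below. It is not the program that produced my graphs. The helper name `copy_rate_mib_s` is hypothetical, and the sketch assumes POSIX `clock_gettime` with `CLOCK_MONOTONIC` is available. It differs from the program above in three ways: both buffers are written before timing so their pages are already mapped, a monotonic clock is used instead of `gettimeofday`, and `memcpy` is called through a volatile function pointer so the compiler cannot drop or merge the calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime under strict -std modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy buf_size bytes iters times and return the
 * rate in MiB/s, or -1.0 if allocation fails or no time elapsed. */
static double copy_rate_mib_s(size_t buf_size, unsigned long iters)
{
    /* Calling through a volatile pointer forces every memcpy call to happen. */
    void *(*volatile copy_fn)(void *, const void *, size_t) = memcpy;
    struct timespec start, end;
    unsigned long i;
    double seconds;
    unsigned char *src;
    unsigned char *dst;

    assert(buf_size > 0);

    src = malloc(buf_size);
    dst = malloc(buf_size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Write to both buffers first so their pages are mapped before timing. */
    memset(src, 0xA5, buf_size);
    memset(dst, 0, buf_size);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iters; ++i) {
        copy_fn(dst, src, buf_size);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    free(src);
    free(dst);

    seconds = (double)(end.tv_sec - start.tv_sec)
            + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    if (seconds <= 0.0) {
        return -1.0;
    }

    /* Total bytes copied, converted to MiB, divided by elapsed seconds. */
    return ((double)buf_size * (double)iters) / (1024.0 * 1024.0) / seconds;
}
```

Calling `copy_rate_mib_s(i * 1024, 10000)` from the same loop as in `main` above would give numbers that can be compared directly with the original output.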
I am sharing the output graphs via Google Drive because Stack Overflow does not let me post images yet (10 reputation is required).
Output for 1 to 256 KB: https://drive.google.com/file/d/0B3mnbsS6F4tpY2dhRWJLaEY1RWc/view?usp=sharing
Output for 1 to 16384 KB: https://drive.google.com/file/d/0B3mnbsS6F4tpeC1Dd2R1VnJOV2c/view?usp=sharing
1) Why does the graph have a peak at 11-13 KB?
2) Why does the behavior differ between 20 to 129 KB (range1) and 130 to 256 KB (range2)? In range1 the maximum speed is not at multiples of 4 KB, while in range2 it is at multiples of 4 KB, and with large peaks. Range2 is also faster than range1 at multiples of 4 KB.
3) Why does the speed drop dramatically close to 3000 KB?
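For reference when reading the graphs: each memcpy call reads buf_size bytes and writes buf_size bytes, so it touches 2 * buf_size bytes in total. The sketch below classifies that working set against the cache sizes from the lscpu output above. The helper name `smallest_fitting_level` is hypothetical, and the model is deliberately crude: it ignores associativity, other data competing for the caches, and the fact that the L3 is shared between cores.

```c
/* Cache capacities in KB, taken from the lscpu output above. */
#define L1D_KB 32UL
#define L2_KB  256UL
#define L3_KB  8192UL

/* Hypothetical helper: name the smallest cache level whose capacity is at
 * least the size of both buffers together (source plus destination). */
static const char *smallest_fitting_level(unsigned long buf_kb)
{
    unsigned long working_set_kb = 2UL * buf_kb;

    if (working_set_kb <= L1D_KB) {
        return "L1d";
    }
    if (working_set_kb <= L2_KB) {
        return "L2";
    }
    if (working_set_kb <= L3_KB) {
        return "L3";
    }
    return "DRAM";
}
```

Under this simple model the thresholds fall at buffer sizes of 16 KB, 128 KB and 4096 KB. Whether they actually line up with the features in my graphs is part of what I am asking.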
--Arun