How can I benchmark memcpy? I wrote the test code below, but it finishes immediately (probably because of compiler optimization) and does not actually allocate memory:
    void test(void)
    {
        const uint32_t size = 4000'000'000;
        char a[size], b[size];
        printf("start\n");
        for(int i=0; i<10'000'000; i++)
            memcpy(b, a, size*sizeof(char));
        printf("end\n");
    } // end of function
I want to know the cost of memcpy in terms of CPU time and in terms of wall time.
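For reference, here is a minimal sketch of how both numbers could be measured. It assumes heap-allocated buffers that are written before timing starts (so the pages are really backed by memory) and a read of the destination through a volatile pointer (so that, in practice, the compiler does not drop the copies). The names `MemcpyTiming` and `bench_memcpy` are made up for this illustration. `std::clock` reports processor time on POSIX systems; some implementations (MSVC's, as far as I know) report elapsed time instead, so treat the CPU figure as platform dependent.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <ctime>
#include <vector>

// Hypothetical result type for this sketch.
struct MemcpyTiming {
    double wall_seconds;     // elapsed real time
    double cpu_seconds;      // processor time reported by std::clock
    unsigned long checksum;  // derived from the copied data
};

// Copies `bytes` bytes `repetitions` times and times the whole loop.
// Assumes bytes >= 1.
MemcpyTiming bench_memcpy(std::size_t bytes, int repetitions) {
    // Heap allocation; initializing both buffers touches every page.
    std::vector<char> src(bytes, '7');
    std::vector<char> dst(bytes, '0');

    unsigned long checksum = 0;
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    for (int i = 0; i < repetitions; ++i) {
        src[0] = static_cast<char>('0' + i % 10);  // change the input each round
        std::memcpy(dst.data(), src.data(), bytes);
        // Read the result through a volatile pointer so the copy is observed.
        const volatile char* sink = dst.data();
        checksum += static_cast<unsigned char>(sink[0]);
        checksum += static_cast<unsigned char>(sink[bytes - 1]);
    }

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    MemcpyTiming t;
    t.wall_seconds = std::chrono::duration<double>(wall_end - wall_start).count();
    t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    t.checksum = checksum;
    return t;
}
```

Calling something like `bench_memcpy(1u << 28, 20)` from `main` and dividing the copied bytes by `wall_seconds` gives a rough bandwidth figure. If `cpu_seconds` is close to `wall_seconds`, the copy is keeping one core busy rather than waiting.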
Here is the situation: I need to process data arriving over the network at a high rate. If I do not process it fast enough, the network buffers overflow and I am disconnected from the data source (which happens quite frequently in my test code). I can see that the CPU usage of my process is quite low (10-15%), so there must be some operation that costs wall time without costing CPU time. I therefore want to estimate how much the memcpy operations contribute to the wall time it takes to process one unit of data. The code is basically computation and memory copies: there is no resource that I need to wait for and that could slow me down.
Thank you for your help!
[EDIT:]
Thank you very much for your comments! And sorry for posting an example that is C++ only rather than C; my priority was readability. Here is a new example, which shows that memcpy is not free and consumes 100% of CPU time:
    const uint32_t N = 1000'000'000;
    char *a = new char[N],
         *b = new char[N];
    void test(void)
    {
        for(uint32_t i=0; i<N; i++)
            a[i] = '7';
        printf("start\n");
        for(int i=0; i<100; i++)
            memcpy(b, a, N*sizeof(char));
        printf("end\n");
    } // end of function
This leaves me confused about why my CPU usage is low even though I cannot process incoming data quickly enough.