0

I'm curious why the memcpy() function is faster than the simple manual copy.

Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main() 
{ 
    clock_t begin, end;
    double time_spent;
    int i, j;   
    char source[65536] = {0}, destination[65536]; /* initialize source so the copy reads defined values */

    begin = clock();

    for (j = 0; j<1000; j++) 
        for (i = 0; i < 65536; i++) destination[i] = source[i];
    //slower than memcpy(destination, source, 65536);

    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("%f\n", time_spent); /* %f matches double; %Lf is for long double */
    system("pause");
} 

Doesn't the implementation of memcpy() do the same thing? Thanks in advance.

Johnny Mnemonic

7 Answers

4

memcpy() can incorporate various other optimizations, for example SIMD. See this answer for more information.

Jorge Israel Peña
4

A good optimizing compiler should identify that your loop is, in fact, memmove() or memcpy() and replace it with a call to that function. That still leaves the question: why is it smart to do that?

It turns out that there is a great deal of room for hand-optimizing memory-copying code, and compilers are not yet smart enough to do all of it themselves. It is also very CPU-specific, so operating systems often ship specialized versions for each CPU family they support and select one at runtime.
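As a rough illustration of the runtime selection idea, here is a minimal sketch. All names in it (copy_fn, copy_bytewise, copy_tuned, cpu_has_fast_copy, select_copy) are hypothetical and not part of any real libc, and the "tuned" routine simply defers to memcpy() as a stand-in for hand-written assembly.

```c
#include <stddef.h>
#include <string.h>

/* Signature shared by every copy routine in this sketch. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Portable fallback: one byte per iteration. */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a hand-tuned, CPU-specific routine (hypothetical).
   Here it just defers to the library memcpy(). */
static void *copy_tuned(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Hypothetical feature probe. A real library would query the CPU;
   this sketch always answers "yes". */
static int cpu_has_fast_copy(void)
{
    return 1;
}

/* Pick an implementation once, then reuse it for every call. */
static copy_fn select_copy(void)
{
    return cpu_has_fast_copy() ? copy_tuned : copy_bytewise;
}
```

A caller would do something like copy_fn copy = select_copy(); once at startup and then call copy(dst, src, n). How a real C library performs this selection differs between platforms.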

Here's OSX's x86_64 SSE 4.2 copy implementation: http://www.opensource.apple.com/source/Libc/Libc-825.25/x86_64/string/bcopy_sse42.s

Catfish_Man
4

Doesn't the implementation of memcpy() do the same thing?

Not necessarily.

It's a standard library function, and as such:

  • it may be highly optimized, using platform-specific fast assembly instructions, or it may simply copy more than one byte per iteration, which is faster if the processor has large enough registers;
  • it may be recognized by the compiler as a builtin, so the compiler can optimize further, for example by inlining it to remove the function call overhead, or by deducing from context what you are trying to do and doing it another way.
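A small sketch of the builtin point: when the size is a compile-time constant, optimizing compilers such as GCC and Clang typically replace the memcpy() call with a plain register move instead of a function call. The helper name float_bits is hypothetical, and the code assumes float is 32 bits wide; the exact generated code depends on the compiler and flags.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bytes of a float as a 32-bit integer.
   Assumes sizeof(float) == sizeof(uint32_t), which holds on
   common platforms but is not guaranteed by the C standard. */
static uint32_t float_bits(float f)
{
    uint32_t bits;

    /* Fixed-size copy: usually compiled to a single move. */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Inspecting the assembly (for example with -O2 -S) is the way to check whether a given compiler actually removed the call.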
1

Because the for loop copies the items one by one, while memcpy() copies them block by block. You can read the source code of memcpy() here: https://www.student.cs.uwaterloo.ca/~cs350/common/os161-src-html/memcpy_8c-source.html or here: http://research.microsoft.com/en-us/um/redmond/projects/invisible/src/crt/memcpy.c.htm

Sheng
1

memcpy() will typically try to copy a whole word at a time, i.e. 4 bytes per iteration on 32-bit systems and 8 bytes per iteration on 64-bit systems.
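The word-at-a-time idea can be sketched as follows. This is not how any particular libc implements memcpy(); copy_wordwise is a hypothetical name, uintptr_t is assumed to be the machine word, and the small fixed-size memcpy() calls are only the portable way to express a possibly unaligned word load and store (real implementations usually align the pointers first and use CPU-specific instructions).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, one machine word per iteration.
   The buffers must not overlap. */
static void *copy_wordwise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Bulk of the copy: sizeof(uintptr_t) bytes per iteration. */
    while (n >= sizeof(uintptr_t)) {
        uintptr_t word;

        memcpy(&word, s, sizeof word); /* fixed size: usually one load  */
        memcpy(d, &word, sizeof word); /* fixed size: usually one store */
        s += sizeof word;
        d += sizeof word;
        n -= sizeof word;
    }

    /* Tail: the remaining bytes, one at a time. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

With 8-byte words the main loop runs roughly one eighth as many iterations as a byte loop over the same buffer.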

Marcellus
0

memcpy is not a vanilla loop. There are a number of optimizations in place.

Taking alignment and word size into account allows memcpy to copy memory in bigger chunks, at a steady pace.

salezica
0

You can just step into memcpy in a debugger to see that it's not a simple loop.

Paul