Why is memcpy() faster?

Question

I'm curious why the memcpy() function is faster than a simple manual copy loop.

Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    clock_t begin, end;
    double time_spent;
    int i, j;
    char source[65536], destination[65536];

    memset(source, 'x', sizeof source);   // give the source defined contents

    begin = clock();

    // Copy 64 KiB byte by byte, 1000 times
    for (j = 0; j < 1000; j++)
        for (i = 0; i < 65536; i++) destination[i] = source[i];
    //slower than memcpy(destination, source, 65536);

    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("%f\n", time_spent);   // %f matches double; %Lf expects long double
    system("pause");
    return 0;
}

Doesn't the implementation of memcpy() do the same thing? Thanks in advance.

What optimization flags did you compile with? – Jim Balter, Mar 31 '13 at 00:08

score 4 · Answer 1 · edited May 23 '17 at 11:53

memcpy() can incorporate various other optimizations, for example SIMD. See this answer for more information.
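
As a rough illustration of the SIMD idea, here is a minimal sketch that moves 16 bytes per iteration with SSE2 intrinsics where the compiler reports SSE2 support, and falls back to a byte loop elsewhere. The name copy_simd_sketch is hypothetical; this is not the code of any real memcpy(), which would also handle alignment, prefetching and size-specific paths.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper for illustration only, not a real library routine. */
static void copy_simd_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

#if defined(__SSE2__)
    /* Main loop: one unaligned 16-byte load and store per iteration. */
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        _mm_storeu_si128((__m128i *)(d + i), v);
    }
#endif

    /* Tail (or the whole copy without SSE2): one byte at a time. */
    for (; i < n; i++)
        d[i] = s[i];
}
```

Like memcpy(), this sketch assumes the two buffers do not overlap.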

answered Mar 30 '13 at 22:31

Jorge Israel Peña

score 4 · Answer 2 · answered Mar 30 '13 at 22:32

A good optimizing compiler should identify that your loop is, in fact, memmove() or memcpy() and replace it with a call to that function. That still leaves the question: why is it smart to do that?

It turns out that there's a great deal of room for hand-optimization of the compiled code for copying memory, and compilers aren't nearly smart enough to do it all yet. It's also very CPU-specific, so operating systems will have specialized versions for each family of CPUs they support, and swap them at runtime.

Here's OSX's x86_64 SSE 4.2 copy implementation: http://www.opensource.apple.com/source/Libc/Libc-825.25/x86_64/string/bcopy_sse42.s

score 4 · Answer 3 · answered Mar 30 '13 at 22:32

Doesn't the implementation of memcpy() do the same thing?

Not necessarily.

It's a standard library function, and as such:

it may be highly optimized, using platform-specific fast assembly instructions, or it may simply copy more than one byte per iteration, which is faster if the processor has large enough registers;
it may be recognized by the compiler as a builtin, so the compiler may perform even more optimization steps, for example inlining it to remove the function call overhead, or deducing from the context what you are trying to do and doing it with another method.
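
To make the builtin point concrete, here is a small sketch with two hypothetical helpers, load_u32 and store_u32. Because the size passed to memcpy() is a compile-time constant, an optimizing compiler can typically expand the call inline into a single load or store. Whether that actually happens depends on the compiler and flags, so treat it as an assumption to check in the generated assembly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a 32-bit value from a possibly unaligned
   byte buffer. The constant size lets the compiler treat memcpy() as a
   builtin and usually avoid a real function call. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: writes a 32-bit value to a possibly unaligned
   byte buffer in the machine's native byte order. */
static void store_u32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

A hand-written byte loop gives the compiler the same information only if it manages to recognize the loop as a copy, which is not guaranteed.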

score 1 · Answer 4 · answered Mar 30 '13 at 22:32

Because the for loop copies the items one by one, while memcpy() copies them block by block. You can read the source code of memcpy() here: https://www.student.cs.uwaterloo.ca/~cs350/common/os161-src-html/memcpy_8c-source.html or here: http://research.microsoft.com/en-us/um/redmond/projects/invisible/src/crt/memcpy.c.htm

score 1 · Answer 5 · answered Mar 30 '13 at 22:32

memcpy() will try to copy a whole word at a time, i.e. typically 4 bytes per iteration on 32-bit systems and 8 bytes per iteration on 64-bit systems.
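
A word-at-a-time copy could look roughly like the sketch below. The function name copy_words_sketch is hypothetical, and size_t is assumed to stand in for the machine word. It copies bytes until the pointers are word-aligned, then whole words, then any leftover bytes. It is not taken from any real libc.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only. The buffers must not overlap. */
static void *copy_words_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t w = sizeof(size_t);   /* assumed word size */

    /* Word copies are only attempted when both pointers are misaligned
       by the same amount, so that aligning one aligns the other. */
    if ((uintptr_t)d % w == (uintptr_t)s % w) {
        /* Head: copy single bytes until the pointers are word-aligned. */
        while (n > 0 && (uintptr_t)d % w != 0) {
            *d++ = *s++;
            n--;
        }
        /* Body: copy one word per iteration. Caveat: these casts assume
           the memory may legally be accessed as size_t; real
           implementations rely on assembly or compiler-specific
           guarantees rather than plain casts. */
        while (n >= w) {
            *(size_t *)d = *(const size_t *)s;
            d += w;
            s += w;
            n -= w;
        }
    }

    /* Tail, or the whole copy when the alignments differ: byte by byte. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

On a 64-bit system the body loop runs about one eighth as many iterations as a byte loop over the same data, which is where the speedup described above comes from.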

Marcellus

score 0 · Answer 6 · answered Mar 30 '13 at 22:32

memcpy is not a vanilla loop. There are a number of optimizations in place.

Things like alignment and word-size allow memcpy to copy memory in bigger chunks, at a steady pace.

salezica

score 0 · Answer 7 · answered Mar 30 '13 at 22:35

You can just step into memcpy to find out that it's not a simple loop.

Paul
