
Why does memcpy perform slower than memmove on my system?

From reading other SO questions such as this or this, I got the impression that memcpy should be faster than memmove, and intuitively that makes sense: memcpy performs fewer checks than memmove, and the man pages say the same.

However, when I measure the time spent inside each function, memmove beats memcpy! What's more, it seems to beat memset too, even though memset looks like it could benefit from optimizations that memcpy and memmove can't use. Why would this be?

Results (one of many) on my computer:

```
[INFO] (ex23.c:151 func: main) Normal copy: 109092
[INFO] (ex23.c:198 func: main) memcpy: 66070
[INFO] (ex23.c:209 func: main) memmove: 53149
[INFO] (ex23.c:219 func: main) memset: 52451
```

Code used to give this result:

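```c
#define _POSIX_C_SOURCE 200809L // expose clock_gettime under strict -std=c99
#include <stdio.h>
#include <string.h>
#include <time.h>
#include "dbg.h" // debugging macros (provides log_info)

// Naive byte-by-byte copy. Its definition was not shown in the
// question; this is an assumed equivalent.
static void normal_copy(const char *from, char *to, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        to[i] = from[i];
    }
}

// Nanoseconds between two timestamps. Subtracting tv_nsec alone gives
// a wrong (even negative) result whenever a second boundary is crossed.
static long elapsed_ns(struct timespec before, struct timespec after)
{
    return (after.tv_sec - before.tv_sec) * 1000000000L
         + (after.tv_nsec - before.tv_nsec);
}

int main(void)
{
    char from[10000];
    char to[10000];
    struct timespec before, after;

    memset(from, 'x', sizeof from);
    memset(to, 'y', sizeof to);

    // naive assignment using a for loop
    // CLOCK_MONOTONIC is not affected by wall-clock adjustments
    clock_gettime(CLOCK_MONOTONIC, &before);
    normal_copy(from, to, sizeof to);
    clock_gettime(CLOCK_MONOTONIC, &after);
    log_info("Normal copy: %ld", elapsed_ns(before, after));

    memset(to, 'y', sizeof to);
    clock_gettime(CLOCK_MONOTONIC, &before);
    memcpy(to, from, sizeof to);
    clock_gettime(CLOCK_MONOTONIC, &after);
    log_info("memcpy: %ld", elapsed_ns(before, after));

    memset(to, 'y', sizeof to);
    clock_gettime(CLOCK_MONOTONIC, &before);
    memmove(to, from, sizeof to);
    clock_gettime(CLOCK_MONOTONIC, &after);
    log_info("memmove: %ld", elapsed_ns(before, after));

    memset(to, 'y', sizeof to);
    clock_gettime(CLOCK_MONOTONIC, &before);
    memset(to, 'x', sizeof to);
    clock_gettime(CLOCK_MONOTONIC, &after);
    log_info("memset: %ld", elapsed_ns(before, after));

    return 0;
}
```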
Micaiah Chang
  • Did you try rearranging them? Or testing in separate programs? You could be seeing cache effects. 10000 bytes isn't very much data, either. – Carl Norum Aug 11 '13 at 21:29
  • It looks like you're only testing one run of each function. Try measuring the time for a million runs, and then calculating the average time for each. And, cache effects will definitely matter. – Greg Hewgill Aug 11 '13 at 21:30
  • In addition to repeating the test many times, I would also suggest copying more than 10,000 bytes of memory. Why not something like 100 million? That way you can be sure that cache is not affecting your results. Naturally, you'd want to allocate your memory on the heap, as the stack won't handle that amount of data. – paddy Aug 12 '13 at 03:51

1 Answer


As @Carl Norum and @Greg Hewgill say: cache effects.

You're certainly experiencing the effects of cached memory. Re-order your tests and compare the results. When I tested memcpy() both before and after memmove(), the second memcpy() performed like memmove() and was also faster than the first memcpy().

chux - Reinstate Monica
  • Ah, so just to be clear: When you test code, you want to run some high number of trials, so that the initial effects of storing things in cache are minimal. Switch ordering to see if it's a cache effect and iterate many times to minimize its influence. – Micaiah Chang Aug 11 '13 at 23:38
  • @Micaiah Chang, Agree about your suggestions. I tried trials ranging from 1 to 100,000,000, also various orders. Of course, various environments could have differing results. Bottom line, as long as no memory overlap, `memcpy()` and `memmove()` _should_ perform similarly. – chux - Reinstate Monica Aug 12 '13 at 00:24