memmove vs copying backwards

Question

I understand that memmove in C (cstring library) handles overlaps nicely "at the cost of slower runtime" (see this post). I was wondering why this additional runtime cost? It seems to me that any overlap problem could be fixed by copying backwards instead of forward, am I wrong?

As a toy example, here are two versions of a "right-shift" function, that shifts the contents of an array by one element on the right:

// Using memmove
template <typename T>
void shift_right( T *data, unsigned n )
{
    if (n)
    {
        data[n-1].~T();
        memmove( data+1, data, (n-1)*sizeof(T) );
        new (data) T();
    }
}

// Using copy_backward
template <typename Iterator>
void shift_right( Iterator first, Iterator last )
{
    Iterator it = last;
    std::copy_backward( first, --it, last );
}

Are they equivalent? Performance-wise, which one is best to use?

Note: judging by the comment of @DieterLücking, and despite the precautions taken, the above version using memmove is unsafe in this situation.

Why don't you run the code and see for yourself? memmove is likely much more optimized than any generic algorithm. — Asik, Mar 03 '14 at 21:43
It seems to me that the moment you use the word Iterator, the answer to which performs faster is "whichever is not the iterator". Snark aside, memcpy() can be optimized for the platform it's being run on, often giving performance that exceeds simple field-by-field copying. memmove() can be similarly optimized with a little extra runtime work due to the overlap points. — mah, Mar 03 '14 at 21:43
Look carefully at the order in which memory is copied in `shift_right`: You copy the objects backward, but each object internally is probably copied forward. So the bytes are copied in the order 8,9,6,7,4,5,2,3,0,1 (for example). This does not map to an optimized CPU instruction sequence. — Raymond Chen, Mar 03 '14 at 21:48
Also, note that in general, C++'s `std::copy` will cause copy constructors to run, which C doesn't know about. These constructors may be non-trivial. — Bill Lynch, Mar 03 '14 at 21:50
@RaymondChen Thanks a lot for our comment, it is really helpful :) — Jonathan H, Mar 03 '14 at 21:51
No - the first will mess up objects (not using copy/move/destruct), while the second will behave well. (For POD it should have the same performance) — , Mar 03 '14 at 21:53
@DieterLücking I am glad that you bring this up; I think there is no risk to move things around, unless objects refer to each other inside the array, am I wrong? — Jonathan H, Mar 03 '14 at 21:56
@sharth `struct X { char small[4]; char* buffer; X() : buffer(small) {} };` and memmove will be fatal. — , Mar 03 '14 at 22:08

Stephen Canon · Accepted Answer · 2014-03-03T21:50:07.840

7

Assuming a good implementation, the only "extra cost" of memmove is the initial check (an add and a compare-and-branch) to decide whether to copy front-to-back or back-to-front. This cost is so completely negligible (the add and compare will be hidden by ILP and the branch is perfectly predictable under normal circumstances) that on some platforms, memcpy is just an alias of memmove.

In anticipation of your next question ("if memcpy isn't significantly faster than memmove, why does it exist?"), there are a few good reasons to keep memcpy around. The best one, to my mind, is that some CPUs essentially implement memcpy as a single instruction (rep/movs on x86, for example). These HW implementations often have a preferred (fast) direction of operation (or they may only support copying in one direction). A compiler may freely replace memcpy with the fastest instruction sequence without worrying about these details; it cannot do the same for memmove.

edited Mar 03 '14 at 21:50

answered Mar 03 '14 at 21:44

Stephen Canon

103,815
19
183
269

You're correct. Copying back to front can be slower depending on your cache architecture though. – StilesCrisis Mar 03 '14 at 21:48
Thanks, this explains better what `memmove` does under the hood – Jonathan H Mar 03 '14 at 21:49
"Perfectly Predictable" I suppose you profiled it? Because it depends on the algorithm. – Zan Lynx Mar 03 '14 at 21:49
1

@ZanLynx: Extensively, yes, but one can also reason from first principles to reach the same conclusion. First, overlapping buffers are rare in real usage, so simply statically predicting "no overlap" gets very high accuracy. Second, when overlap does happen, it's almost always static in nature (someone doing something like `memmove(a, a+1, len)`); even the simplest branch predictors do quite well on these cases. – Stephen Canon Mar 03 '14 at 22:26
I would expect a GOOD compiler to identify situations where `memmove` can be replaced with `memcpy` (modern compilers have "library call replacement" steps during optimisation, which does things like translate suitable complex functions with simpler ones - the obvious one is `printf("Hello, World!\n");` becoming `puts("Hello, World!");`, but also `memmove` would be a potential target for this sort of code. Obviously, if the compiler either can't determine or determines that the copy has to go "backwards", then it will call (or inline the equivalent code) `memmove`. – Mats Petersson Mar 03 '14 at 22:33
@MatsPetersson: avoiding making a function call at all is often preferable to transforming one function call to another, but yes, that's another approach that a compiler could take. – Stephen Canon Mar 03 '14 at 22:37
@StephenCanon: In code I've worked on, the memmove branch was not predictable because the branch was in the library. Therefore it was different with every call because some calls moved forward to open a gap in a sorted array while others deleted by moving backward. If you inlined every call it might become predictable. But it wasn't. – Zan Lynx Mar 04 '14 at 00:44
@ZanLynx: Most "recent" branch predictors (on non-tiny processors) include some amount of global state, which allows them to predict such branches with high accuracy even when they are in a library. On very limited processors, this is not always the case, but I assume that people doing micro-optimization for such platforms are capable of working out the details. Developers writing desktop/ios/android/whatever apps can reasonably not worry about it. – Stephen Canon Mar 04 '14 at 00:50

score 2 · Answer 2 · answered Mar 03 '14 at 21:46

Memmove figures out for you whether to copy backward or forward; it is also highly optimized for this task (i.e. copying in SSE-optimized blocks as much as feasible).

It is unlikely that you can do any better by calling any generic STL algorithm (the best they could do is to call memcopy or memmove behind the scenes), but of course you could answer this question simply by running your code and timing it.

bolov · Answer 3 · 2014-03-03T22:20:16.703

2

from the post you actually linked (emphasis mine):

memcpy just loops, while memmove performs a test to determine which direction to loop in to avoid corrupting the data. These implementations are rather simple. Most high-performance implementations are more complicated (involving copying word-size blocks at a time rather than bytes).

edited Mar 03 '14 at 22:20

answered Mar 03 '14 at 21:47

bolov

72,283
15
145
224

score 2 · Answer 4 · 2014-03-03T22:58:37.043

2

The appropriate ways to copy or move are std::copy, std::copy_n, std::copy_backward and std::move. A proper C++ library will use memcpy or memmove if applicable. Hence there is no need to go for undefined results if the copied or moved sequence holds no trivial data.

Note: Here the std::move is the template 'OutputIterator move(InputIterator first, InputIterator last, OutputIterator result);' (For @Void)

edited Mar 03 '14 at 22:58

answered Mar 03 '14 at 22:22

Thanks a lot, I edited the OP to include what I think to be a "safe" version using `memmove`. Ultimately though, I think you are right; the performance gain, if any, is not worth the risk of screwing the contents of complicated objects at a low level; let's rely on the STL instead of playing around with dangerous tools to gain peanuts. – Jonathan H Mar 03 '14 at 22:31
1

It might be to good to clarify that you're referring to the [`std::move()`](http://en.cppreference.com/w/cpp/algorithm/move) STL algorithm, not the [`std::move()`](http://en.cppreference.com/w/cpp/utility/move) C++11 utility function. I did a [double take](http://en.wiktionary.org/wiki/double_take) when I first saw that. :) – Void - Othman Mar 03 '14 at 22:52

memmove vs copying backwards

4 Answers4