Efficiency of std::copy vs memcpy

Question

How severe is the efficiency loss between using memcpy and std::copy?

I have a situation where the vector implementation on my system doesn't appear to use contiguous memory, so I have to std::copy its contents later on rather than calling memcpy(dest, &vec[0], size). I'm not sure how badly this is likely to impact efficiency.

What implementation are you using? (C++03 guarantees contiguous storage). — R. Martinho Fernandes, Sep 02 '11 at 15:48
Doesn't the standard *require* vectors to use contiguous memory, so their addresses can be passed to functions that expect an array? — Frédéric Hamidi, Sep 02 '11 at 15:48
If your vector's data isn't contiguous, then your implementation isn't standard compliant. — Kerrek SB, Sep 02 '11 at 15:48
Your `std::vector` is not conforming if it doesn't use contiguous memory. Generally, on most implementations for types where `memcpy` is valid, `std::copy` performs about the same as `memmove`. — CB Bailey, Sep 02 '11 at 15:49
Measure the difference, in your application. There are too many variables for us (or you) to say that one is necessarily faster than the other. — Robᵩ, Sep 02 '11 at 15:50
@Kerrek: I don't know about C++98, but C++03 says this in paragraph 23.2.4: "The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size()." – Sven, Sep 02 '11 at 15:51
@Kerrek : As I recall, the two major changes were that and creating the distinction between default-initialization and value-initialization. — ildjarn, Sep 02 '11 at 15:53
@Kerrek - C++98 just didn't say anything about vector being contiguous or not. Most people assumed it would be that anyway. – Bo Persson, Sep 02 '11 at 15:54
What evidence do you have that your vector isn't using contiguous memory? — TheJuice, Sep 02 '11 at 16:02
Coming in late to my own discussion - but this is an RTOS unix variant that's supposed to be "unix-like" as they call it. Memcpying the vector from address 0 yielded correct results for the first element and gibberish for the rest, so I know it's not contiguous, though I could look into the implementation. That wasn't really my question though - I just wanted to know if my solution was efficient enough :) – John Humphreys, Sep 02 '11 at 17:21
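
One way to gather the evidence asked for in the comments is to test the identity quoted from C++03 directly. The following is only a sketch: `is_contiguous` is a hypothetical helper name, not something from the thread or the standard library, and it assumes an element type other than bool.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thread): returns true when every element
// of v satisfies the identity &v[n] == &v[0] + n quoted from C++03 23.2.4.
// Assumes T is not bool, matching the exclusion in the quoted paragraph.
template <typename T>
bool is_contiguous(const std::vector<T>& v) {
    for (std::size_t n = 0; n < v.size(); ++n) {
        if (&v[n] != &v[0] + n) {
            return false;
        }
    }
    return true;  // an empty vector has nothing to check
}
```

On a conforming implementation this should return true for any vector, so a false result would point at the library rather than at the copying code.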

score 14 · Accepted Answer · R. Martinho Fernandes · 2011-09-02T16:21:48.427

A reasonably decent implementation will have std::copy compile to a call to memmove in the situations where this is possible (i.e. the element type is a POD).

If your implementation doesn't have contiguous storage (the C++03 standard requires it), memmove might be faster than std::copy, but probably not by much. I would start worrying only when you have measurements to show it is indeed an issue.
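
As a rough illustration of the two approaches being compared, here is a minimal sketch for a POD element type. `Sample`, `copy_with_std_copy`, and `copy_with_memcpy` are hypothetical names chosen for this example, and the memcpy variant assumes contiguous storage as C++03 requires.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical POD element type, used only for illustration.
struct Sample {
    int id;
    double value;
};

// Copy with std::copy. For a POD type like Sample, an implementation is
// free to lower this to a single memmove call.
void copy_with_std_copy(const std::vector<Sample>& src, Sample* dest) {
    std::copy(src.begin(), src.end(), dest);
}

// Copy with memcpy. This assumes contiguous storage and non-overlapping
// buffers. Note that the size argument is a byte count, not an element count.
void copy_with_memcpy(const std::vector<Sample>& src, Sample* dest) {
    assert(dest != 0);
    if (!src.empty()) {
        std::memcpy(dest, &src[0], src.size() * sizeof(Sample));
    }
}
```

Both functions should leave `dest` with the same contents; the std::copy version additionally works for non-POD element types, where memcpy would not be valid.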

edited Sep 02 '11 at 16:21

answered Sep 02 '11 at 15:51

R. Martinho Fernandes

A reasonable implementation is more likely to use `memmove`. To be able to safely use `memcpy` requires more stringent conditions to be satisfied at compile time. – CB Bailey Sep 02 '11 at 15:54
@JerryCoffin: Not true. You can use `std::copy` to move a range backwards in memory and `std::copy_backward` to move a range forwards. (Slightly unintuitive, I grant.) – CB Bailey Sep 02 '11 at 15:59
§25.2.1/3: "Requires: result shall not be in the range [first, last)." – Jerry Coffin Sep 02 '11 at 16:00
`result`, not `result + k` for any `k` in range. If this restriction meant no overlap then `copy_backward` (similarly restricted) would have little value. – CB Bailey Sep 02 '11 at 16:02
@Charles: Good point. I stand corrected. – Jerry Coffin Sep 02 '11 at 16:07
Thanks all. Changed to `memmove`. – R. Martinho Fernandes Sep 02 '11 at 16:21
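
The overlap rule discussed in these comments can be sketched as follows. `shift_left` and `shift_right` are hypothetical helper names for this example; the early return for `k == 0` is there because in that case the destination would fall inside the excluded range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Move elements k positions toward the front. The ranges overlap, but the
// destination v.begin() is not inside [first, last), so std::copy is allowed.
void shift_left(std::vector<int>& v, std::size_t k) {
    assert(k <= v.size());
    if (k == 0) {
        return;  // result would equal first, which std::copy does not permit
    }
    std::copy(v.begin() + static_cast<std::ptrdiff_t>(k), v.end(), v.begin());
}

// Move elements k positions toward the back. Here the destination lies after
// the start of the source, so std::copy_backward is the matching algorithm.
void shift_right(std::vector<int>& v, std::size_t k) {
    assert(k <= v.size());
    if (k == 0) {
        return;  // result would equal last, which std::copy_backward does not permit
    }
    std::copy_backward(v.begin(), v.end() - static_cast<std::ptrdiff_t>(k), v.end());
}
```

For a vector holding 1, 2, 3, 4, 5, `shift_left(v, 2)` leaves 3, 4, 5, 4, 5 and `shift_right(v, 2)` leaves 1, 2, 1, 2, 3. Because overlap of this kind is permitted, an implementation generally cannot assume the stricter no-overlap precondition of memcpy.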

score 14 · Answer 2 · answered Sep 02 '11 at 16:07

While you've gotten a number of good answers, I feel obliged to add one more point: even if the code is theoretically less efficient, it's rarely likely to make any real difference.

The reason is pretty simple: the CPU is a lot faster than memory in any case. Even relatively crappy code will still easily saturate the bandwidth between the CPU and memory. Even if the data involved is in the cache, the same generally remains true -- and (again) even with crappy code, the move is going to be done far too quickly to care anyway.

Quite a few CPUs (e.g., Intel x86) have a special path in the hardware that will be used for most moves in any case, so there will often be literally no difference in speed between implementations that appear quite a bit different even at the assembly code level.

Ultimately, if you care about the speed of moving things around in memory, you should worry more about eliminating that than making it faster.
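
To check this claim on a particular system, both calls can be timed side by side. This is only a sketch under stated assumptions: `time_std_copy` and `time_memcpy` are hypothetical helpers, it needs C++11 for `<chrono>`, and an optimizer may elide repeated identical copies, so the numbers should be treated as a rough indication at best.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helpers, not from the thread. Each returns the elapsed
// wall-clock seconds for `repeats` copies of src into dest.
double time_std_copy(const std::vector<char>& src, std::vector<char>& dest, int repeats) {
    assert(dest.size() >= src.size());
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        std::copy(src.begin(), src.end(), dest.begin());
    }
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

double time_memcpy(const std::vector<char>& src, std::vector<char>& dest, int repeats) {
    assert(dest.size() >= src.size());
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        if (!src.empty()) {
            // The size argument is a byte count; for char that equals the element count.
            std::memcpy(dest.data(), src.data(), src.size());
        }
    }
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling both with buffer sizes typical of the real application, in an optimized build, gives a first impression of whether the difference is even measurable.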

Jerry Coffin

+1: *Exactly*. The only place where it will make a difference is if you aren't compiling optimized (in which case the call to `std::copy` is not optimized away, but also in which case why are you worried about performance?), if using `memcpy` would give the wrong results (in which case you should be using `memmove` anyhow), or if the little bit of gyrations that `memmove` does to determine if it is safe to use `memcpy` overwhelm the memory copy itself (in which case you are copying a tiny bit of memory). – David Hammen Sep 02 '11 at 16:57
*even if the code is theoretically less efficient, it's rarely likely to make any real difference* At least in 2001 this was not the case at all, and on many embedded architectures it's still not the case. See, for example [Mike Wall's "Using Block Prefetch for Optimized Memory Performance"](http://web.mit.edu/ehliu/Public/ProjectX/Meetings/AMD_block_prefetch_paper.pdf). The differences can be dramatic (say 3x faster than naive code). – Kuba hasn't forgotten Monica Dec 13 '13 at 20:50
@KubaOber: Do you have some good reason to believe that the standard library on those machines uses particularly naive code (or at least that its `memmove` uses substantially better code than its `std::copy`)? – Jerry Coffin Dec 13 '13 at 21:00
The way I understood your implication was that, essentially, any code would be "good enough". It probably will be good enough on modern "mainstream" Intel chips, where for nontrivial-sized blocks a simple K&R C implementation of `memcpy` saturates the memory, given a recent compiler. As soon as you're not on the latest and greatest, things get "interesting". – Kuba hasn't forgotten Monica Dec 13 '13 at 22:21

score 5 · Answer 3 · answered Sep 02 '11 at 15:48

std::copy will use memcpy when it is appropriate, so you should just use std::copy and let it do the work for you.

Tony The Lion

`std::copy` is more likely to call `memmove` than `memcpy` because the ranges for `std::copy` are allowed to overlap and if they don't, this is not always statically verifiable. – CB Bailey Sep 02 '11 at 15:53
Checking: `printf '#include <algorithm>\nvoid DoCopy( char* o, const char* i, std::size_t count ) { std::copy( i, i + count, o ); }' | gcc -S -O3 -o - -std=c++98 -x c++ -` gives `jmp memmove` on my system. – CB Bailey Sep 02 '11 at 16:00
@Charles is right, the STL cannot decide to use `memcpy` at compile time as there is no guarantee that the ranges will not overlap. – David Rodríguez - dribeas Sep 02 '11 at 16:29
@Charles: same on VC2010. And I also checked, if you use two vectors and do `std::copy(src.begin(), src.end(), dest.begin())` it also ends up calling `memmove`. In particular it inlines to `call __imp__memmove`. – Sven Sep 02 '11 at 16:39
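
The check described in these comments can be repeated by saving the function in a small file and inspecting the generated assembly. The file name `docopy.cpp` below is only an assumption for illustration, and the exact output will vary by compiler, version, and flags.

```cpp
#include <algorithm>
#include <cstddef>

// Same function as in the comment above: copy `count` chars from i to o.
// The two ranges may overlap as far as the compiler can tell, so it
// cannot assume the no-overlap precondition of memcpy here.
void DoCopy(char* o, const char* i, std::size_t count) {
    std::copy(i, i + count, o);
}
```

Compiling with something like `g++ -S -O3 -o - docopy.cpp` and searching the output for `memmove` or `memcpy` shows which routine, if any, a given toolchain chose.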
