220

Is it better to use memcpy as shown below, or is it better to use std::copy() in terms of performance? Why?

char *bits = NULL;
...

bits = new (std::nothrow) char[((int *) copyMe->bits)[0]];
if (bits == NULL)
{
    cout << "ERROR Not enough memory.\n";
    exit(1);
}

memcpy (bits, copyMe->bits, ((int *) copyMe->bits)[0]);
Whymarrh
user576670
  • Note that `char` can be signed or unsigned, depending on the implementation. If the number of bytes can be >= 128, then use `unsigned char` for your byte arrays. (The `(int *)` cast would be safer as `(unsigned int *)`, too.) – Dan Breslau Jan 16 '11 at 18:03
    Why aren't you using `std::vector`? Or since you say `bits`, `std::bitset`? – GManNickG Jan 16 '11 at 19:20
  • I believe that the use of `std::nothrow` is incorrect here, is that correct? I thought `nothrow` was for overloading of the operator `new` only? – FreelanceConsultant Aug 25 '15 at 21:34
    Actually, could you please explain to me what `(int*) copyMe->bits[0]` does? – FreelanceConsultant Aug 25 '15 at 21:38
    not sure why something that seems like such a mess with so little vital context provided was at +81, but hey. @user3728501 my guess is that the start of the buffer holds an `int` dictating its size, but that seems like a recipe for implementation-defined disaster, like so many other things here. – underscore_d Apr 12 '16 at 19:22
    In fact, that `(int *)` cast is just pure undefined behaviour, not implementation-defined. Trying to do type-punning via a cast violates strict aliasing rules and hence is totally undefined by the Standard. (Also, in C++ although not C, you can't type-pun via a `union` either.) Pretty much the only exception is if you're converting **to** a variant of `char*`, but the allowance is not symmetrical. – underscore_d Mar 19 '17 at 19:07
  • [SSE-copy, AVX-copy and std::copy performance](https://stackoverflow.com/q/18314523/995714) – phuclv Dec 17 '21 at 09:39

8 Answers

261

I'm going to go against the general wisdom here that std::copy will have a slight, almost imperceptible performance loss. I just did a test and found that to be untrue: I did notice a performance difference. However, the winner was std::copy.

I wrote a C++ SHA-2 implementation. In my test, I hash 5 strings using all four SHA-2 versions (224, 256, 384, 512), and I loop 300 times. I measure times using Boost.Timer. That 300 loop counter is enough to completely stabilize my results. I ran the test 5 times each, alternating between the memcpy version and the std::copy version. My code takes advantage of grabbing data in as large chunks as possible (many other implementations operate with char / char *, whereas I operate with T / T *, where T is the largest type in the user's implementation that has correct overflow behavior), so fast memory access on the largest types I can use is central to the performance of my algorithm. These are my results:

Time (in seconds) to complete run of SHA-2 tests

std::copy   memcpy  % increase
6.11        6.29    2.86%
6.09        6.28    3.03%
6.10        6.29    3.02%
6.08        6.27    3.03%
6.08        6.27    3.03%

Total average increase in speed of std::copy over memcpy: 2.99%

My compiler is gcc 4.6.3 on Fedora 16 x86_64. My optimization flags are -Ofast -march=native -funsafe-loop-optimizations.

Code for my SHA-2 implementations.

I decided to run a test on my MD5 implementation as well. The results were much less stable, so I decided to do 10 runs. However, after my first few attempts, I got results that varied wildly from one run to the next, so I'm guessing there was some sort of OS activity going on. I decided to start over.

Same compiler settings and flags. There is only one version of MD5, and it's faster than SHA-2, so I did 3000 loops on a similar set of 5 test strings.

These are my final 10 results:

Time (in seconds) to complete run of MD5 tests

std::copy   memcpy      % difference
5.52        5.56        +0.72%
5.56        5.55        -0.18%
5.57        5.53        -0.72%
5.57        5.52        -0.91%
5.56        5.57        +0.18%
5.56        5.57        +0.18%
5.56        5.53        -0.54%
5.53        5.57        +0.72%
5.59        5.57        -0.36%
5.57        5.56        -0.18%

Total average decrease in speed of std::copy over memcpy: 0.11%

Code for my MD5 implementation

These results suggest that there is some optimization that std::copy used in my SHA-2 tests that std::copy could not use in my MD5 tests. In the SHA-2 tests, both arrays were created in the same function that called std::copy / memcpy. In my MD5 tests, one of the arrays was passed in to the function as a function parameter.

I did a little bit more testing to see what I could do to make std::copy faster again. The answer turned out to be simple: turn on link time optimization. These are my results with LTO turned on (option -flto in gcc):

Time (in seconds) to complete run of MD5 tests with -flto

std::copy   memcpy      % difference
5.54        5.57        +0.54%
5.50        5.53        +0.54%
5.54        5.58        +0.72%
5.50        5.57        +1.26%
5.54        5.58        +0.72%
5.54        5.57        +0.54%
5.54        5.56        +0.36%
5.54        5.58        +0.72%
5.51        5.58        +1.25%
5.54        5.57        +0.54%

Total average increase in speed of std::copy over memcpy: 0.72%

In summary, there does not appear to be a performance penalty for using std::copy. In fact, there appears to be a performance gain.

Explanation of results

So why might std::copy give a performance boost?

First, I would not expect it to be slower for any implementation, as long as the optimization of inlining is turned on. All compilers inline aggressively; it is possibly the most important optimization because it enables so many other optimizations. std::copy can (and I suspect all real world implementations do) detect that the arguments are trivially copyable and that memory is laid out sequentially. This means that in the worst case, when memcpy is legal, std::copy should perform no worse. The trivial implementation of std::copy that defers to memcpy should meet your compiler's criteria of "always inline this when optimizing for speed or size".

However, std::copy also keeps more of its information. When you call std::copy, the function keeps the types intact. memcpy operates on void *, which discards almost all useful information. For instance, if I pass in an array of std::uint64_t, the compiler or library implementer may be able to take advantage of 64-bit alignment with std::copy, but it may be more difficult to do so with memcpy. Many implementations of algorithms like this work by first working on the unaligned portion at the start of the range, then the aligned portion, then the unaligned portion at the end. If it is all guaranteed to be aligned, then the code becomes simpler and faster, and easier for the branch predictor in your processor to get correct.

Premature optimization?

std::copy is in an interesting position. I expect it to never be slower than memcpy and sometimes faster with any modern optimizing compiler. Moreover, anything that you can memcpy, you can std::copy. memcpy does not allow any overlap in the buffers, whereas std::copy supports overlap in one direction (with std::copy_backward for the other direction of overlap). memcpy only works on pointers, std::copy works on any iterators (std::map, std::vector, std::deque, or my own custom type). In other words, you should just use std::copy when you need to copy chunks of data around.

Dev Null
David Stone
    I want to emphasize that this doesn't mean that `std::copy` is 2.99% or 0.72% or -0.11% faster than `memcpy`, these times are for the entire program to execute. However, I generally feel that benchmarks in real code are more useful than benchmarks in fake code. My entire program got that change in execution speed. The real effects of just the two copying schemes will have greater differences than shown here when taken in isolation, but this shows that they can have measurable differences in actual code. – David Stone Apr 03 '12 at 17:31
    I want to disagree with your findings, but results are results :/. However one question (I know it was a long time ago and you don't remember research, so just comment the way you think), you probably didn't look into assembly code; – ST3 Jan 06 '15 at 09:17
In my opinion, `memcpy` and `std::copy` have different implementations, so in some cases the compiler optimizes the surrounding code and the actual memory-copying code as one integral piece of code. In other words, _sometimes_ one is better than the other, and deciding which to use is premature or even misguided optimization, because in every situation you have to do new research; what is more, programs are usually still being developed, so after some minor changes the advantage of one function over the other may be lost. – ST3 Jan 06 '15 at 09:18
    @ST3: I would imagine that in the worst case, `std::copy` is a trivial inline function that just calls `memcpy` when it is legal. Basic inlining would eliminate any negative performance difference. I will update the post with a bit of an explanation of why std::copy might be faster. – David Stone Jan 08 '15 at 02:21
    This runs contrary to what I have seen with GCC under templates. The code was slowed down by 5% using std::copy over memset, however this may be to do with the additional code generated by std::copy and how that is optimised in template use, rather than by the speed of the code itself. – metamorphosis Oct 20 '15 at 00:22
    Very informative analysis. Re _Total average decrease in speed of std::copy over memcpy: 0.11%_, whilst the number is correct, the results aren't statistically significant. A 95% confidence interval for the difference in means is (-0.013s, 0.025), which includes zero. As you pointed out there was variation from other sources and with your data, you'd probably say the performance is the same. For reference, the other two results are statistically significant -- the chances you'd see a difference in times this extreme by chance are about 1 in 100 million (first) and 1 in 20,000 (last). – TooTone Mar 11 '16 at 13:24
  • Is there any penalty by using `std:copy(std::begin(a), std::end(a), std::begin(b))` ? – xvan Mar 15 '16 at 06:53
Sorry about that, but I don't understand how std::copy can beat memcpy. Doesn't memcpy just copy words of data (according to HW support) without knowing what the data represents? How can this be beaten? – Hanna Khalil Dec 03 '16 at 00:44
Your architecture may have an instruction to copy 64, 128, or 256 bit objects at a time. With `memcpy`, to take advantage of this, there has to be a prologue and an epilogue that copy any extra trailing data byte by byte. With `std::copy`, because the types are maintained, there are certain alignment guarantees that are not lost. In theory, the optimizer could see this as part of its inlining as well, at which point there would be no difference. – David Stone Dec 03 '16 at 20:37
Your answer largely depends on external code which could change, and you didn't explain where to look in your code, so everyone interested needs to look through all of it. – Yola Jan 14 '18 at 08:00
TL;DR the main difference is that std::copy is fully defined in the stdlib and can be inlined with all further optimizations applied, while std::memcpy relies on libc, and binaries usually link with it dynamically (for instance, due to its architecture glibc heavily uses dlopen, so it loads itself dynamically anyway). Anytime you call memcpy you will get (pun intended) `call memcpy` in the asm code. So yes, in heavily loaded loops operating on small enough chunks of data, std::copy is quite likely to be the winner. – kravitz Oct 15 '20 at 03:45
  • The repo links are dead now so we can't see how you called `std::copy`, like for what size or types, or whether it was likely to optimize away. (Note that compilers like GCC define `memcpy` as `__builtin_memcpy`, and will inline it using XMM or YMM instructions for small fixed-size copies.) – Peter Cordes May 02 '22 at 10:22
  • And BTW, glibc memcpy does *not* have to copy a byte at a time on ISAs with efficient unaligned load/store. Its actual strategy on x86 for copies from 15 to 31 bytes, for example, is to do two `movups xmm` loads that line up with the start and end of the region, then two stores. So it works as memmove without any special overlap checking, too. For size=16 they're fully overlapping, instead of doing it as two 8-byte halves which could lead to store-forwarding stalls later depending how it's reloaded. And for size=31 they overlap by 1 byte. (Size=32 uses YMM). – Peter Cordes May 02 '22 at 10:26
  • See https://code.woboq.org/userspace/glibc/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S.html#19 for source in AT&T assembly syntax, with comments describing the strategy. Of course that's only relevant when GCC does *not* inline the `memcpy` itself. e.g. runtime-variable copy size, or large. (Large copies benefit from letting runtime dispatching via dynamic linking maybe use wider vectors that weren't enabled at compile time, and allow unrolled loops too large to inline, and picking a strategy...) – Peter Cordes May 02 '22 at 10:26
    `std::copy` might be fast, but is not guaranteed to be. No compiler specification explicitly documents that `std::copy` is always optimized when possible, even on the highest optimization level. – Tuff Contender Oct 02 '22 at 16:47
86

All compilers I know will replace a simple std::copy with a memcpy when it is appropriate, or even better, vectorize the copy so that it would be even faster than a memcpy.

In any case: profile and find out yourself. Different compilers will do different things, and it's quite possible it won't do exactly what you ask.

See this presentation on compiler optimisations (pdf).

Here's what GCC does for a simple std::copy of a POD type.

#include <algorithm>

struct foo
{
  int x, y;    
};

void bar(foo* a, foo* b, size_t n)
{
  std::copy(a, a + n, b);
}

Here's the disassembly (with only -O optimisation), showing the call to memmove:

bar(foo*, foo*, unsigned long):
    salq    $3, %rdx
    sarq    $3, %rdx
    testq   %rdx, %rdx
    je  .L5
    subq    $8, %rsp
    movq    %rsi, %rax
    salq    $3, %rdx
    movq    %rdi, %rsi
    movq    %rax, %rdi
    call    memmove
    addq    $8, %rsp
.L5:
    rep
    ret

If you change the function signature to

void bar(foo* __restrict a, foo* __restrict b, size_t n)

then the memmove becomes a memcpy for a slight performance improvement. Note that memcpy itself will be heavily vectorised.

Peter Alexander
How can I do profiling? What tool should I use (on Windows and Linux)? – user576670 Jan 16 '11 at 18:00
  • You don't even need to use a tool for this. Just get the time before the copy, the time after, and subtract to get the time it took :-) – Peter Alexander Jan 16 '11 at 18:01
  • IIRC `copy` actually dispatches to `memmove`, not `memcpy` (in the cases where it does that at all) but I think it’s safe to surmise that it does that because it’s even faster. – Konrad Rudolph Jan 16 '11 at 18:02
  • @user576670: A profiler! (I couldn't resist.) A profiler depends on implementation details of the compiler (in various ways), and is generally written to work with a specific compiler. Gprof works with gcc. – Fred Nurk Jan 16 '11 at 18:03
    @Konrad, you're correct. But `memmove` shouldn't be faster - rather, it should be slighter slower because it has to take into account the possibility that the two data ranges overlap. I think `std::copy` permits overlapping data, and so it has to call `memmove`. – Charles Salvia Jan 16 '11 at 18:04
    @Konrad: If memmove was always faster than memcpy, then memcpy would call memmove. What std::copy actually might dispatch to (if anything) is implementation-defined, so it's not useful to mention specifics without mentioning implementation. – Fred Nurk Jan 16 '11 at 18:04
  • @Fred: yes, of course. FYI, the implementation I had in mind is GCC (current versions but I don’t know if there are specific versions for which this doesn’t apply). – Konrad Rudolph Jan 16 '11 at 18:07
    Although, a simple program to reproduce this behavior, compiled with -O3 under GCC shows me a `memcpy`. It leads me to believe GCC checks whether there's memory overlap. – jweyrich Jan 16 '11 at 18:31
  • And FWIW, a quick test with overlapping reveals it optimises to `memmove`. I'll never use `memcpy` and `memmove` again :) – jweyrich Jan 16 '11 at 18:55
    @Konrad: standard `std::copy` allows overlap in one direction but not the other. The beginning of the output can't lie within the input range, but the beginning of the input is allowed to lie within the output range. This is a little odd, because the order of assignments is defined, and a call might be UB even though the effect of those assignments, in that order, is defined. But I suppose the restriction allows vectorization optimizations. – Steve Jessop Jan 16 '11 at 20:58
  • @PeterAlexander: Great presentation. Thanks for sharing! – Atmocreations Oct 14 '11 at 16:14
  • @StackedCrooked: See my edit. It seems that GCC just calls `memmove`/`memcpy`. Obviously, those will be vectorised for the general case. – Peter Alexander Oct 03 '12 at 21:17
  • If you're looking to make sure your own containers take advantage of this make sure you don't define a iterator class. In libc++ `vector::iterator` is defined as `T*` and that enables the `memmove` optimization (when the iterators are pointers to a trivially copy assignable type) – Christopher Tarquini Feb 22 '14 at 00:17
  • gcc and clang don't seem to like to inline a `rep movsq` if the copy size isn't known at compile time. And even then, they don't like to inline it for `std::copy`, only for `memcpy`. Even then, gcc5.3 will only `memcpy`/`memmove` [with `-mtune=generic`](http://goo.gl/4Jq56X), not `-mtune=haswell`. And it makes insane code with `-mtune=intel` (conditional branches on different bits of addresses, I guess checking alignment? Try it on that godbolt link by editing the command line.) `std::copy` checks for non-zero before tail-calling `memmove`, but the compiler see through it better, IDK. – Peter Cordes Mar 13 '16 at 14:38
29

Always use std::copy because memcpy is limited to only C-style POD structures, and the compiler will likely replace calls to std::copy with memcpy if the targets are in fact POD.

Plus, std::copy can be used with many iterator types, not just pointers. std::copy is more flexible for no performance loss and is the clear winner.

masoud
Puppy
  • Why should you wanna copy around iterators? – Atmocreations Oct 14 '11 at 15:41
    You're not copying the iterators, but rather the range defined by two iterators. For instance, `std::copy(container.begin(), container.end(), destination);` will copy the contents of `container` (everything between `begin` and `end`) into the buffer indicated by `destination`. `std::copy` doesn't require shenanigans like `&*container.begin()` or `&container.back() + 1`. – David Stone Apr 26 '12 at 17:13
17

In theory, memcpy might have a slight, imperceptible, infinitesimal performance advantage, only because it doesn't have the same requirements as std::copy. From the man page of memcpy:

To avoid overflows, the size of the arrays pointed by both the destination and source parameters, shall be at least num bytes, and should not overlap (for overlapping memory blocks, memmove is a safer approach).

In other words, memcpy can ignore the possibility of overlapping data. (Passing overlapping arrays to memcpy is undefined behavior.) So memcpy doesn't need to explicitly check for this condition, whereas std::copy can be used as long as the OutputIterator parameter is not in the source range. Note this is not the same as saying that the source range and destination range can't overlap.

So since std::copy has somewhat different requirements, in theory it should be slightly (with an extreme emphasis on slightly) slower, since it probably will check for overlapping C-arrays, or else delegate the copying of C-arrays to memmove, which needs to perform the check. But in practice, you (and most profilers) probably won't even detect any difference.

Of course, if you're not working with PODs, you can't use memcpy anyway.

Charles Salvia
    This is true for `std::copy`. But `std::copy` can assume that its inputs are int-aligned. That will make a far bigger difference, because it affects every element. Overlap is a one-time check. – MSalters Jan 17 '11 at 08:39
    @MSalters, true, but most implementations of `memcpy` I've seen check for alignment and attempt to copy words rather than byte by byte. – Charles Salvia Apr 28 '12 at 08:13
    std::copy() can ignore overlapping memory, too. If you want to support overlapping memory, you have to write the logic yourself to call std::reverse_copy() in the appropriate situations. – Cygon Jun 06 '12 at 11:23
    There is an opposite argument that can be made: when going through `memcpy` interface it loses the alignment information. Hence, `memcpy` has to do alignment checks at run-time to handle unaligned beginnings and ends. Those checks may be cheap but they are not free. Whereas `std::copy` can avoid these checks and vectorize. Also, the compiler may prove that source and destination arrays do not overlap and again vectorize without the user having to choose between `memcpy` and `memmove`. – Maxim Egorushkin Jan 12 '16 at 15:42
12

My rule is simple. If you are using C++, prefer C++ libraries and not C ones :)

UmmaGumma
    C++ was explicitly designed to allow using C libraries. This was not an accident. It is often better to use std::copy than memcpy in C++, but this has nothing to do with which one is C, and that kind of argument is usually the wrong approach. – Fred Nurk Jan 16 '11 at 18:06
    @FredNurk Usually you want to avoid weak area of C where C++ provide a safer alternative. – Phil1970 Apr 18 '17 at 23:13
@Phil1970 I'm not sure that C++ is much safer in this case. We still have to pass valid iterators that don't overrun, etc. I *guess* being able to use `std::end(c_arr)` instead of `c_arr + i_hope_this_is_the_right_number_of_elements` is safer? and perhaps more importantly, clearer. And that'd be the point I emphasise in this specific case: `std::copy()` is more idiomatic, more maintainable if the types of the iterators change later, leads to clearer syntax, etc. – underscore_d Jan 02 '18 at 18:08
    @underscore_d `std::copy` is safer because it correctly copies the passed data in case they are not POD-types. `memcpy` will happily copy a `std::string` object to a new representation byte by byte. – Jens Jan 17 '19 at 09:05
4

Just a minor addition: The speed difference between memcpy() and std::copy() can vary quite a bit depending on if optimizations are enabled or disabled. With g++ 6.2.0 and without optimizations memcpy() clearly wins:

Benchmark             Time           CPU Iterations
---------------------------------------------------
bm_memcpy            17 ns         17 ns   40867738
bm_stdcopy           62 ns         62 ns   11176219
bm_stdcopy_n         72 ns         72 ns    9481749

When optimizations are enabled (-O3), everything looks pretty much the same again:

Benchmark             Time           CPU Iterations
---------------------------------------------------
bm_memcpy             3 ns          3 ns  274527617
bm_stdcopy            3 ns          3 ns  272663990
bm_stdcopy_n          3 ns          3 ns  274732792

The bigger the array the less noticeable the effect gets, but even at N=1000 memcpy() is about twice as fast when optimizations aren't enabled.

Source code (requires Google Benchmark):

#include <string.h>
#include <algorithm>
#include <vector>
#include <benchmark/benchmark.h>

constexpr int N = 10;

void bm_memcpy(benchmark::State& state)
{
  std::vector<int> a(N);
  std::vector<int> r(N);

  while (state.KeepRunning())
  {
    memcpy(r.data(), a.data(), N * sizeof(int));
  }
}

void bm_stdcopy(benchmark::State& state)
{
  std::vector<int> a(N);
  std::vector<int> r(N);

  while (state.KeepRunning())
  {
    std::copy(a.begin(), a.end(), r.begin());
  }
}

void bm_stdcopy_n(benchmark::State& state)
{
  std::vector<int> a(N);
  std::vector<int> r(N);

  while (state.KeepRunning())
  {
    std::copy_n(a.begin(), N, r.begin());
  }
}

BENCHMARK(bm_memcpy);
BENCHMARK(bm_stdcopy);
BENCHMARK(bm_stdcopy_n);

BENCHMARK_MAIN()

/* EOF */
Grumbel
    Measuring performance with optimizations disabled is... well... pretty much pointless... If you are interested in performance you won't compile without optimizations. – bolov Oct 18 '16 at 13:32
    @bolov Not always. A relatively fast program under debug is in some cases important to have. – Acorn Jul 11 '19 at 15:25
  • @bolov I used to think the same, but actually games running in debug mode can be heavily impacted by this. Well, maybe there are other solutions like inlining in debug mode... but that is a use case already. – Germán Diago Mar 31 '21 at 12:08
2

If you really need maximum copying performance (which you might not), use neither of them.

There's a lot that can be done to optimize memory copying - even more if you're willing to use multiple threads/cores for it. See, for example:

What's missing/sub-optimal in this memcpy implementation?

both the question and some of the answers have suggested implementations or links to implementations.

einpoklum
    pedant mode: with the usual caveat that "**use neither of them**" means _if you have proven that you have a highly specific situation/requirement for which neither Standard function provided by your implementation is fast enough_; otherwise, my usual concern is that people who haven't proven that get sidetracked on prematurely optimising copying code instead of the usually more useful parts of their program. – underscore_d Jan 02 '18 at 18:22
-3

Profiling shows that the statement "std::copy() is always as fast as memcpy(), or faster" is false.

My system:

HP-Compaq-dx7500-Microtower 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux.

gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2

The code (language: C++):

    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <ctime>
    #include <algorithm>
    #include <vector>

    // DPROFILE is the author's scoped-profiling macro (definition not shown).

    const uint32_t arr_size = (1080 * 720 * 3); //HD image in rgb24
    const uint32_t iterations = 100000;
    uint8_t arr1[arr_size];
    uint8_t arr2[arr_size];
    std::vector<uint8_t> v;

    int main(){
        {
            DPROFILE;
            memcpy(arr1, arr2, sizeof(arr1));
            printf("memcpy()\n");
        }

        v.reserve(sizeof(arr1));
        {
            DPROFILE;
            std::copy(arr1, arr1 + sizeof(arr1), v.begin());
            printf("std::copy()\n");
        }

        {
            time_t t = time(NULL);
            for(uint32_t i = 0; i < iterations; ++i)
                memcpy(arr1, arr2, sizeof(arr1));
            printf("memcpy()    elapsed %ld s\n", time(NULL) - t);
        }

        {
            time_t t = time(NULL);
            for(uint32_t i = 0; i < iterations; ++i)
                std::copy(arr1, arr1 + sizeof(arr1), v.begin());
            printf("std::copy() elapsed %ld s\n", time(NULL) - t);
        }
    }

g++ -O0 -o test_stdcopy test_stdcopy.cpp

memcpy() profile: main:21: now:1422969084:04859 elapsed:2650 us
std::copy() profile: main:27: now:1422969084:04862 elapsed:2745 us
memcpy() elapsed 44 s std::copy() elapsed 45 s

g++ -O3 -o test_stdcopy test_stdcopy.cpp

memcpy() profile: main:21: now:1422969601:04939 elapsed:2385 us
std::copy() profile: main:28: now:1422969601:04941 elapsed:2690 us
memcpy() elapsed 27 s std::copy() elapsed 43 s

Red Alert pointed out that the code uses memcpy from array to array and std::copy from array to vector. That could be a reason for faster memcpy.

Since there is

v.reserve(sizeof(arr1));

reallocation is ruled out, so there should be no difference between copying to the vector and to an array. (Note, however, that reserve() does not change the vector's size, so writing through v.begin() like this is technically undefined behavior; resize() would be the correct call.)

The code was fixed to use an array in both cases. memcpy is still faster:

{
    time_t t = time(NULL);
    for(uint32_t i = 0; i < iterations; ++i)
        memcpy(arr1, arr2, sizeof(arr1));
    printf("memcpy()    elapsed %ld s\n", time(NULL) - t);
}

{
    time_t t = time(NULL);
    for(uint32_t i = 0; i < iterations; ++i)
        std::copy(arr1, arr1 + sizeof(arr1), arr2);
    printf("std::copy() elapsed %ld s\n", time(NULL) - t);
}

memcpy()    elapsed 44 s
std::copy() elapsed 48 s 
imatveev13
    wrong, your profiling shows that copying into an array is faster than copying into a vector. Off topic. – Red Alert Feb 13 '15 at 01:58
  • I could be wrong, but in your corrected example, with memcpy, aren't you copying arr2 into arr1, while with std::copy, you are copying arr1 into arr2?... What you could do is to make multiple, alternating experiments (once a batch of memcpy, once a batch of std::copy, then back again with memcopy, etc., multiple times.). Then, I would use clock() instead of time(), because who knows what your PC could be doing in addition to that program. Just my two cents, though... :-) – paercebal Apr 17 '15 at 08:55
    So, switching `std::copy` from a vector to an array somehow made `memcpy` take nearly twice as long? This data is highly suspect. I compiled your code using gcc with -O3, and the generated assembly is the same for both loops. So any difference in time you observe on your machine is only incidental. – Red Alert May 06 '15 at 00:46