TL;DR: GCC doesn't optimize the call to memmove inside std::copy. When using two C-style arrays, it does. Replacing &v2[0] with v2.data() allows it to optimize into a memcpy.
Your example is pretty noisy so let's strip it down:
#include <vector>
#include <algorithm>
int a[5];
int b[5];
std::vector<int> v2;
I deliberately put the variables at file scope to prevent optimizing them away without having to deal with volatile semantics.
First let's try:
std::copy(&a[0], &a[5], &b[0]);
With -O3 -fdump-tree-optimized this becomes:
__builtin_memcpy (&b[0], &a[0], 20);
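To reproduce the array-to-array case, something like the following minimal sketch should do. The initial values in a and the helper name copy_array_to_array are my own additions for illustration; the stripped-down snippet above leaves the arrays zero-initialized.

```cpp
#include <algorithm>
#include <cassert>

// File scope, as above, so the optimizer cannot simply discard the arrays.
int a[5] = {1, 2, 3, 4, 5};  // initial values are an assumption, only there for checking
int b[5];

// Hypothetical helper: copies a into b and reports whether b now matches a.
// a and b are distinct objects, so the two ranges cannot overlap.
bool copy_array_to_array() {
    std::copy(&a[0], &a[0] + 5, &b[0]);
    return std::equal(&a[0], &a[0] + 5, &b[0]);
}
```

Calling it from main, e.g. assert(copy_array_to_array());, and compiling with -O3 -fdump-tree-optimized should let you inspect what the copy was lowered to on your own GCC version.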
Stepping through GDB shows us:
Breakpoint 1, main () at test.cpp:9
9 std::copy(&a[0], &a[0] + 5, &b[0]);
(gdb) s
std::copy<int*, int*> (__result=0x601080 <b>, __last=0x6010b4, __first=0x6010a0 <a>) at test.cpp:9
9 std::copy(&a[0], &a[0] + 5, &b[0]);
(gdb) s
std::__copy_move_a2<false, int*, int*> (__result=0x601080 <b>, __last=0x6010b4, __first=0x6010a0 <a>) at test.cpp:9
9 std::copy(&a[0], &a[0] + 5, &b[0]);
(gdb) s
std::__copy_move_a<false, int*, int*> (__result=<optimized out>, __last=<optimized out>, __first=<optimized out>) at test.cpp:9
9 std::copy(&a[0], &a[0] + 5, &b[0]);
(gdb) s
std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int> (__result=<optimized out>, __last=<optimized out>,
__first=<optimized out>) at /usr/include/c++/5.3.1/bits/stl_algobase.h:382
382 __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
(gdb) s
main () at test.cpp:10
10 }
Wait, it used memmove?! OK, let's keep going.
What about:
std::copy(&a[0], &a[5], v2.begin());
OK, that gets us memmove:
int * _2;
<bb 2>:
_2 = MEM[(int * const &)&v2];
__builtin_memmove (_2, &a[0], 20);
Which is reflected in the assembly if we do -S. Stepping through GDB shows us the process:
(gdb)
Breakpoint 1, main () at test.cpp:9
9 {
(gdb) s
10 std::copy(&a[0], &a[5], &v2[0]);
(gdb) s
std::copy<int*, int*> (__result=<optimized out>, __last=0x6010d4, __first=0x6010c0 <a>) at test.cpp:10
10 std::copy(&a[0], &a[5], &v2[0]);
(gdb) s
std::__copy_move_a2<false, int*, int*> (__result=<optimized out>, __last=0x6010d4, __first=0x6010c0 <a>) at test.cpp:10
10 std::copy(&a[0], &a[5], &v2[0]);
(gdb) s
std::__copy_move_a<false, int*, int*> (__result=<optimized out>, __last=<optimized out>, __first=<optimized out>) at test.cpp:10
10 std::copy(&a[0], &a[5], &v2[0]);
(gdb) s
std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<int> (__result=<optimized out>, __last=<optimized out>,
__first=<optimized out>) at /usr/include/c++/5.3.1/bits/stl_algobase.h:382
382 __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
(gdb) s
__memmove_ssse3 () at ../sysdeps/x86_64/multiarch/memcpy-ssse3.S:55
Ah, I see. It's using an optimized memcpy routine provided by the C library. But wait a minute, that doesn't make sense. memmove and memcpy are two different things!
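The difference shows up when source and destination overlap: memcpy requires non-overlapping ranges (overlap is undefined behavior), whereas memmove is specified to behave as if it copied through a temporary buffer. A small sketch, with the helper name shift_right_by_two made up for illustration:

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Hypothetical helper: shifts the first four ints two slots to the right.
// Source [0, 4) and destination [2, 6) overlap, so memmove is the safe choice;
// the same call with memcpy would be undefined behavior.
std::array<int, 6> shift_right_by_two() {
    std::array<int, 6> buf = {1, 2, 3, 4, 5, 6};
    std::memmove(buf.data() + 2, buf.data(), 4 * sizeof(int));
    return buf;
}
```

The result is 1 2 1 2 3 4, which you can check with a few assert calls on the returned array.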
Looking at the source code for this routine we see little checks sprinkled throughout:
85 #ifndef USE_AS_MEMMOVE
86 cmp %dil, %sil
87 jle L(copy_backward)
88 #endif
GDB confirms that it is treating it as a memmove:
55 mov %rdi, %rax
(gdb) s
61 cmp %rsi, %rdi
(gdb) s
62 jb L(copy_forward)
(gdb) s
63 je L(write_0bytes)
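The compare-and-branch GDB steps through here looks like the usual way a memmove copes with overlap: compare destination against source, copy forward if the destination is lower, and do nothing if they are equal. Here is a simplified byte-wise sketch of that idea. It is not glibc's code; naive_memmove is a hypothetical name, and the backward branch is my assumption about what the remaining case has to do.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical, simplified illustration of an overlap-tolerant copy.
void* naive_memmove(void* dest, const void* src, std::size_t n) {
    assert(n == 0 || (dest != nullptr && src != nullptr));
    unsigned char* d = static_cast<unsigned char*>(dest);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    // std::less gives a well-defined order even for unrelated pointers.
    if (std::less<const unsigned char*>()(d, s)) {
        // dest below src: copying forward never clobbers unread source bytes.
        for (std::size_t i = 0; i < n; ++i) d[i] = s[i];
    } else if (d != s) {
        // dest above src: copy backward so the overlapping tail is read first.
        for (std::size_t i = n; i > 0; --i) d[i - 1] = s[i - 1];
    }
    // dest == src: nothing to do.
    return dest;
}
```

For example, with char buf[] = "abcdef", calling naive_memmove(buf + 2, buf, 4) leaves "ababcd" in buf, while naive_memmove(buf, buf + 2, 4) on a fresh copy leaves "cdefef".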
But if we replace &v2[0] with v2.data() it doesn't call glibc's memmove. So what's going on?
Well, v2.begin() returns an iterator and &v2[0] takes the address of the reference returned by operator[], while v2.data() returns a direct pointer to the memory. I think this for some reason prevents GCC from optimizing the memmove into a memcpy.[citation needed]
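If you want to compare the three destination forms side by side, a sketch like the one below may help. The names src, dst, Dest and copy_with are hypothetical. Note that I size the vector up front, which the stripped-down snippet does not do; writing five ints into an empty vector would be undefined behavior. The sketch only shows that the three forms behave the same at the language level; whether GCC lowers each to memcpy or memmove is something you would have to check in the -fdump-tree-optimized output for your compiler version.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <vector>

int src[5] = {1, 2, 3, 4, 5};  // hypothetical test data
std::vector<int> dst(5);       // assumption: sized so there is room for 5 ints

// operator[] yields a reference; taking its address gives a plain int*.
static_assert(std::is_same<decltype(dst[0]), int&>::value, "operator[] returns int&");
static_assert(std::is_same<decltype(&dst[0]), int*>::value, "&dst[0] is int*");
// data() hands back the same kind of raw pointer directly.
static_assert(std::is_same<decltype(dst.data()), int*>::value, "data() is int*");

enum class Dest { AddressOfFirst, Begin, Data };

// Clears dst, copies src into it using the requested destination form,
// and returns a copy of the result.
std::vector<int> copy_with(Dest how) {
    assert(dst.size() >= 5);
    std::fill(dst.begin(), dst.end(), 0);
    switch (how) {
        case Dest::AddressOfFirst: std::copy(src, src + 5, &dst[0]);     break;
        case Dest::Begin:          std::copy(src, src + 5, dst.begin()); break;
        case Dest::Data:           std::copy(src, src + 5, dst.data());  break;
    }
    return dst;
}
```

Putting each std::copy call in its own translation unit and diffing the dumps is probably the easiest way to see which form ends up as which builtin.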