How fast is memcpy?
On x86_64 with GCC on Linux, how fast is `memcpy`? At best, is it equal to (time to transfer one `long`) × (number of `long`s), or can it do better than that?
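One way to answer this empirically is to time `memcpy` against a plain one-`long`-at-a-time loop on the same buffers. The sketch below assumes a POSIX system (for `clock_gettime`) and GCC or Clang (for the empty `asm` memory barrier); the function names `copy_longs`, `now_seconds` and `copy_bandwidth` are hypothetical, not from any library.

```c
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Baseline: copy one long at a time. Caution: at -O2 GCC may recognise
   this loop and replace it with a call to memcpy itself. */
void copy_longs(long *restrict dst, const long *restrict src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
}

/* Seconds from a monotonic clock (unaffected by wall-clock changes). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Copies `count` longs `reps` times, using memcpy when use_memcpy is
   non-zero and copy_longs otherwise; returns bytes copied per second. */
double copy_bandwidth(long *dst, const long *src, size_t count,
                      int reps, int use_memcpy)
{
    assert(reps > 0);
    double start = now_seconds();
    for (int r = 0; r < reps; r++) {
        if (use_memcpy)
            memcpy(dst, src, count * sizeof(long));
        else
            copy_longs(dst, src, count);
        /* GCC/Clang compiler barrier: stops the optimizer from discarding
           copies whose results are overwritten by the next repetition. */
        __asm__ __volatile__("" : : : "memory");
    }
    double elapsed = now_seconds() - start;
    return elapsed > 0 ? (double)count * sizeof(long) * reps / elapsed : 0.0;
}
```

Buffer size matters a lot: buffers much larger than the last-level cache measure DRAM bandwidth, while buffers that fit in L1 measure the core's load/store throughput. When compiling with `gcc -O2`, adding `-fno-tree-loop-distribute-patterns` (or checking the assembly) helps ensure the baseline loop is not silently turned into a `memcpy` call.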

Which compiler? Which C library? – JeremyP Apr 23 '12 at 11:19
Are you asking "how do I compute the time to copy?", or are you asking "is memcpy any slower than the underlying hardware?". – Oliver Charlesworth Apr 23 '12 at 11:21
This might be relevant: [multithreaded memcpy @stackoverflow](http://stackoverflow.com/questions/4260602/how-to-increase-performance-of-memcpy) and this [intel software network](http://software.intel.com/en-us/articles/memcpy-performance/) – Dima Chubarov Apr 23 '12 at 11:26
On Linux with a recent GCC, the call might even be optimized by GCC itself and use vector machine instructions (SSE, AVX, etc.). – Basile Starynkevitch Apr 23 '12 at 11:29
2 Answers
This depends entirely on the C runtime (CRT) library's implementation of the function; you should be able to look at the source code of your C library to be 100% sure.
Typically it's optimized to copy in blocks whose size is efficient for the machine, with edge-case handling at the start and end depending on the alignment of the source and destination addresses. Given the need to handle any length and alignment, it's unlikely to be faster than a pure `long`-by-`long` copy (again, that depends on your platform), but it's also unlikely that any slowdown will make a noticeable difference to your real-world application.
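The "efficient blocks plus edge-case handling" strategy described above can be sketched as a head/body/tail copy. This is a simplified, hypothetical routine for illustration only, not glibc's code; it assumes 8-byte words as on x86_64 and GCC or Clang for the `may_alias` attribute. Real implementations also handle a misaligned source with shifts or unaligned vector loads instead of falling back to bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: accesses through this type may alias any object,
   so word-sized loads/stores on byte buffers don't break strict aliasing. */
typedef uint64_t __attribute__((may_alias)) word_t;

/* Hypothetical teaching version of memcpy (not a library function). */
void *naive_aligned_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: single bytes until dst is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)d % sizeof(word_t)) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: whole words, only if src is now aligned too. */
    if (((uintptr_t)s % sizeof(word_t)) == 0) {
        while (n >= sizeof(word_t)) {
            *(word_t *)d = *(const word_t *)s;
            d += sizeof(word_t);
            s += sizeof(word_t);
            n -= sizeof(word_t);
        }
    }

    /* Tail: whatever bytes remain. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

The head and tail loops are where the extra cost over a pure `long` copy comes from; for large copies they are amortized over the word-sized body.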

AFAIK, the fastest possible copy for x86 (32 and 64-bit) uses 16-byte wide data transfers, which is the size of one XMM register. This is the method recommended in Intel's optimization manual. To be sure, however, you'd have to disassemble your system library and see which method it uses.
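A minimal sketch of 16-byte transfers through XMM registers, using the SSE2 intrinsics `_mm_loadu_si128` and `_mm_storeu_si128` from `<emmintrin.h>`. It is an illustration under stated assumptions, not the routine from Intel's manual: the function name `copy_xmm` is hypothetical, it uses unaligned loads/stores throughout, and it falls back to a byte loop when SSE2 is not available (SSE2 is always present on x86_64).

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: copies n bytes, 16 at a time through XMM registers. */
void copy_xmm(void *restrict dst, const void *restrict src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__SSE2__)
    /* Unaligned 16-byte load and store (typically movdqu). */
    while (n >= 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, chunk);
        d += 16;
        s += 16;
        n -= 16;
    }
#endif
    /* Remaining 0..15 bytes (or everything, without SSE2). */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

Production implementations usually go further, for example aligning the destination first, unrolling the loop, or switching to non-temporal stores for very large copies.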

On Linux, with a recent enough GCC and sufficient optimization (e.g. at least `-O2`), `memcpy` is treated as the builtin `__builtin_memcpy` and may involve magical "tricks" inside GCC that translate it into quite optimized, vectorized machine code. – Basile Starynkevitch Apr 23 '12 at 12:10
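A small illustration of the builtin treatment: with a constant size, GCC at `-O2` normally expands `memcpy` inline instead of calling the library function, which is why the idiom below is the usual way to do an unaligned or type-punned 8-byte access. The helper names `load_u64` and `store_u64` are hypothetical; what GCC actually emits depends on the version and target, so check the output of `gcc -O2 -S`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads 8 bytes from a possibly unaligned address. With a constant size,
   GCC at -O2 normally expands this memcpy inline (a single 8-byte load
   on x86_64) rather than emitting `call memcpy`. */
uint64_t load_u64(const void *p)
{
    assert(p != NULL);
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* The builtin can also be named directly; GCC still decides whether to
   expand it inline or emit a call to the library memcpy. */
void store_u64(void *p, uint64_t v)
{
    assert(p != NULL);
    __builtin_memcpy(p, &v, sizeof v);
}
```

Compiling with `-fno-builtin-memcpy` disables this treatment for calls spelled `memcpy` (the `__builtin_memcpy` spelling is unaffected), which makes the difference easy to see in the generated assembly.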