I'm profiling some numerical code written in C (the profiler is Instruments, the compiler is clang on Mac OS X 10.11.6). As much as 77.3% of the running time is spent in _platform_memmove$VARIANT$Haswell.
In the assembly output, the above function is called by DYLD-STUB$$memcpy. However, I have no memcpy calls in my C code (I do have some malloc calls, though).
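To illustrate what I mean by "no memcpy in my code", here is a minimal sketch of constructs that contain no explicit memcpy call but that, as far as I understand, a compiler may still lower to one: copying a large struct by value, and a plain copy loop that an optimizer may recognize as a block copy. The type Block and the functions scale_block and copy_array are hypothetical names made up for this example, not my actual code, and whether clang really emits a memcpy call here depends on the optimization level and target.

```c
#include <stddef.h>

/* Hypothetical type for illustration: a 4 KiB payload. */
typedef struct {
    double data[512];
} Block;

/* Passing and returning a large struct by value copies the whole
   object; compilers commonly implement such copies with a call to
   memcpy even though the source never mentions it. */
Block scale_block(Block in, double factor)
{
    for (size_t i = 0; i < 512; ++i)
        in.data[i] *= factor;
    return in;
}

/* A plain element-by-element copy loop. With optimization enabled,
   the compiler may recognize this idiom and replace the loop with a
   single memcpy call (restrict tells it the arrays do not overlap). */
void copy_array(double *restrict dst, const double *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

Compiling a file like this with clang -O2 -S and searching the output for memcpy would show whether the calls are generated on a given setup.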
Going deeper, it seems that the assembly instruction rep is responsible for taking up so much time. From this post, it seems that rep is not doing anything useful. Why does the compiler insert it? And where do the memcpy calls come from?
I also tried compiling with -g, but then _platform_memmove$VARIANT$Haswell no longer takes up almost all of the running time.