I'm profiling some numerical code written in C (the profiler is Instruments, the compiler is clang, on OS X 10.11.6). As much as 77.3% of the running time is spent in _platform_memmove$VARIANT$Haswell.

In the assembly output, the above function is called by DYLD-STUB$$memcpy. However, I have no memcpy's in my C code (I do have some malloc's though).

Digging deeper, the rep instruction prefix seems to account for most of that time. From this post, it seems that rep is not doing anything useful. Why does the compiler insert it, and where do the memcpy calls come from?

I also tried compiling with -g, but then _platform_memmove$VARIANT$Haswell no longer takes up almost all of the time.

Nibor
  • Can you post the code? What kind of variance/standard deviation do your measurements have? How does the total runtime change with `-g`? – EOF Aug 23 '16 at 12:52
  • `rep` is doing the actual copying in memcpy. – 2501 Aug 23 '16 at 13:02
  • Up for actually telling what you have done so far, and for finding out what was the problem. – Koshinae Aug 23 '16 at 13:50

1 Answer

After a bit more searching, I found the problem: I was passing a struct to a function by value, so the whole struct was copied on every call, hence the memcpy.

I changed the function to accept a pointer to the struct, which sped up my code by a factor of 5.
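To illustrate the difference, here is a minimal sketch. The struct `Signal`, its size `N_SAMPLES`, and the two functions are made-up stand-ins, not my actual code. With a struct this large, a compiler will typically copy the by-value argument with a call to memcpy (unless it inlines the callee and drops the copy), while the pointer version passes only an address.

```c
#include <assert.h>
#include <stddef.h>

#define N_SAMPLES 4096  /* hypothetical size: the struct is 32 KiB of doubles */

/* Hypothetical struct standing in for the one from the question. */
typedef struct {
    double samples[N_SAMPLES];
} Signal;

/* By value: the caller has to copy the whole struct for every call.
 * For a struct this large, that copy is usually done via memcpy. */
double sum_by_value(Signal s)
{
    double total = 0.0;
    for (size_t i = 0; i < N_SAMPLES; ++i)
        total += s.samples[i];
    return total;
}

/* By pointer: only the address is passed, nothing is copied.
 * const documents that the callee does not modify the struct. */
double sum_by_pointer(const Signal *s)
{
    assert(s != NULL);
    double total = 0.0;
    for (size_t i = 0; i < N_SAMPLES; ++i)
        total += s->samples[i];
    return total;
}
```

Both functions return the same result; the call sites differ only in `sum_by_value(sig)` versus `sum_by_pointer(&sig)`. In a hot loop, the copy in the first form can easily dominate the running time.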

Nibor
  • I'm guessing most people won't see this question or answer, since it's closed, but it was still helpful to me for a seemingly unrelated issue. +1 for reminding me of the importance of carefully passing pointers/structs. When programming in C, it's hard to be reminded of this too often! :-) – jvriesem Dec 06 '20 at 21:24