
I am looking for a code analysis/profiling tool for C++ on macOS. I know there have been posts on this topic before, but my use case is quite specific, so perhaps someone can give me more targeted advice.

So here is my problem: I am writing scientific code (master's project) in C++, so it is a pure console application with no interactivity. The code is supposed to run on massively parallel computers, so I use MPI. However, right now I am not yet optimizing for scalability, only for single-core performance. Since I do not want to rewrite the whole program as a serial one, I simply run MPI with a single process. It works fine, but the profiler obviously needs to be able to cope with this.

What do I want to analyze? The code has a fairly simple structure, so all I really need is a list of how much time the program spends in each function, so that I know where it loses the most time and can measure the speedup of my optimizations.
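
Just to illustrate the level of detail I am after, here is a rough sketch of the kind of per-function timing I have in mind (MPI_Wtime for wall-clock time; expensive_kernel is only a placeholder):

    #include <cstdio>
    #include <mpi.h>

    // Placeholder for one of the expensive routines in the real code.
    void expensive_kernel()
    {
        volatile double sum = 0.0;
        for (long i = 0; i < 10000000; ++i)
            sum += static_cast<double>(i) * 0.5;
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);           // still works with a single rank

        double t0 = MPI_Wtime();          // wall-clock time in seconds
        expensive_kernel();
        double t1 = MPI_Wtime();
        std::printf("expensive_kernel: %.3f s\n", t1 - t0);

        MPI_Finalize();
        return 0;
    }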

Thanks for any ideas

Chris

4 Answers


You should use Instruments.app, which includes a CPU sampler and a thread activity viewer, among other things. (Choose "Product > Profile..." in Xcode.)

If you want something more fine-grained, you could instrument your code. Coincidentally, I wrote a set of profiling macros just for such an occasion :)

https://github.com/nielsbot/Profiler

This will show a nice nested printout of the time spent in instrumented routines (instructions are on that page).
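
The core idea behind such macros is just a scope-based timer. A minimal sketch of the approach (not the actual macros from the repo above, just an illustration) could look like this:

    #include <chrono>
    #include <cstdio>
    #include <string>
    #include <utility>

    // Times the enclosing scope and prints the result when it is destroyed.
    struct ScopedTimer
    {
        std::string name;
        std::chrono::steady_clock::time_point start;

        explicit ScopedTimer(std::string n)
            : name(std::move(n)), start(std::chrono::steady_clock::now()) {}

        ~ScopedTimer()
        {
            auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                          std::chrono::steady_clock::now() - start).count();
            std::printf("%s: %lld us\n", name.c_str(), static_cast<long long>(us));
        }
    };

    void solve_step()
    {
        ScopedTimer timer("solve_step");  // reports when the function returns
        // ... actual work ...
    }

    int main()
    {
        solve_step();
        return 0;
    }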

nielsbot

Did you try KCachegrind (http://kcachegrind.sourceforge.net/html/Home.html) together with Valgrind's callgrind tool? You run the program under callgrind and then open the resulting output file in KCachegrind.
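
If running the entire program under Valgrind is too slow, a rough sketch using Callgrind's client-request macros (assuming <valgrind/callgrind.h> is installed; hot_loop is only a placeholder) can restrict collection to the region of interest. Build as usual and run with valgrind --tool=callgrind --instr-atstart=no ./your_program:

    #include <valgrind/callgrind.h>

    // Placeholder for the routine under investigation.
    void hot_loop()
    {
        volatile double sum = 0.0;
        for (long i = 0; i < 10000000; ++i)
            sum += static_cast<double>(i);
    }

    int main()
    {
        // Outside Valgrind these macros expand to cheap no-ops.
        CALLGRIND_START_INSTRUMENTATION;  // start collecting call/cost data
        hot_loop();
        CALLGRIND_STOP_INSTRUMENTATION;
        CALLGRIND_DUMP_STATS;             // write counts to callgrind.out.<pid>
        return 0;
    }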

gregory561

I can recommend Scalasca (http://www.scalasca.org/). You can also use it later for the parallel performance analysis.


Don't look for "slow functions" and don't look to measure the time used by different pieces. Those concepts are so indirect as to be almost useless for telling you what to optimize.

Instead, take some stroboscopic X-rays, on wall-clock time, of what the entire program is doing, and study each one to see why the program is spending that instant of time. The reason this works better is that it's not looking through function-colored glasses; it's looking through purpose-colored glasses, so you can tell whether the program needs to be doing what it's doing. It's very accurate at locating big problems. It is not accurate at measuring them, nor does it need to be.

What happens when you just measure is this: you get a bunch of numbers for a bunch of routines. You look at them and ask "what does it mean?". If they don't tell you what you should fix, you pat yourself on the back and say the program must be optimal. In fact, there probably is something you could fix, but you couldn't figure it out from the profiler. What's more, if you do find it and fix it, that can expose other things you can fix for even greater speedup.

That's what random pausing is about.

Mike Dunlavey
    Thanks for that idea, but after all you need some tool to actually benchmark your performance - what do you use for that? – Chris Mar 17 '13 at 14:18
  • @Chris: Anything I can use to time it. A change in performance of more than a few percent doesn't require an atomic clock. If you can find multiple things to fix, their individual speedup factors multiply together, and it's possible to get speedup of more than an order of magnitude. [*Here's an example of 43x.*](http://stackoverflow.com/a/927773/23771) Going from 43 seconds to 1 second does not require fancy measuring tools. How about going from [*12 minutes to 1 second?*](http://scicomp.stackexchange.com/a/1870/1262) – Mike Dunlavey Mar 17 '13 at 17:55
  • I see Mike's point, although I'd like to point out that the profiler in Instruments is a statistical sampler.. (logs your program call stack every _x_ms).. which is quite similar to what you are advocating I believe. – nielsbot Mar 17 '13 at 19:28
  • @nielsbot: Among profilers, IMHO the best are those (like [*Zoom*](http://www.rotateright.com/)) that 1) sample the stack, on 2) wall-clock time (not just CPU time), and 3) report % stack-presence at the level of lines of code (not just functions). That said, what's a typical speedup factor anyone gets with a profiler? [*Here's 43x with this method.*](http://stackoverflow.com/a/927773/23771) You can only do that if you don't miss problems. This method, though maybe crude, finds a superset of the problems that profilers find. That's why the huge difference in speedup results. – Mike Dunlavey Mar 17 '13 at 21:34
  • Yes, the sampler in Instruments does what you describe. In addition you can monitor thread activity to diagnose scheduling stalls/resource conflicts. In the case of making single routines faster, for example a drawing loop, I find the old school profiler like the one I linked to more useful. – nielsbot Mar 17 '13 at 23:02
  • @nielsbot: "does what you describe" Only if you take a narrow view of what I describe. The 43x program is about 800 lines. Take a 10^6 line program. 3 samples are taken during startup. 2 of them show I/O to read a resource string from a dll so as to display the string on the splash screen during startup. That's 20 stack levels, and it takes looking at the data to understand what it's doing. Is it necessary? Well, on average that phase will be 4 times faster if it's removed or the strings are just built in. There's no profiler that could see that. Profiler fans don't seem to care about actual results. – Mike Dunlavey Mar 18 '13 at 01:01
  • In fact a statistical sampler would call out the string loading problem. Run the sampler as your program is starting. Examine the samples between launch and app fully running and you will definitely see "hey I'm spending a lot of time loading strings during the startup splash screen" – nielsbot Mar 18 '13 at 02:21
  • Isn't a trace like this http://f.cl.ly/items/341K2K1h253c3Y3F0720/Screen%20Shot%202013-03-17%20at%207.14.53%20PM.png giving you the same information as your random pausing technique? Maybe I'm not understanding. Random pausing = statistically sampling a process, right? This trace is also statistically sampling a process. – nielsbot Mar 18 '13 at 02:22
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/26346/discussion-between-nielsbot-and-mike-dunlavey) – nielsbot Mar 18 '13 at 02:22