Profile debug or release CUDA code?

Question

I have been profiling an application with nvprof and nvvp (5.5) in order to optimize it. However, I get totally different results for some metrics/events like inst_replay_overhead, ipc or branch_efficiency, etc. when I'm profiling the debug (-G) and release version of the code.

So my question is: which version should I profile, the release or the debug version? Or does the choice depend on what I'm looking for?

I found CUDA - Visual Profiler and Control Flow Divergence, where it is stated that a debug (-G) version is needed to properly measure the divergent branches metric, but I am not sure about other metrics.

I don't see anything in the link you provided that says that -G is needed to properly measure the divergent branches metric. That specific profiler feature being referred to (back-referencing to source) can be accomplished with either a release or debug version, as spelled out in the answer provided there. — Robert Crovella, Jan 13 '15 at 22:32
Robert Crovella, you are correct. The source in the link gives two options, and I did not mention that. Thank you. — ScHuMi, Jan 14 '15 at 20:23

score 5 · Accepted Answer · answered Jan 13 '15 at 22:26

Profiling usually implies that you care about performance.

If you care about performance, you should profile the release version of a CUDA code.

The debug version (-G) will generate different code, which usually runs slower. For this reason, in my opinion, there's little point in doing performance analysis (including execution time measurement, benchmarking, profiling, etc.) on a debug version of a CUDA code.

The -G switch turns off most optimizations that the device code compiler might ordinarily make, which has a large effect on code generation and often a large effect on performance as well. Optimizations are disabled to make the code easier to debug, which is the primary purpose of the -G switch and of a debug build.
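
As a rough sketch of the difference, the two builds and a profiling run on the release binary might look like the following. The file names (app.cu, app_release, app_debug) are hypothetical, and the metric names are the ones mentioned in the question; whether each metric is available depends on your GPU and toolkit version. The script only prints the commands if nvcc, nvprof, or the source file is missing.

```shell
#!/bin/sh
# Hypothetical names: app.cu is the source, app_release/app_debug the outputs.
SRC=app.cu
METRICS=ipc,branch_efficiency,inst_replay_overhead

# Release build: device-code optimizations are enabled by default.
RELEASE_BUILD="nvcc -o app_release $SRC"
# Debug build: -G disables most device-code optimizations, -g adds host debug info.
DEBUG_BUILD="nvcc -G -g -o app_debug $SRC"
# Collect performance metrics from the release binary, not the debug one.
PROFILE="nvprof --metrics $METRICS ./app_release"

if command -v nvcc >/dev/null 2>&1 && command -v nvprof >/dev/null 2>&1 && [ -f "$SRC" ]; then
  $RELEASE_BUILD && $DEBUG_BUILD && $PROFILE
else
  # Toolchain or source not available: just show the commands.
  printf '%s\n' "$RELEASE_BUILD" "$DEBUG_BUILD" "$PROFILE"
fi
```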

In general you want to do a full release build. If you want to use source correlated experiments add -lineinfo. If you need to look at the logical control flow of the application -G can sometimes be more useful than -lineinfo. When using -G avoid looking at any other metrics. — Greg Smith, Jan 14 '15 at 14:12
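
A minimal sketch of the -lineinfo build suggested in the comment above, again with hypothetical file names (app.cu, app_lineinfo). Unlike -G, -lineinfo keeps device optimizations on and only adds line-number information so the profiler can map results back to source lines.

```shell
#!/bin/sh
# Hypothetical names: app.cu is the source, app_lineinfo the output binary.
SRC=app.cu

# Optimized build with source line information for source-correlated profiling.
LINEINFO_BUILD="nvcc -lineinfo -o app_lineinfo $SRC"

if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
  $LINEINFO_BUILD
else
  # Toolchain or source not available: just show the command.
  printf '%s\n' "$LINEINFO_BUILD"
fi
```

The resulting binary can then be profiled with nvprof or nvvp in the same way as a plain release build.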
