I have been profiling an application with nvprof and nvvp (5.5) in order to optimize it. However, I get totally different results for some metrics/events, such as inst_replay_overhead, ipc, or branch_efficiency, when I profile the debug (-G) and release versions of the code.
So my question is: which version should I profile, the release or the debug version? Or does the choice depend on what I'm looking for?
I found the question "CUDA - Visual Profiler and Control Flow Divergence", where it is stated that a debug (-G) version is needed to properly measure the divergent branches metric, but I am not sure about the other metrics.