CUDA Runtime difference between release mode and debug mode

Question

I am running Visual Studio 2013 with CUDA 7.0.28.

I can make the difference in runtime behavior appear or disappear just by checking or unchecking the CUDA option:

Generate GPU Debug Information

I have the kernel running with a <<<1,1>>> launch configuration, and the error occurs even then.

My questions are:

Why would it give me different results in release and debug mode?
What kinds of things should I be looking for to track down why this is occurring?
Is there a way to set a breakpoint within the kernel function? It does not appear so. Besides adding printf statements, what other means can I use to trace the problem?

Thank you.

Usually, running in debug mode means some *unspecified behaviors* don't terminate the program. For example, I have seen memory buffers initialized to zero by default in debug mode without my explicitly asking for it. Sometimes, out-of-range accesses to shared memory buffers don't terminate the kernel either. I suggest catching errors using the methods explained in [this post](http://stackoverflow.com/q/14038589/2386951). You should also be able to set breakpoints if Nsight for Visual Studio is installed. — Farzad, Jul 27 '15 at 16:53
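A minimal sketch of the kind of error checking the comment refers to, assuming a CUDA-capable device and a host compiler with C++11 support. The names CUDA_CHECK, fillKernel, and launchAndCheck are hypothetical illustrations, not taken from the linked post. The point it shows is that a kernel launch returns no status, so errors have to be collected afterwards with cudaGetLastError() (launch problems) and cudaDeviceSynchronize() (problems raised while the kernel runs):

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: wraps a CUDA runtime call, reports the error string
// with file and line, and aborts on failure.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Hypothetical kernel used only to demonstrate the checks.
__global__ void fillKernel(int *out, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Hypothetical helper: launches the kernel with full error checking and
// returns the first element copied back to the host.
int launchAndCheck(int n, int value) {
    assert(n > 0);  // need at least one element to copy back
    int *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(int)));
    fillKernel<<<(n + 255) / 256, 256>>>(d_out, n, value);
    CUDA_CHECK(cudaGetLastError());       // errors in the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran
    int first = 0;
    CUDA_CHECK(cudaMemcpy(&first, d_out, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_out));
    return first;
}
```

Calling cudaDeviceSynchronize() after every launch serializes the host and device, so it is mainly a debugging aid rather than something to leave in performance-critical code.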

score 3 · Accepted Answer · edited May 23 '17 at 12:21

Why would it give me different results in release and debug mode?

Under the hood, the machine code generated from CUDA C/C++ source looks very different in debug mode. The list of differences is too long to cover here, but it mostly comes down to the fact that device-code compiler optimizations are turned off in debug mode (the Generate GPU Debug Information option corresponds to the nvcc -G switch). As a result, a latent bug such as a race condition may be evident in debug mode but not in release mode, or vice versa.
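As an illustration of the kind of race condition meant here, the following sketch computes a block-wide sum in shared memory. The names blockSum and sumOneToN are hypothetical; it assumes a single block of BLOCK_SIZE threads and omits error checking for brevity. Both __syncthreads() calls are required: if either is removed, threads can read shared-memory slots before other threads have written them, and whether that actually produces a wrong answer may depend on how a particular build schedules loads and stores.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumption: power of two, one block per launch

// Block-wide tree reduction in shared memory.
__global__ void blockSum(const int *in, int *out, int n) {
    __shared__ int buf[BLOCK_SIZE];
    int tid = threadIdx.x;
    buf[tid] = (tid < n) ? in[tid] : 0;
    __syncthreads();  // every write to buf must finish before any thread reads it

    for (int stride = BLOCK_SIZE / 2; stride > 0; stride /= 2) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();  // finish this level before the next level reads it;
                          // kept outside the if so all threads reach it
    }
    if (tid == 0) *out = buf[0];
}

// Hypothetical host helper: sums 1..n on the GPU (requires n <= BLOCK_SIZE).
int sumOneToN(int n) {
    assert(n >= 1 && n <= BLOCK_SIZE);
    int h_in[BLOCK_SIZE];
    for (int i = 0; i < n; ++i) h_in[i] = i + 1;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<1, BLOCK_SIZE>>>(d_in, d_out, n);

    int result = 0;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Running a variant with one barrier deleted under cuda-memcheck --tool racecheck is one way to see whether the tool reports the shared-memory hazard.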

What kinds of things should I be looking for to track down why this is occurring?

I would start with the simplest tools. First, run cuda-memcheck by itself to confirm that the kernel runs without generating basic errors. If cuda-memcheck reports that your kernel is failing, follow the method here to isolate the failure to a single line of source code. After fixing any errors that cuda-memcheck reports this way, run its subtools, including racecheck, synccheck, and initcheck, to see whether any of them catch problems.
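One class of bug initcheck targets, and one that the debug-mode zeroing mentioned in the comment on the question could plausibly hide, is reading device global memory that was never written. In this minimal sketch (hypothetical names, error checking omitted), accumulate reads acc[i] before writing it; unless the buffer is cleared with cudaMemset first, those reads see memory whose contents cudaMalloc leaves unspecified. Running the zeroAcc == false path under cuda-memcheck --tool initcheck is expected to flag that read.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Adds in[i] into acc[i]. This is a read-modify-write: acc[i] is read
// before the kernel writes it, so acc must already hold defined values.
__global__ void accumulate(const int *in, int *acc, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) acc[i] += in[i];
}

// Hypothetical helper: accumulates an array of ones into acc and returns
// the host-side sum of acc. With zeroAcc == false, acc is never initialized,
// so the result is unspecified.
long long accumulateOnes(int n, bool zeroAcc) {
    assert(n > 0);
    std::vector<int> h_in(n, 1);
    int *d_in = nullptr, *d_acc = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_acc, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    if (zeroAcc) {
        cudaMemset(d_acc, 0, n * sizeof(int));  // the fix: define acc before reading it
    }

    accumulate<<<(n + 255) / 256, 256>>>(d_in, d_acc, n);

    std::vector<int> h_acc(n);
    cudaMemcpy(h_acc.data(), d_acc, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_acc);

    long long sum = 0;  // wide accumulator so garbage values cannot overflow it
    for (int v : h_acc) sum += v;
    return sum;
}
```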

Is there a way to set a breakpoint within the kernel function?

Yes, there are debuggers available on both Windows and Linux. On Windows, the debugger (Nsight) is integrated into Visual Studio. Documentation, walkthroughs, and even YouTube videos are available that demonstrate how to perform various operations, such as setting a breakpoint. However, I wouldn't go down this path before trying cuda-memcheck.

Unfortunately, memcheck found no errors. It was unable to run synccheck or racecheck, maybe because of the <<<1,1>>> launch configuration. — John Styles, Jul 27 '15 at 20:07
