- "Start CUDA debugging" debugs device (kernel) code, i.e. stuff compiled with
nvcc
-> bunch of preprocessing
-> cudafe++
-> cicc
toolchain path.
- "Local Windows Debugger" debugs host code, a stuff compiled with either
nvcc
-> bunch of preprocessing
-> cl
or just cl
.
It does not matter whether your code lives in a `.cpp`, `.cu`, or `.h` file. The only thing that matters is whether your code is annotated as `__device__` or `__global__` or not.
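For illustration, here is a minimal sketch (hypothetical kernel and file names) of how a single `.cu` file splits across the two debuggers:

```cuda
// example.cu -- hypothetical; host and device code in the same file.
#include <cuda_runtime.h>

// Device code: a breakpoint here is hit by "Start CUDA debugging".
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Host code: a breakpoint here is hit by "Local Windows Debugger".
int main()
{
    const int n = 256;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));

    scale<<<1, n>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```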
As of CUDA 7.5 RC (Aug 2015), on Windows you can only debug one of these at a time. On Linux and OS X you can debug both at the same time with `cuda-gdb`.
See also: NVIDIA CUDA Compiler Driver NVCC
Other things that could lead to frustration during debugging on Windows:
- You are setting up properties for one configuration/platform pair, but running another one
- Something went wrong with the `.pdb` files for the host and device modules. Check the `nvcc`, `cl`, `nvlink` and `link` options; for example, host and device debug info could be written to the same file, overwriting each other. (A sketch of debug-oriented options is shown after this list.)
- Aggressive optimizations: inlining, optimizing out locals, etc. Release code is almost impossible for a human to debug, and the debugger can be fooled as well.
- Presence of undefined behavior and/or memory access violations. These can easily crash the debugger, leading to unexpected results such as breakpoints not being hit.
- You forgot to check errors for one of the CUDA API or kernel calls, there was an error, and the CUDA context is now dead, so no kernel will run anymore. But you don't know this yet: your host code continues to run, and you expect kernel breakpoints to be hit, but that will never happen, because the kernel is simply never launched. (See the error-checking sketch after this list.)
- Any of the bugs described above could also live in a library you use. Don't expect libraries to be bug-free.
- Compilers, debuggers and drivers have bugs too. But you should always assume the problem is in your own code first, and only if nothing helps, investigate and file a bug report with the vendor.
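As a sanity check for the `.pdb` and optimization pitfalls above, this is the kind of debug-oriented command line to compare your Visual Studio property pages against (flag values are illustrative, the flags themselves are standard `nvcc`/`cl`/`link` options):

```
REM Illustrative debug build; check your VS property pages for the equivalents.
REM -g      : nvcc, host debug info
REM -G      : nvcc, device debug info (also disables device code optimization)
REM /Zi /Od : cl, emit PDB debug info and disable host optimization
REM /DEBUG  : link, write debug info for the final module
nvcc -g -G -Xcompiler "/Zi /Od" -Xlinker /DEBUG -o example.exe example.cu
```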
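And for the unchecked-error pitfall, a minimal sketch of the usual checking pattern (the `CUDA_CHECK` macro name is my own; the API calls are standard CUDA runtime):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Check every runtime API call; a dead context shows up here as an error
// instead of as mysteriously skipped kernel breakpoints.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err = (call);                                  \
        if (err != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err));                      \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

__global__ void kernel() {}

int main()
{
    kernel<<<1, 1>>>();
    CUDA_CHECK(cudaGetLastError());        // catches launch errors
    CUDA_CHECK(cudaDeviceSynchronize());   // catches errors during execution
    return 0;
}
```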