I have a rather large and complex CUDA code that hangs quite reliably for large numbers of blocks/threads. I am trying to figure out exactly where the code hangs.
When I run the code in cuda-gdb, I can see which threads/blocks are hanging, but I can't see where, beyond the "virtual PC".
If I compile the code with "-G" to get the debug information, it runs a lot slower and refuses to hang, no matter how long I run it for.
Is there any way to map a "virtual PC" to a line of code in the source code, even approximately? Or is there a way to get the debugging information in without turning off all optimization?
I've tried using "-G3", but to no avail. It only produces warnings of the form "nvcc warning : Setting optimization level to 0 as optimized debugging is not supported". I am using CUDA compilation tools release 4.1.