Cuda-gdb was obeying all the breakpoints I would set, before adding '-arch sm_20' flag while compiling. I had to add this to avoid error being thrown : 'atomicAdd is undefined' (as pointed here). Here is my current statement to compile the code:
nvcc -g -G --maxrregcount=32 Main.cu -o SW_exe (..including header files...) -arch sm_20
and when I set a breakpoint inside kernel, cuda-gdb stops once at the last line of the kernel, and then the program continues.
(cuda-gdb) b SW_kernel_1.cu:49
Breakpoint 1 at 0x4114a0: file ./SW_kernel_1.cu, line 49.
...
[Launch of CUDA Kernel 5 (diagonalComputation<<<(1024,1,1),(128,1,1)>>>) on Device 0]
Breakpoint 1, diagonalComputation (__cuda_0=15386, __cuda_1=128, __cuda_2=0xf00400000, __cuda_3=0xf00200000,
__cuda_4=100, __cuda_5=0xf03fa0000, __cuda_6=0xf04004000, __cuda_7=0xf040a0000, __cuda_8=0xf00200200,
__cuda_9=15258, __cuda_10=5, __cuda_11=-3, __cuda_12=8, __cuda_13=1) at ./SW_kernel_1.cu:183
183 }
(cuda-gdb) c
Continuing.
But as I said, if I remove the 'atomicAdd()' call and the flag '-arch sm_20' which though makes my code incorrect, but now the cuda-gdb stops at the breakpoint I specify. Please tell me the reasons of this behaviour.
I am using CUDA 5.5 on Tesla M2070 (Compute Capability = 2.0).
Thanks!