I have Eclipse Nsight 5.0 (CUDA 5.0) installed on a 64-bit Ubuntu 12.04 machine with two graphics cards: a GeForce GT 240 for the desktop UI and a GeForce GTX 480 for debugging. I can compile and run the CUDA program just fine. However, when I set a breakpoint in the CUDA code and start debugging, execution doesn't halt on that line but jumps to the end of the kernel function.

In Debug Configurations I have selected the GeForce GTX 480 as the device to debug on, and I check the return value of each kernel call. What else can I try? Also, I don't have root permission on this PC.

soroush
user2773181
  • Breakpoints are not necessarily hit in kernel functions. The code actually executed on the GPU is significantly different from the code you are trying to set breakpoints in since the CUDA compiler usually performs a very aggressive code optimization. – Vitality Oct 01 '13 at 20:11

1 Answer

Does your kernel get executed when running under the debugger (e.g. do you see proper values updated)? It may be that your NVIDIA driver is not compatible with the toolkit.

If the kernel is not executing, chances are the cause is something simple, e.g. your kernel is compiled for an architecture that is incompatible with the card you use to debug.

Do you have cudaDeviceSynchronize after your kernel call? Do you check its return value?
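As a minimal sketch of what checking both calls can look like: the launch itself is checked with `cudaGetLastError`, and the kernel's execution with the return value of `cudaDeviceSynchronize`. The `CUDA_CHECK` macro and the `addOne` kernel are hypothetical names made up for this illustration; they are not from the asker's program.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (an assumption, not part of the CUDA API):
// print the error string with file/line and exit on any failure.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel used only for illustration.
__global__ void addOne(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1;
    }
}

int main()
{
    const int n = 256;
    int *d_data = NULL;

    CUDA_CHECK(cudaMalloc((void **)&d_data, n * sizeof(int)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(int)));

    addOne<<<(n + 127) / 128, 128>>>(d_data, n);
    CUDA_CHECK(cudaGetLastError());       // errors reported for the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    int h_first = -1;
    CUDA_CHECK(cudaMemcpy(&h_first, d_data, sizeof(int),
                          cudaMemcpyDeviceToHost));
    printf("data[0] = %d\n", h_first);

    CUDA_CHECK(cudaFree(d_data));
    return 0;
}
```

If the launch fails (for example, because no kernel image matches the device), the `cudaGetLastError` check should be the one that reports it.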

Eugene
  • Yes, the kernel gets executed under the debugger, as in a normal run of the program. I call `cudaDeviceSynchronize` after every kernel call and also check whether the return value is equal to `cudaSuccess`. I'll have to check which driver is installed for the GeForce GTX 480. Where can I check which driver is compatible with CUDA 5.0? – user2773181 Oct 01 '13 at 20:45
  • CUDA toolkit requires driver 304.54 or newer, though in some very rare cases newest driver may not work with older toolkits. Have you tried debugging with cuda-gdb from shell? – Eugene Oct 01 '13 at 20:53
  • Checking the return value of `cudaDeviceSynchronize` after a kernel call is not sufficient to catch all types of launch failures. In particular, it will not catch the type of launch failure resulting from the kernel being compiled for an incompatible architecture. Review [proper cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api). – Robert Crovella Oct 02 '13 at 04:11
  • My NVIDIA driver version is 304.88. I debugged the program with cuda-gdb from the terminal and got the same problem. For instance, I debugged the sample program bitreverse and put a breakpoint on line 38, where a global function is called. cuda-gdb then tells me that the breakpoint is at line 40, the end of the global function call. – user2773181 Oct 02 '13 at 15:06
  • Please try compiling and running some CUDA sample using the makefile from the sample directory (e.g. do not import it into Nsight). Also you may try enumerating your devices from the code to see what graphical adapters are hidden by cuda-gdb. – Eugene Oct 02 '13 at 15:36
  • Well, I can't compile a sample program with the makefile, because I don't have admin rights. But this is not the problem: I could compile bitreverse.cu in the terminal with nvcc just fine and then run the program without problems. The problem is that the debugger won't break in CUDA code but only just after the kernel call, and no information about the threads is shown. The behaviour is the same whether I debug in Eclipse Nsight or with cuda-gdb on the command line. I'm not sure what you mean by your last sentence. – user2773181 Oct 02 '13 at 16:22
  • You can compile vectorAdd (it does not depend on files outside the samples folder) by copying vectorAdd.cu to writable location and running "nvcc -g -G -O0 vectorAdd.cu" - this will generate debug information, will not optimize your code and will emit PTX 1.0 that is compatible with all CUDA devices. – Eugene Oct 02 '13 at 16:36
  • OK, I could compile and then run the program. Then I tried to debug the program but got the same problem as before. I set a breakpoint on line 34, and when execution arrives at that point, the debugger tells me that the breakpoint is at line 40, the end of the global function. – user2773181 Oct 02 '13 at 17:05
  • Can you now try compiling with `nvcc -g -G -arch=sm_20 vectorAdd.cu -o vectorAdd`? This would avoid jitting the code. – Eugene Oct 02 '13 at 18:27
  • It's the same... Here is how I compile and debug the code in shell: [link](http://www.sourcepod.com/qxjoie02-20206) – user2773181 Oct 03 '13 at 13:32
  • What happens if you do `continue`? It should print out the error message. CUDA context is not created and kernel is not launched. – Eugene Oct 03 '13 at 22:45
  • Here is what happens if I do `continue`: [link](http://www.sourcepod.com/fzcena26-20212). The kernel gets executed but I can't debug it... – user2773181 Oct 04 '13 at 11:32
  • @user2773181 I can't reach that site. Meanwhile, cuda-gdb developers told me this forum thread may be related to the issue you are seeing - https://devtalk.nvidia.com/default/topic/541252/debugging-device-code-does-not-work/#3848816 – Eugene Oct 04 '13 at 18:19
  • I don't have the same problem, as I can't debug ANY kernels. Thank you for your help, Eugene. I may ask for help in the devtalk forums. The website I linked is online now. – user2773181 Oct 07 '13 at 12:42
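
For the suggestion in the comments to enumerate the devices from code, a minimal sketch could look like the following. It only uses `cudaGetDeviceCount` and `cudaGetDeviceProperties` from the CUDA runtime API; the output format is made up for illustration. Running it once normally and once under cuda-gdb would show whether the set of visible devices differs between the two.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Ask the runtime how many CUDA devices are visible to this process.
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("Visible CUDA devices: %d\n", count);

    // Print the name and compute capability of each visible device.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                    dev, cudaGetErrorString(err));
            return 1;
        }
        printf("Device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

The reported compute capability can also be compared with the `-arch` value passed to nvcc (`sm_20` in the command suggested above).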