I'm running cuda-memcheck to debug my code, and the output is as follows:

========= Program hit cudaErrorCudartUnloading (error 29) due to "driver shutting down" on CUDA API call to cudaFree. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2e40d3]
=========     Host Frame:./nmt [0x53526]
=========     Host Frame:./nmt [0xfbd9]
terminate called after throwing an instance of '=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 [0x3c259]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 [0x3c2a5]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfc) [0x21ecc]
thrust::system::system_error'
=========     Host Frame:./nmt [0x530a]
=========
  what():  driver shutting down
========= Error: process didn't terminate successfully
========= Internal error (20)
========= No CUDA-MEMCHECK results found

Is it possible to tell from the line Host Frame:./nmt [0x53526] where the code is broken? If so, how can I do that?

John Kugelman
Hieu Pham

2 Answers

As @talonmies indicated (I suspect he will not mind if I post a CW answer), the cuda-memcheck tool provides additional stack back tracing capability, which can be enabled with the --show-backtrace switch added to the command line.

The backtrace may include both host and device functions (i.e. host and device backtraces).

If the application has also been compiled with host debug symbol information (e.g. -g on Linux), then cuda-memcheck can show function names for the host functions in the host backtrace.

Additional usage information is available in the documentation.
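
As a rough illustration of the two steps above (build with host symbols, then run with backtraces enabled), here is a minimal shell sketch. The function name build_and_memcheck and the source file name nmt.cu are hypothetical. The -lineinfo and -Xcompiler -rdynamic flags go beyond the -g mentioned above: they reflect my reading of the cuda-memcheck compilation notes for Linux and may not be needed in every setup.

```shell
# Hypothetical helper: rebuild a CUDA program with symbol information and
# run it under cuda-memcheck with backtraces explicitly enabled.
#   $1: CUDA source file (e.g. nmt.cu, an assumed name)
#   $2: output binary    (e.g. ./nmt)
build_and_memcheck() {
    # -g                    host debug symbols
    # -lineinfo             line-number information for device code
    # -Xcompiler -rdynamic  pass -rdynamic to the host compiler so that
    #                       host function symbols are retained (Linux)
    nvcc -g -lineinfo -Xcompiler -rdynamic -o "$2" "$1" &&
        cuda-memcheck --show-backtrace yes "$2"
}

# Usage (not run here):
# build_and_memcheck nmt.cu ./nmt
```

The --show-backtrace switch also accepts host, device, and no, so the output can be narrowed to one side if the combined trace is too noisy.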

Robert Crovella

For me, cuda-memcheck and its sub-tools (memcheck, racecheck, initcheck, and synccheck) often produce host backtraces without line numbers, or even without host function names. Searching the internet turns up only this question, but I already pass -g (or even -g3) to the host compiler, and according to the docs --show-backtrace already defaults to yes (passing it explicitly doesn't help). So I do the following with the backtrace:

Suppose your compiled program is called a.out and the host backtrace contains a line like Host Frame:./nmt [0x530a] (this example comes from the question, where the binary is ./nmt; yours would show ./a.out). Open your program in cuda-gdb with:

cuda-gdb a.out

Then let your program load all of its shared libraries (at least up to some point in the main() function). Enter the following at the cuda-gdb prompt:

b main
r

Then, look up the function name with:

info symbol 0x530a

Or look up the line number with:

info line *0x530a

Here 0x530a is the address that cuda-memcheck printed for you. I guess NVIDIA could easily automate this (and also demangle the host function names where they are printed).
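
Until then, the lookup can be scripted by hand. The following is only a sketch, assuming a POSIX shell with grep and sed. The helper names host_frame_addrs and gdb_lookup_script are made up for illustration. The first pulls the bracketed addresses out of the Host Frame lines for one binary, and the second turns them into the same cuda-gdb commands shown above.

```shell
# Hypothetical helper: read a cuda-memcheck log on stdin and print the hex
# address of every host frame that belongs to the given binary.
#   $1: binary path exactly as it appears in the log (e.g. ./nmt)
host_frame_addrs() {
    grep -F "Host Frame:$1 " |
        sed -n 's/.*\[\(0x[0-9a-fA-F]*\)].*/\1/p'
}

# Hypothetical helper: read addresses on stdin (one per line) and emit a
# gdb command script that stops in main and looks each address up.
gdb_lookup_script() {
    echo 'break main'
    echo 'run'
    while read -r addr; do
        echo "info symbol $addr"
        echo "info line *$addr"
    done
}

# Usage (not run here; lookup.gdb is an arbitrary file name):
# cuda-memcheck ./a.out 2>&1 | host_frame_addrs ./a.out | gdb_lookup_script > lookup.gdb
# cuda-gdb -batch -x lookup.gdb ./a.out
```

The usage lines rely on cuda-gdb accepting the standard gdb options -batch and -x, which it should, since it is built on gdb. Frames that come from shared libraries such as libcuda.so.1 are deliberately skipped, because their addresses do not belong to the program's own binary.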

Serge Rogatch