
I am facing a problem while running the CUDA Visual Profiler. The profiler is not able to complete the execution, and I am getting the following error:

Program run #18 completed. Error : Application : "/home/cnode0/workspace/cuda/pred/pre". Profiler data file '/home/cnode0/workspace/cuda/pred/temp_compute_profiler_0_0.csv' for application run 0 not found.

I got the following warning during the execution

The selected counter(s) "gld instructions 8bit", "gld instructions 16bit", "gld instructions 32bit", "gld instructions 64bit", "gld instructions 128bit", "gst instructions 8bit", "gst instructions 16bit", "gst instructions 32bit", "gst instructions 64bit", "gst instructions 128bit" can cause GPU kernels to run longer than the driver's watchdog timeout limit. In this case the driver will terminate the GPU kernel resulting in an application error and the profiling data will not be available. Setting the X Config option 'Interactive' to false is recommended when these counters are selected.

I have already seen a proposed solution on this forum:

CUDA Visual Profiler 'Interactive' X config option?

As mentioned in that post, I changed my xorg.conf to set the Interactive option to false as follows and restarted the system:

Section "Device" 
Identifier "Device0" 
Driver         "nvidia"
VendorName     "NVIDIA Corporation" 
Option "Interactive" "0" 
EndSection

But this doesn't solve the problem. I am still getting the same warning. I am running Ubuntu 10.04 LTS with an NVIDIA GeForce GT 430 and driver 285.05.09. Does anyone have a clue about this?

Don K
  • It's not really a solution, but you can test whether profiling works at all by setting the CUDA_PROFILE environment variable to 1. You can then import the resulting CSV file into Visual Profiler for further inspection. – tbalazs Dec 06 '11 at 07:58

2 Answers


Another option is to reduce the number of HW counters being collected and see if that helps on your current install.

BTW, which CUDA toolkit version are you using: CUDA 4.1 RC1 with driver 285.05.09? If you are a registered developer, can you also confirm whether you have the same issue with the current CUDA RC2 release?

You can also send a repro test app to cudatools@nvidia.com

  • CUDA 4.1 RC2 is publicly released -- you don't need to be a registered developer to download it. http://developer.nvidia.com/cuda-toolkit-41 BTW, your comment isn't really an answer, should probably be in the comments stream (for future ref)... – harrism Dec 07 '11 at 05:54

Based on the error message you are getting, no profiler output is generated even for the first application run. No profiler counters are enabled in the first run, so this issue is not related to the "gld instructions*" or "gst instructions*" counters. The profiler output can be empty if there are no explicit synchronization calls before the application terminates. You can try adding a cudaDeviceSynchronize(), cudaStreamSynchronize(), or cudaEventSynchronize() call before application termination. You can confirm whether this is the issue by running the application from the command line and checking whether the command-line profiler output contains any data:

> export COMPUTE_PROFILE=1
> <application>

Check the profiler output file "cuda_profile_0.log".

The output will be something like this (without the line numbers at start of each line):

1 # CUDA_PROFILE_LOG_VERSION 2.0 
2 # CUDA_DEVICE 0 Tesla C2075 
3 # CUDA_CONTEXT 1 
4 # TIMESTAMPFACTOR fffff6de60e24570 
5 method,gputime,cputime,occupancy 
6 method=[ memcpyHtoD ] gputime=[ 80.640 ] cputime=[ 278.000 ] 
7 method=[ memcpyHtoD ] gputime=[ 79.552 ] cputime=[ 237.000 ] 
8 method=[ _Z6VecAddPKfS0_Pfi ] gputime=[ 5.760 ] cputime=[ 18.000 ] occupancy=[ 1.000 ] 
9 method=[ memcpyDtoH ] gputime=[ 97.472 ] cputime=[ 647.000 ]

You need to check whether any methods appear in the profiler log. In the above example, line 5 is the header row and there are 4 methods on lines 6 to 9.
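As an illustration of the synchronization suggestion above, here is a minimal sketch of a program that calls cudaDeviceSynchronize() before it exits. The VecAdd kernel is hypothetical and only modeled on the mangled name in the sample log; the check() helper, the sizes, and the launch configuration are assumptions for illustration, not taken from the asker's application.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: element-wise vector addition.
__global__ void VecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// Hypothetical helper: abort with a readable message if a CUDA call fails.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main()
{
    const int n = 1 << 20;                  // assumed problem size
    const size_t bytes = n * sizeof(float);

    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    if (!hA || !hB || !hC) {
        fprintf(stderr, "host allocation failed\n");
        return EXIT_FAILURE;
    }
    for (int i = 0; i < n; ++i) {
        hA[i] = 1.0f;
        hB[i] = 2.0f;
    }

    float *dA = NULL, *dB = NULL, *dC = NULL;
    check(cudaMalloc((void **)&dA, bytes), "cudaMalloc dA");
    check(cudaMalloc((void **)&dB, bytes), "cudaMalloc dB");
    check(cudaMalloc((void **)&dC, bytes), "cudaMalloc dC");
    check(cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice), "copy A to device");
    check(cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice), "copy B to device");

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    VecAdd<<<blocks, threads>>>(dA, dB, dC, n);
    check(cudaGetLastError(), "kernel launch");

    // Kernel launches are asynchronous. Block here until the GPU has
    // finished all outstanding work, so nothing is still in flight
    // when the application terminates.
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    check(cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost), "copy C to host");
    printf("c[0] = %f\n", hC[0]);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    free(hA);
    free(hB);
    free(hC);
    return 0;
}
```

If the log from a run of a program like this contains method lines while the log from your own application does not, a missing synchronization call is a plausible cause, though other causes cannot be ruled out from this test alone.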

Also note that the Visual Profiler warning message 'The selected counter(s) "gld instructions 8bit", "gld instructions 16bit" ...' is expected even after setting the Interactive option to false. This message is displayed each time the "gld instructions*" or "gst instructions*" counters are selected, and these counters are selected by default.