
I compiled and ran my code with nvcc on the command line and it produced the correct output, but when I build and run the same code in Nsight Eclipse it produces the wrong output. Does anyone have an idea why this happens?


I finally found that the problem was in one of the array allocations. The command-line build did not expose it, but the Nsight build did.

Bill the Lizard
AhmadO
  • When in nsight eclipse, were you running the release or debug version of the project? Did you compare the build command line from within eclipse to that when you built directly with nvcc? (nsight eclipse also builds with nvcc) Are there any differences in the command line parameters you specified when running the executable? – Robert Crovella Jan 06 '13 at 06:17
  • I am using double precision in my code. When I compile from the command line I use nvcc -arch=sm_20 C_ARK4d063.cu. How can I set -arch=sm_20 in Nsight Eclipse? Thanks – AhmadO Jan 06 '13 at 06:52
  • I tried both the release and the debug versions, but the answer is the same and wrong. The wrong output appeared when I switched the code from using shared memory to using registers. That version still runs correctly when built with the nvcc command, but not in Nsight. Thanks – AhmadO Jan 06 '13 at 07:20
  • I found the following: (1) The shared-memory version of the code runs fine with both Nsight and the nvcc command. (2) If I change the block size and the grid size, the output becomes wrong in Nsight but stays correct with nvcc. (3) For the version in my question above (the code using registers), I tried different block and grid sizes; it runs fine with nvcc but is still wrong in Nsight whatever sizes I choose. Do you think shared memory and register usage in Nsight differ from the command line? Any hints? Thanks – AhmadO Jan 06 '13 at 08:14
  • In the nsight project properties, you can specify the type of device you want to compile for. If you are using double precision on a CC2.0 device, be sure to select that type of device. It will then add the necessary switches to the compile command line. To access the project properties, use the project menu and then select properties. Then go to the build menu item and click the triangle to open it up, then click on the CUDA sub-menu item. Make sure to check 2.0 or greater (depending on your GPU) for "Generate PTX code" and "Generate GPU code". This may fix all your issues. – Robert Crovella Jan 06 '13 at 14:14
  • CC 2.0 devices also can handle larger block sizes (threads per block), so that switch may be affecting that behavior as well. My guess is you are also not doing proper [CUDA error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api). – Robert Crovella Jan 06 '13 at 14:19
  • I already tried your suggestions, but it still doesn't work. I think the program has a synchronization problem. The code behaves differently when run from Nsight than when built with nvcc; in other words, the order of execution in Nsight may differ from that of the nvcc build, which would explain the wrong answer. I'll try to fix the problem and answer my question later. Thanks very much for your comment; thinking about it this way was very useful. – AhmadO Jan 07 '13 at 02:28
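The error checking suggested in the comments can be sketched roughly as follows. This is a minimal illustration, not the asker's code: the `CUDA_CHECK` macro, the `scale` kernel and the `doubleOnDevice` helper are hypothetical names introduced here, and only the runtime calls themselves (`cudaMalloc`, `cudaMemcpy`, `cudaGetLastError`, `cudaDeviceSynchronize`, `cudaGetErrorString`, `cudaFree`) are real API. Since it uses `double`, it assumes a build for compute capability 1.3 or higher, such as the `-arch=sm_20` used in the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the error with file/line and abort
// whenever a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error: %s at %s:%d\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__);  \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Placeholder kernel (an assumption for this sketch): doubles each element.
__global__ void scale(double *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= 2.0;
}

// Doubles the n values in h_x on the GPU, checking every step.
void doubleOnDevice(double *h_x, int n)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    double *d_x = NULL;

    CUDA_CHECK(cudaMalloc((void **)&d_x, n * sizeof(double)));
    CUDA_CHECK(cudaMemcpy(d_x, h_x, n * sizeof(double), cudaMemcpyHostToDevice));

    scale<<<blocks, threads>>>(d_x, n);
    CUDA_CHECK(cudaGetLastError());       // launch failures, e.g. an unsupported block size
    CUDA_CHECK(cudaDeviceSynchronize());  // failures raised while the kernel was running

    CUDA_CHECK(cudaMemcpy(h_x, d_x, n * sizeof(double), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_x));
}

int main()
{
    double values[3] = {1.0, 2.0, 3.0};
    doubleOnDevice(values, 3);
    std::printf("%g %g %g\n", values[0], values[1], values[2]);
    return 0;
}
```

Without the two checks after the launch, a kernel that never ran (for example because the block size is too large for the architecture the build targets) leaves the output buffer untouched, which looks like a "wrong answer" rather than an error.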

1 Answer


Nsight EE builds projects by generating makefiles from the project settings and invoking the OS make utility. It uses the nvcc compiler found in the PATH, but it relies on some newer options introduced in the nvcc 5.0 compiler (which is part of the same toolkit distribution).

Please do a clean rebuild in Nsight Eclipse; it will print the command lines used to build your application. You can then compare those command lines with the one you use outside Nsight. Possible differences are:

  1. Nsight specifies debug flags and optimization flags when building in debug and release modes.
  2. By default, Nsight sets a new project to build for the hardware detected on your system. The nvcc default is SM 1.0.
  3. Make sure the compiler used by Nsight and the one used from the command line are the same. You may have different compilers (e.g. 4.x and 5.0) installed on your system, and they may generate slightly different code.
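One way to see whether the two builds really differ in toolkit version or target architecture is to have the program report them itself. The sketch below is an assumption-laden illustration: the `reportArch` kernel and the `compiledArch` helper are hypothetical names, while `CUDART_VERSION`, `__CUDA_ARCH__`, `cudaRuntimeGetVersion`, `cudaDriverGetVersion` and `cudaGetDeviceProperties` are part of the CUDA toolkit. Building the same file once from the command line and once from Nsight, then comparing the printed lines, shows which setting differs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stores the architecture value the device code was compiled against
// (for example 200 when built with -arch=sm_20).
__global__ void reportArch(int *arch)
{
#ifdef __CUDA_ARCH__   // only defined during the device compilation pass
    *arch = __CUDA_ARCH__;
#endif
}

// Returns the __CUDA_ARCH__ value seen by the kernel, or -1 on any CUDA error
// (for example when the binary holds no code the installed GPU can run).
int compiledArch()
{
    int *d_arch = NULL;
    int h_arch = -1;
    if (cudaMalloc((void **)&d_arch, sizeof(int)) != cudaSuccess)
        return -1;

    reportArch<<<1, 1>>>(d_arch);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_arch, d_arch, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_arch);
    return err == cudaSuccess ? h_arch : -1;
}

int main()
{
    int runtimeVersion = 0, driverVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);
    cudaDriverGetVersion(&driverVersion);

    std::printf("compiled against CUDART_VERSION: %d\n", CUDART_VERSION);
    std::printf("runtime version: %d, driver version: %d\n",
                runtimeVersion, driverVersion);

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) == cudaSuccess)
        std::printf("device 0: %s, compute capability %d.%d\n",
                    prop.name, prop.major, prop.minor);

    std::printf("device code compiled for __CUDA_ARCH__ = %d\n", compiledArch());
    return 0;
}
```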

In any case, it is likely that your code has a bug that only manifests itself under certain compilation settings. I would recommend running CUDA memcheck on your program to make sure there are no hidden bugs.
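As a purely hypothetical illustration of such a bug (this is not the asker's code), consider a kernel launched with more threads than there are array elements. The `fill` kernel and `fillAndVerify` helper below are made-up names. With the bounds guard in place the program is correct; if the guard is removed, the last block writes past the end of the allocation, which may go unnoticed in one build and corrupt results in another. Running the executable under the toolkit's `cuda-memcheck` tool is intended to report such out-of-bounds accesses.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: writes 2*i into out[i].
__global__ void fill(double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)               // without this guard, threads with i >= n write out of bounds
        out[i] = 2.0 * i;
}

// Fills n values on the device and verifies them on the host.
bool fillAndVerify(int n)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // rounds up: may launch more than n threads

    double *d_out = NULL;
    // The allocation must cover every element the kernel is allowed to touch.
    if (cudaMalloc((void **)&d_out, n * sizeof(double)) != cudaSuccess)
        return false;

    fill<<<blocks, threads>>>(d_out, n);

    std::vector<double> h_out(n);
    cudaError_t err = cudaMemcpy(&h_out[0], d_out, n * sizeof(double),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess)
        return false;

    for (int i = 0; i < n; ++i)
        if (h_out[i] != 2.0 * i)
            return false;
    return true;
}

int main()
{
    // 1000 elements with 256 threads per block gives 4 blocks, i.e. 1024 threads.
    std::printf("%s\n", fillAndVerify(1000) ? "verified" : "mismatch or CUDA error");
    return 0;
}
```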

Eugene