I use the CMake GUI to configure my CUDA project for VS2013. My CMakeLists.txt is below:

project(CUDA_PART)
# required cmake version
cmake_minimum_required(VERSION 3.0)

include_directories(${CUDA_PART_SOURCE_DIR}/common)
# packages
find_package(CUDA REQUIRED)

# nvcc flags
set(CUDA_NVCC_FLAGS -gencode arch=compute_20,code=sm_20;-G;-g)
set(CUDA_VERBOSE_BUILD ON)

#FILE(GLOB SOURCES "*.cu" "*.cpp" "*.c" "*.h")
CUDA_ADD_EXECUTABLE(CUDA_PART hist_gpu_shmem_atomics.cu)

The .cu file is hist_gpu_shmem_atomics.cu, taken from the CUDA by Example source code.

There are two problems:

  1. After the line histo_kernel<<<blocks * 2, 256>>>(dev_buffer, SIZE, dev_histo); an "invalid device function" error occurs.

  2. When I use the CUDA debugging tool, it cannot hit breakpoints in the device code.

But when I create a project with the same code using the CUDA project template in Visual Studio 2013, it works correctly!

So, is there something wrong in the CMakeLists.txt?

OS: Win7 64-bit; GPU: GTX960; CUDA: 7.5; VS: 2013 (and also 2010)


When I set "Code Generation" in the properties of a VS2013 project created from the CUDA project template, the nvcc flags turn out to be -gencode=arch=compute_20,code=\"sm_20,compute_20\". This is equivalent to:

-gencode=arch=compute_20,code=sm_20 \
-gencode=arch=compute_20,code=compute_20 

So I guess it generates two versions of the device code: the first (SASS) built from both a virtual and a real architecture, and the second (PTX) from the virtual architecture only. Since my GTX960 is a cc 5.2 device, the runtime chooses the second one (PTX) and converts it to suitable SASS.

Pan.da

1 Answer

This is a problem:

set(CUDA_NVCC_FLAGS -gencode arch=compute_20,code=sm_20;-G;-g)

Those flags will cause nvcc to generate SASS code (only) for a cc 2.0 device (only). Such cc2.0 SASS code will not run on your cc5.2 device (GTX960). "Invalid device function" is exactly the error you would get when trying to launch a kernel in such a scenario. Since the kernel will never launch, trying to hit breakpoints in device code won't work.

I'm not a CMake expert, so there might be other, more sensible approaches, but one possible way to try to fix this might be:

set(CUDA_NVCC_FLAGS -gencode arch=compute_52,code=sm_52;-G;-g)

which should generate code for your cc5.2 device. There are undoubtedly other possible settings here; you may want to read this answer or the nvcc manual for more background on compile options to target specific devices.
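As a rough sketch of one such alternative setting (not the only valid one), the flags below embed both SASS for cc 5.2 and PTX for compute_52, so that the driver can JIT-compile the PTX on GPUs newer than the one the SASS was built for. It assumes the FindCUDA module used in the question, whose CUDA_NVCC_FLAGS variable is a semicolon-separated list; choosing compute_52 as the PTX target is an assumption made for this particular GPU.

```cmake
# Sketch only: assumes find_package(CUDA REQUIRED) has already run (FindCUDA module).
# SASS for cc 5.2 devices such as the GTX960:
list(APPEND CUDA_NVCC_FLAGS -gencode=arch=compute_52,code=sm_52)
# PTX for compute_52, which the driver can JIT-compile at run time for newer GPUs:
list(APPEND CUDA_NVCC_FLAGS -gencode=arch=compute_52,code=compute_52)
```

Writing each option as a single -gencode=... token keeps every list element a complete nvcc argument.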

Also note that -G generates device debug code, which is fine if that is what you want. However, such code will generally run slower than code compiled without that switch. If you want to debug device code, that switch is necessary.
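If device debug code is only wanted in debug builds, one possible arrangement is sketched below. It assumes FindCUDA's per-configuration variable CUDA_NVCC_FLAGS_DEBUG and a multi-configuration generator such as Visual Studio; treat it as a starting point rather than a verified setup for this project.

```cmake
# Sketch only: architecture flags shared by every configuration.
set(CUDA_NVCC_FLAGS -gencode=arch=compute_52,code=sm_52)
# -G (device debug) and -g (host debug) are passed only when building Debug.
set(CUDA_NVCC_FLAGS_DEBUG -G;-g)
```

With this split, a Release build avoids the slowdown that -G introduces, while the Debug build still supports device breakpoints.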

Robert Crovella
  • Thank you. `arch=compute_52,code=sm_52` actually works for me. – Pan.da Oct 15 '15 at 02:57
  • I also tested `arch=compute_20,code=\"sm_20,compute_20\"`, and it works fine too. But why? Can you explain this further? – Pan.da Oct 15 '15 at 03:04
  • Yes, read the answer I linked. That particular combination includes *both* PTX and SASS, and the PTX can be forward JIT-compiled (at runtime, by the driver) from the sm_20 to sm_52 architecture to match your GPU. – Robert Crovella Oct 15 '15 at 03:36
  • Again, this is a perfect case where proper [error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) would save a lot of trouble. – Ivan Aksamentov - Drop Oct 15 '15 at 11:04