Using CMake to configure a CUDA project for VS2013 gives an "invalid device function" error

Question

I use the CMake GUI tool to configure my CUDA project for VS2013. My CMakeLists.txt is as follows:

# required cmake version
cmake_minimum_required(VERSION 3.0)
project(CUDA_PART)

include_directories(${CUDA_PART_SOURCE_DIR}/common)
# packages
find_package(CUDA REQUIRED)

# nvcc flags
set(CUDA_NVCC_FLAGS -gencode arch=compute_20,code=sm_20;-G;-g)
set(CUDA_VERBOSE_BUILD ON)

#FILE(GLOB SOURCES "*.cu" "*.cpp" "*.c" "*.h")
CUDA_ADD_EXECUTABLE(CUDA_PART hist_gpu_shmem_atomics.cu)

The .cu file is hist_gpu_shmem_atomics.cu from the CUDA by Example source code.

There are two problems:

1. After the line `histo_kernel<<<blocks * 2, 256>>>(dev_buffer, SIZE, dev_histo);`, an "invalid device function" error occurs.
2. When I use the CUDA debugging tool, breakpoints in the device code are never hit.

But when I create a project with the same code from the CUDA project template in Visual Studio 2013, it works correctly!

So, is there something wrong with my CMakeLists.txt?

OS: Win7 64-bit; GPU: GTX960; CUDA: 7.5; VS: 2013 (and also 2010)

When I set the "Code Generation" option in VS2013 (screenshot not available), the CUDA_NVCC_FLAGS turn out to be `-gencode=arch=compute_20,code=\"sm_20,compute_20\"`, which is equivalent to:

-gencode=arch=compute_20,code=sm_20 \
-gencode=arch=compute_20,code=compute_20

So, I guess it generates two versions of the device code: the first (SASS) built from a virtual and a real architecture, and the second (PTX) for the virtual architecture only. Since my GTX960 is a cc5.2 device, the driver picks the second one (PTX) and converts it into suitable SASS.
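For reference, the same two `-gencode` entries could be expressed in the question's CMakeLists.txt roughly like this. This is a sketch assuming the FindCUDA module loaded by `find_package(CUDA)`, where each list element is passed to nvcc as a separate argument; writing two entries avoids having to escape the quoted `code="sm_20,compute_20"` form.

```cmake
# SASS for cc2.0 devices plus compute_20 PTX; the driver can JIT-compile
# the PTX at runtime for newer GPUs such as a cc5.2 GTX960.
set(CUDA_NVCC_FLAGS
    -gencode arch=compute_20,code=sm_20
    -gencode arch=compute_20,code=compute_20
    -G -g)
```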

Which `nvcc` flags are used when you compile using the CUDA project template in VS2013? – m.s. Oct 14 '15 at 13:34

score 1 · Accepted Answer · edited May 23 '17 at 10:27

This is a problem:

set(CUDA_NVCC_FLAGS -gencode arch=compute_20,code=sm_20;-G;-g)

Those flags will cause nvcc to generate SASS code (only) for a cc2.0 device (only). Such cc2.0 SASS code will not run on your cc5.2 device (GTX960), because SASS is only binary-compatible with later minor revisions of the same major architecture. "Invalid device function" is exactly the error you would get when trying to launch a kernel in such a scenario. Since the kernel will never launch, trying to hit breakpoints in device code won't work.

I'm not a CMake expert, so there might be other, more sensible approaches, but one possible way to try to fix this might be:

set(CUDA_NVCC_FLAGS -gencode arch=compute_52,code=sm_52;-G;-g)

which should generate code for your cc5.2 device. There are undoubtedly other possible settings here; you may want to read this linked answer or the nvcc manual for more background on the compile options that target specific devices.
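As one example of such other settings, the flags could embed both cc5.2 SASS for the GTX960 and compute_52 PTX that the driver can JIT-compile on GPUs newer than cc5.2. This is a sketch assuming the same FindCUDA setup as in the question; it has not been tested against the asker's project.

```cmake
# Each list element is passed to nvcc as a separate argument.
set(CUDA_NVCC_FLAGS
    -gencode arch=compute_52,code=sm_52       # SASS for cc5.2 (GTX960)
    -gencode arch=compute_52,code=compute_52  # PTX for forward JIT compilation
    -G -g)                                    # device and host debug info
```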

Also note that -G generates device debug code, which is fine if that is what you want. However, such code will generally run slower than code compiled without that switch. If you want to debug device code, though, the switch is necessary.
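If you only want the debug switches in debug builds, FindCUDA also reads per-configuration variables such as `CUDA_NVCC_FLAGS_DEBUG`. A minimal sketch, assuming the Visual Studio generator's standard Debug and Release configurations:

```cmake
# Architecture flags shared by every configuration
set(CUDA_NVCC_FLAGS -gencode arch=compute_52,code=sm_52)
# Device (-G) and host (-g) debug info only for Debug builds,
# so Release kernels are not slowed down by -G
set(CUDA_NVCC_FLAGS_DEBUG -G -g)
```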

answered Oct 14 '15 at 14:16 by Robert Crovella

Thank you, `arch=compute_52,code=sm_52` actually works for me. – Pan.da Oct 15 '15 at 02:57
I also tested `arch=compute_20,code=\"sm_20,compute_20\"`, and it works fine too. But why? Can you explain this further? – Pan.da Oct 15 '15 at 03:04
Yes, read the answer I linked. That particular combination includes *both* PTX and SASS, and the PTX can be forward JIT-compiled (at runtime, by the driver) from the sm_20 to sm_52 architecture to match your GPU. – Robert Crovella Oct 15 '15 at 03:36
Again, this is a perfect case where proper [error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) would save a lot of trouble. – Ivan Aksamentov - Drop Oct 15 '15 at 11:04
