Compiling code containing dynamic parallelism fails

Question

I am doing dynamic parallelism programming using CUDA 5.5 and an NVIDIA GeForce GTX 780, whose compute capability is 3.5. I am calling a kernel function inside another kernel function, but compilation fails with this error:

error : calling a __global__ function("kernel_6") from a __global__ function("kernel_5") is only allowed on the compute_35 architecture or above

What am I doing wrong?
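For reference, here is a minimal sketch of the pattern the question describes. It reuses the kernel names from the error message, but the kernel bodies and the helper run_nested_launch are hypothetical. It assumes a GPU of compute capability 3.5 or higher and a build such as `nvcc -arch=sm_35 -rdc=true simple1.cu -o simple1 -lcudadevrt`. Compiled without an sm_35 (or newer) target, the launch inside kernel_5 produces the error quoted above. Recent toolkits no longer accept sm_35, so substitute your own GPU's architecture there.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Child kernel (launched from the GPU): each thread writes its global index.
__global__ void kernel_6(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = i;
}

// Parent kernel: one thread launches kernel_6 from device code.
// This device-side launch is what needs compute_35+, -rdc=true and cudadevrt.
__global__ void kernel_5(int *out, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        kernel_6<<<(n + 127) / 128, 128>>>(out, n);
}

// Runs the parent kernel and returns how many of the n outputs are correct
// (-1 on a CUDA error). Assumes n > 0.
int run_nested_launch(int n)
{
    int *d_out = 0;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess)
        return -1;

    kernel_5<<<1, 1>>>(d_out, n);
    cudaError_t err = cudaGetLastError();
    // A parent grid only completes after the child grids it launched have
    // completed, so this host-side sync also waits for kernel_6.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();

    std::vector<int> h_out(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_out[0], d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess)
        return -1;

    int correct = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] == i)
            ++correct;
    return correct;
}

int main()
{
    const int n = 1000;
    int correct = run_nested_launch(n);
    printf("%d of %d values correct\n", correct, n);
    assert(correct == n);
    return 0;
}
```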

score 12 · Answer 1 · answered Oct 10 '13 at 08:11


Compile for sm_35 with relocatable device code enabled and link against the device runtime library, for example:

nvcc -arch=sm_35 -rdc=true simple1.cu -o simple1 -lcudadevrt

or

If you have two files, simple1.cu and test.c, you can compile and link them as shown below. This is called separate compilation.

nvcc -arch=sm_35 -dc simple1.cu 
nvcc -arch=sm_35 -dlink simple1.o -o link.o -lcudadevrt
g++ -c test.c 
g++ link.o simple1.o test.o -o simple -L/usr/local/cuda/lib64/ -lcudart

The same is explained in the CUDA C Programming Guide.
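The answer does not show what the two files contain. One possible split is sketched below, assuming simple1.cu holds all the device code plus a plain host entry point, and test.c only holds main. The names double_on_gpu, parent_double and child_double are hypothetical. With this split, test.o only needs the host function's signature and never sees a kernel.

```cuda
// simple1.cu: device code plus one host entry point for test.c.
#include <cassert>
#include <cuda_runtime.h>

// Child kernel: doubles each element.
__global__ void child_double(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2;
}

// Parent kernel: launches the child grid from the device.
__global__ void parent_double(int *data, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        child_double<<<(n + 127) / 128, 128>>>(data, n);
}

// Host entry point with C linkage, so the object built from test.c can call
// it without knowing anything about kernels. Returns 0 on success.
extern "C" int double_on_gpu(int *host_data, int n)
{
    assert(host_data != 0 && n > 0);
    int *d_data = 0;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&d_data, bytes) != cudaSuccess)
        return 1;

    cudaError_t err = cudaMemcpy(d_data, host_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        parent_double<<<1, 1>>>(d_data, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();  // also waits for child_double
    if (err == cudaSuccess)
        err = cudaMemcpy(host_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}

// test.c side (built with g++ above, so it is compiled as C++):
//   extern "C" int double_on_gpu(int *host_data, int n);
//   int main() { int v[4] = {1, 2, 3, 4}; return double_on_gpu(v, 4); }
```

The `extern "C"` linkage is a design choice: it keeps the symbol name unmangled, so the same wrapper could also be called from a file compiled as plain C (where the declaration would omit `extern "C"`).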


Sagar Masuti


Hi, thanks for the reply. I am doing that, but I get this error: fatal error : nvcc supports '--relocatable-device-code=true (-rdc=true)', '--device-c (-dc)', and '--device-link (-dlink)' only when targeting sm_20 or higher – user2865500 Oct 10 '13 at 08:37
Could you please post exactly what you are doing, i.e. a simple program and the command you are running? I just tried the above command on a simple program [here](http://pastebin.com/3Z2aGa4F). – Sagar Masuti Oct 10 '13 at 08:53
Hi sir, I am doing the same thing: calling a kernel function inside another kernel function. I have two files, named sample.cpp and cuda.cu. Please explain where I should add this: nvcc -arch=sm_35 -rdc=true simple1.cu -o simple1 -lcudadevrt – user2865500 Oct 10 '13 at 09:14
@user2865500 Are you using linux or windows? – Vitality Oct 10 '13 at 09:27
I am using Windows 7, Visual Studio 2010 and CUDA 5.5. – user2865500 Oct 10 '13 at 09:30
@user2865500 Ok, now I understand. Have a look at my answer below. I hope it will solve your problem when using Visual Studio 2010. – Vitality Oct 10 '13 at 09:34
Sorry, I don't use Windows. Look at @JackOLantern's answer. Please be specific from now on, so that you can get the correct answer. – Sagar Masuti Oct 10 '13 at 09:37

score 7 · Answer 2 · answered Oct 10 '13 at 09:33


From Visual Studio 2010:

1) View -> Property Pages
2) Configuration Properties -> CUDA C/C++ -> Common -> Generate Relocatable Device Code -> Yes (-rdc=true)
3) Configuration Properties -> CUDA C/C++ -> Device -> Code Generation -> compute_35,sm_35
4) Configuration Properties -> Linker -> Input -> Additional Dependencies -> cudadevrt.lib
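Step 3 only helps if the card really has compute capability 3.5 or higher. Below is a hedged sketch for checking this with the runtime API (cudaGetDeviceProperties and its major/minor fields); supports_dynamic_parallelism is a hypothetical helper. The program itself has no device-side launches, so it builds without -rdc=true. It prints the compute_XY,sm_XY pair that matches each device.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Returns 1 if device `dev` has compute capability 3.5 or higher (the minimum
// for dynamic parallelism), 0 if it is older, and -1 if the query fails.
int supports_dynamic_parallelism(int dev)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
        return -1;
    return (prop.major > 3 || (prop.major == 3 && prop.minor >= 5)) ? 1 : 0;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        int ok = supports_dynamic_parallelism(dev);
        assert(ok != -1);  // the same query just succeeded above
        printf("Device %d: %s, compute capability %d.%d -> compute_%d%d,sm_%d%d%s\n",
               dev, prop.name, prop.major, prop.minor,
               prop.major, prop.minor, prop.major, prop.minor,
               ok ? "" : " (too old for dynamic parallelism)");
    }
    return 0;
}
```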


Vitality


I am very thankful for your help. I have done this, but now I am getting this error (please don't be irritated): nvcc : fatal error : nvcc supports '--relocatable-device-code=true (-rdc=true)', '--device-c (-dc)', and '--device-link (-dlink)' only when targeting sm_20 or higher – user2865500 Oct 10 '13 at 09:36
Make sure you have completed step 3 correctly. Are you able to compile the cdpLUDecomposition CUDA sample successfully? It uses dynamic parallelism to compute an LU decomposition. – Vitality Oct 10 '13 at 09:54
Yes sir, I built the cdpLUDecomposition CUDA sample. It does not give any error, but it gives me this: Native' has exited with code 0 (0x0). I have done all the steps successfully. – user2865500 Oct 10 '13 at 09:59
@user2865500 I'm not sure what else to suggest. Are you rebuilding your solution or simply building it? – Vitality Oct 10 '13 at 11:08

score 4 · Answer 3 · edited Oct 10 '13 at 08:23

You need to have nvcc generate compute capability 3.5 code for your device. You can do this by adding the following option to the nvcc command line:

 -gencode arch=compute_35,code=sm_35
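The error depends on the target because nvcc compiles device code separately for each -gencode target and, during each device pass, defines `__CUDA_ARCH__` as that target (350 for compute_35). The sketch below, with hypothetical kernel and helper names, uses that macro to launch a child grid only where dynamic parallelism is available. It illustrates why the option matters and is not a replacement for it: the compute_35 path still needs -rdc=true and -lcudadevrt.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void child_set(int *flag)
{
    *flag = 1;
}

__global__ void parent_set(int *flag)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
    // Device code generated for compute_35 or higher: launch a child grid.
    child_set<<<1, 1>>>(flag);
#else
    // Older targets: a device-side launch would not compile, so work inline.
    *flag = 2;
#endif
}

// Returns 1 if the device-side launch path ran, 2 if the fallback ran,
// and -1 on a CUDA error.
int which_path()
{
    int *d_flag = 0;
    int h_flag = 0;
    if (cudaMalloc(&d_flag, sizeof(int)) != cudaSuccess)
        return -1;
    cudaError_t err = cudaMemset(d_flag, 0, sizeof(int));
    if (err == cudaSuccess) {
        parent_set<<<1, 1>>>(d_flag);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_flag, d_flag, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_flag);
    return err == cudaSuccess ? h_flag : -1;
}

int main()
{
    int path = which_path();
    printf("path %d (1 = device-side launch, 2 = fallback)\n", path);
    assert(path == 1 || path == 2);
    return 0;
}
```

Built with `-gencode arch=compute_35,code=sm_35 -rdc=true` and linked with -lcudadevrt, which_path() should return 1.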

See the CUDA samples on dynamic parallelism for more detail. They include both command-line options and project settings for all supported operating systems:

http://docs.nvidia.com/cuda/cuda-samples/index.html#simple-quicksort--cuda-dynamic-parallelism-

edited Oct 10 '13 at 08:23

answered Oct 10 '13 at 05:11

kangshiyin



To use dynamic parallelism, `--relocatable-device-code=true` (short form `-rdc=true`) is also needed. Also, to prevent further errors, don't forget to link against the cudadevrt library. – Michael Haidl Oct 10 '13 at 05:46
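The device runtime, which provides device-side kernel launches and runtime calls made from kernel code such as cudaGetLastError(), ships as the static library cudadevrt. That is why programs using it must be linked against it. Below is a minimal sketch, with hypothetical kernel and helper names, that checks a device-side launch from inside the parent kernel. It assumes the same build flags as the answers above (sm_35 or newer, -rdc=true, -lcudadevrt).

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void child_fill(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

// Launches the child grid and records the device-side launch status.
// cudaGetLastError() here is the device runtime version, provided by cudadevrt.
__global__ void parent_fill(int *out, int n, int *launch_status)
{
    child_fill<<<1, n>>>(out);
    *launch_status = (int)cudaGetLastError();
}

// Stores the device-side launch status in *status and returns how many of the
// 32 outputs are correct (-1 on a host-side CUDA error).
int run_checked_launch(int *status)
{
    const int n = 32;
    int h_out[n];
    int *d_out = 0, *d_status = 0;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess)
        return -1;
    if (cudaMalloc(&d_status, sizeof(int)) != cudaSuccess) {
        cudaFree(d_out);
        return -1;
    }

    parent_fill<<<1, 1>>>(d_out, n, d_status);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();  // also waits for child_fill
    if (err == cudaSuccess)
        err = cudaMemcpy(status, d_status, sizeof(int), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    cudaFree(d_status);
    if (err != cudaSuccess)
        return -1;

    int correct = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] == i)
            ++correct;
    return correct;
}

int main()
{
    int status = -1;
    int correct = run_checked_launch(&status);
    printf("device-side launch status %d, %d of 32 values correct\n", status, correct);
    assert(status == (int)cudaSuccess && correct == 32);
    return 0;
}
```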
