CUDA dynamic parallelism with Driver API

Question

I'm trying to compile and link a dynamic kernel and use it with the CUDA driver API on a GK110.

I compile the .cu source file in Visual Studio with the relocatable device code flag and compute_35, sm_35 into a PTX file, and then the CUDA linker adds cudadevrt.lib (at least it tries to, according to the linker invocation). When I call cuModuleLoad on the PTX .obj, it reports unsupported device code. There is also a .device-link.obj, which seems unrealistically small, and none of the driver API functions seem to recognize it as a valid image. When inspecting the PTX file, I can see that it contains a call to the kernel launch function, as described in the CUDA documentation (dynamic parallelism from PTX section).

How can I link the proper device code such that the dynamic kernel invocation works?

(This is CUDA 6.5 on Win64 with VC2013.)

score 5 · Accepted Answer · answered Jan 08 '15 at 00:52


You need to do the linking yourself when loading the PTX file, using the CUDA linker provided by the driver API:

Compile the .cu source file to PTX with the relocatable device code flag.

In your app:

Create a linker instance with cuLinkCreate()
Add the PTX file using cuLinkAddFile() or cuLinkAddData()
Add cudadevrt.lib using cuLinkAddFile() or cuLinkAddData()
Call cuLinkComplete(), which returns the linked binary that you can then load as usual (e.g. with cuModuleLoadDataEx())
Destroy the linker instance with cuLinkDestroy() once the module is loaded (the linked binary is owned by the linker instance)


kunzmi


Thanks! That did work. Still wondering how to make a pre-linked thing though. – FHoenig Jan 08 '15 at 02:20
I've provided an answer [here](https://stackoverflow.com/a/69144256/1695960) which shows how to make the "pre-linked thing". – Robert Crovella Sep 11 '21 at 15:15
