
I'm trying to compile and link a dynamic kernel and use it with the CUDA driver API on a GK110.

I compile the .cu source file in Visual Studio to a PTX file with the relocatable device code flag and compute_35, sm_35, and then the CUDA linker adds cudadevrt.lib (at least it tries to, according to the linker invocation). When I call cuModuleLoad on the ptx .obj, it reports unsupported device code. There is also a .device-link.obj, which seems unrealistically small, and none of the driver API functions seem to recognize it as a valid image. Inspecting the PTX file, I can see that it contains a call to the kernel launch function, as described in the CUDA documentation (the section on dynamic parallelism from PTX).

How can I link the proper device code such that the dynamic kernel invocation works?

(this is CUDA 6.5 on Win64 with VC2013)

talonmies
FHoenig

1 Answer


You need to do the linking when you load the PTX file, using the CUDA linker provided by the driver API:

  • Compile the .cu source file to PTX with the relocatable device code flag

In your app:

  • Create a linker instance with cuLinkCreate()
  • Append the ptx-file using cuLinkAddFile() or cuLinkAddData()
  • Append cudadevrt.lib using cuLinkAddFile() or cuLinkAddData()
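  • Call cuLinkComplete(), which returns the linked binary that you can then load as usual (e.g. with cuModuleLoadDataEx())
  • Destroy the linker instance with cuLinkDestroy() once the module has been loaded, since the returned binary is owned by the linker instance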
kunzmi