cuda runtime api and dynamic kernel definition

Question

Using the driver API precludes the use of the runtime API in the same application ([1]). Unfortunately, cublas, cufft, etc. are all based on the runtime API. If one wants dynamic kernel definition as in cuModuleLoad and cublas at the same time, what are the options? I have these in mind, but maybe there are more:

A. Wait for compute capability 3.5 that's rumored to support peaceful coexistence of driver and runtime apis in the same application.

B. Compile the kernels to an .so file and dlopen it. Do they get unloaded on dlclose?

C. Attempt to use cuModuleLoad from the driver api, but everything else from the runtime api. No idea if there is any hope for this.

I'm not holding my breath, because jcuda or pycuda are in pretty much the same bind and they probably would have figured it out already.

[1] CUDA Driver API vs. CUDA runtime

Compute capability 3.5 devices have been available off the shelf since January 2013. — Vitality, Dec 12 '13 at 09:26
I haven't been able to verify that cc 3.5 indeed solves this issue, and those devices are not exactly widespread yet. — melisgl, Dec 12 '13 at 10:07
And runtime API - driver API interoperability was solved around the time CUDA 3.0 was released (ie. about 4 years ago). It does/has worked on *all* CUDA compatible hardware since 2009. — talonmies, Dec 12 '13 at 10:47
Are you saying that with cuda 3.0+ all cuda compatible hardware works or that with cuda 3.0+ all cuda compatible hardware released after 2009 works? — melisgl, Dec 12 '13 at 11:25
All hardware. The solution to interoperability is purely a driver level change in the way context management works. It has nothing to do with the GPU hardware itself. Your entire question seems to be premised on false assumptions, leading to you trying to solve a non-existent problem..... — talonmies, Dec 12 '13 at 11:39
I'll be happy if that's true. The information comes from: http://wiki.tiker.net/PyCuda/FrequentlyAskedQuestions#Are_the_CUBLAS_APIs_available_via_PyCUDA.3F and http://lists.tiker.net/pipermail/pycuda/2012-August/004065.html — melisgl, Dec 12 '13 at 12:03
@melisgl: You need to read more carefully. That is a question about the CUBLAS *device* API. Compute 3.5 devices support a CUBLAS API *inside kernels*, which requires separate compilation and linking of device libraries, which is something PyCUDA didn't (and perhaps still doesn't) support. Nothing to do with CUDA runtime and driver API interoperability. — talonmies, Dec 12 '13 at 12:36
And you can read how the context interoperability between driver and runtime APIs works [here](http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html#group__CUDART__DRIVER) in the CUDA documentation. If that is a bit dry for you, try [this Dr Dobbs article](http://www.drdobbs.com/parallel/cuda-supercomputing-for-the-masses-part/224400246) from *2010*..... — talonmies, Dec 12 '13 at 13:43

talonmies · Accepted Answer · 2013-12-13T10:22:37.523

To summarize, you are tilting at windmills here. By relying on extremely out of date information, you seem to have concluded that runtime and driver API interoperability isn't supported in CUDA, when, in fact, it has been since the CUDA 3.0 beta was released in 2009. Quoting from the release notes of that version:

The CUDA Toolkit 3.0 Beta is now available.

Highlights for this release include:

CUDA Driver / Runtime Buffer Interoperability, which allows applications using the CUDA Driver API to also use libraries implemented using the CUDA C Runtime.

The CUDA Runtime API documentation (linked in the comments above) has a section which succinctly describes how the driver and runtime APIs interact.

To concretely answer your main question:

If one wants dynamic kernel definition as in cuModuleLoad and cublas at the same time, what are the options?

The basic approach goes something like this:

1. Use the driver API to establish a context on the device as you would normally do.
2. Call the runtime API routine cudaSetDevice(). The runtime API will automagically bind to the existing driver API context. Note that device enumeration is identical and common between both APIs, so if you establish a context on a given device number in the driver API, the same number will select the same GPU in the runtime API.
3. You are now free to use any CUDA runtime API call or any library built on the CUDA runtime API. Behaviour is the same as if you relied on runtime API "lazy" context establishment.

Great answer, talonmies. Thank you. — melisgl, Dec 12 '13 at 21:15
