
Does CUDA support JIT compilation of a CUDA kernel?

I know that OpenCL offers this feature.

I have some variables which do not change during runtime (i.e. they depend only on the input file), so I would like to define these values with a macro at kernel compile time (i.e. at runtime).

If I define these values manually at compile time, my register usage drops from 53 to 46, which greatly improves performance.

user1829358
  • CUDA code [can be compiled](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compilation-nvcc) to an [intermediate format, PTX](http://docs.nvidia.com/cuda/parallel-thread-execution/index.html), which will then be [JIT-compiled](http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#just-in-time-compilation) to the actual device architecture machine code at runtime. I'm not sure this will meet your needs, however, since I'm unsure exactly how your code will compile differently at runtime (i.e. what the macros will depend on). – Robert Crovella Nov 26 '12 at 15:19
  • I read some scalar values from an input file and I'd like to define them at kernel compile time. e.g.: #define epsilon 3.0 – user1829358 Nov 26 '12 at 15:49
  • 1
    If you have few possible combinations of constants, you can use templates in CUDA to generate separate code for each combination. The compiler can then select the correct kernel for you at runtime. – Roger Dahl Nov 26 '12 at 20:36
  • ArrayFire does JIT compilation at runtime to optimize kernels for incoming data sizes and shapes (I work on ArrayFire so that's how I know). So yes it is possible to do in CUDA! – arrayfire Nov 26 '12 at 22:38
  • I believe the answer is "no." However, if you only want to change a few constants, you can use templates ([see this blog post](http://blog.icare3d.org/2010/04/cuda-template-metaprogramming.html)). They aren't nearly as powerful as being able to compile code at run-time. One of the major features I love in OpenCL. – Ryan Marcus May 23 '13 at 19:36
  • @accelereyes care to give info on how that's actually done? – Dmitri Nesteruk Sep 14 '13 at 21:31

2 Answers


This became available with the NVRTC library in CUDA 7.0. With this library you can compile your CUDA code at runtime.

http://devblogs.nvidia.com/parallelforall/cuda-7-release-candidate-feature-overview/

But what kind of advantage can you gain? In my view, I couldn't find any dramatic advantage in dynamic compilation.
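For the asker's use case (baking file-derived constants into the kernel), a minimal NVRTC sketch might look like the following. Error checking, context setup, and the launch code are omitted, and the kernel name `scale` and macro `EPSILON` are made up for illustration; NVRTC's `-D` option injects the value as a compile-time literal:

```cpp
#include <nvrtc.h>
#include <cuda.h>
#include <string>
#include <vector>

// Hypothetical kernel: EPSILON is not defined in the source itself,
// it is supplied via -D when we compile at runtime.
const char* kernelSource = R"(
extern "C" __global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= EPSILON;
})";

void compileWithConstant(float epsilon) {
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, kernelSource, "scale.cu", 0, nullptr, nullptr);

    // Bake the value read from the input file into the kernel as a macro.
    std::string def = "-DEPSILON=" + std::to_string(epsilon) + "f";
    const char* opts[] = { def.c_str() };
    nvrtcCompileProgram(prog, 1, opts);   // JIT compile with the constant folded in

    size_t ptxSize;
    nvrtcGetPTXSize(prog, &ptxSize);
    std::vector<char> ptx(ptxSize);
    nvrtcGetPTX(prog, ptx.data());
    nvrtcDestroyProgram(&prog);

    // Load the resulting PTX with the driver API and launch as usual
    // (assumes cuInit and a current context have already been set up).
    CUmodule module;
    cuModuleLoadData(&module, ptx.data());
    // ... cuModuleGetFunction / cuLaunchKernel ...
}
```

Because `EPSILON` is a literal at compile time, the compiler can fold it into instructions instead of holding it in a register, which is exactly the register-pressure effect described in the question.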

grypp

If it is feasible for you to use Python, you can use the excellent pycuda module to compile your kernels at runtime. Combined with a templating engine such as Mako, you will have a very powerful meta-programming environment that will allow you to dynamically tune your kernels for whatever architecture and specific device properties happen to be available to you (obviously some things will be difficult to make fully dynamic and automatic).
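As a rough sketch of that workflow (the `pycuda` compile step is commented out since it needs a GPU, and `string.Template` stands in for a fuller engine such as Mako; the kernel and names are illustrative only):

```python
# Generate CUDA source at runtime with values read from the input file,
# then hand the rendered string to pycuda for compilation.
from string import Template

KERNEL_TEMPLATE = Template("""
__global__ void scale(float *data, int n) {
    const float epsilon = ${epsilon}f;   /* baked in at render time */
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= epsilon;
}
""")

def render_kernel(epsilon):
    """Substitute runtime values into the kernel source as literals."""
    return KERNEL_TEMPLATE.substitute(epsilon=repr(epsilon))

source = render_kernel(3.0)

# from pycuda.compiler import SourceModule  # compiles via nvcc at runtime
# mod = SourceModule(source)
# scale = mod.get_function("scale")
```

The point is that `epsilon` arrives in the kernel as a compile-time constant rather than a kernel argument, so the compiler is free to fold it.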

You could also consider just maintaining a few distinct versions of your kernel with different parameters, between which your program could choose at runtime based on whatever input you are feeding to it.
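That multi-version idea can be implemented with templates, as also suggested in the comments. A hypothetical sketch (kernel name and constants invented for illustration): each supported value gets its own compiled instantiation, and plain runtime dispatch picks one.

```cuda
// One kernel instantiation per supported constant; the value is a
// compile-time template parameter, so it never occupies a register.
template <int Epsilon10x>   // epsilon scaled by 10 to keep it integral
__global__ void scale(float* data, int n) {
    const float epsilon = Epsilon10x / 10.0f;  // folded at compile time
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= epsilon;
}

void launch(float* d_data, int n, int epsilon10x) {
    int threads = 256, blocks = (n + threads - 1) / threads;
    switch (epsilon10x) {   // runtime choice among precompiled variants
        case 30: scale<30><<<blocks, threads>>>(d_data, n); break;
        case 50: scale<50><<<blocks, threads>>>(d_data, n); break;
        default: break;  // fall back to a kernel taking epsilon as an argument
    }
}
```

This only works when the set of possible values is small and known ahead of time, which is its main limitation compared to true runtime compilation.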

Brendan Wood
  • thank you for your thoughts. Using pycuda seems to be a little overkill to me. However, I might give it a chance if there is no other way. Is there no CUDA driver call similar to OpenCL's clBuildProgram? – user1829358 Nov 26 '12 at 17:37
  • At least in vanilla CUDA, I am not aware of anything similar to `clBuildProgram`. CUDA does do runtime compilation of the device-independent PTX code if the GPU binary is not already available, but I don't know how you could leverage that for your use case. – Brendan Wood Nov 26 '12 at 17:41