I've thought about this problem for a while, and while I don't think this is a "great" solution, it does seem to work so I thought I would share it.
The basic idea is to use linux to spawn processes to compile and then run the compiled code. I think this is pretty much a no-brainer, but since I put together the pieces, I'll post instructions here in case it's useful for somebody else.
The problem statement in the question is to be able to take a file that contains a user-defined function, let's assume it is a function of a single variable f(x)
, i.e. y = f(x)
, and that x and y can be represented by float
quantities.
The user would edit a file called fx.txt
that contains the desired function. This file must conform to C syntax rules.
fx.txt:
y=1/x
This file then gets included in the __device__
function that will be holding it:
user_testfunc.cuh:
__device__ float fx(float x){
float y;
#include "fx.txt"
;
return y;
}
which gets included in the kernel that is called via a wrapper.
cudalib.cu:
#include <math.h>
#include "cudalib.h"
#include "user_testfunc.cuh"
__global__ void my_kernel(float x, float *y){
*y = fx(x);
}
float cudalib_compute_fx(float x){
float *d, *h_d;
h_d = (float *)malloc(sizeof(float));
cudaMalloc(&d, sizeof(float));
my_kernel<<<1,1>>>(x, d);
cudaMemcpy(h_d, d, sizeof(float), cudaMemcpyDeviceToHost);
return *h_d;
}
cudalib.h:
float cudalib_compute_fx(float x);
The above files get built into a shared library:
nvcc -arch=sm_20 -Xcompiler -fPIC -shared cudalib.cu -o libmycudalib.so
We need a main application to use this shared library.
t452.cu:
#include <stdio.h>
#include <stdlib.h>
#include "cudalib.h"
int main(int argc, char* argv[]){
if (argc == 1){
// recompile lib, and spawn new process
int retval = system("nvcc -arch=sm_20 -Xcompiler -fPIC -shared cudalib.cu -o libmycudalib.so");
char scmd[128];
sprintf(scmd, "%s skip", argv[0]);
retval = system(scmd);}
else { // compute f(x) at x = 2.0
printf("Result is: %f\n", cudalib_compute_fx(2.0));
}
return 0;
}
Which is compiled like this:
nvcc -arch=sm_20 -o t452 t452.cu -L. -lmycudalib
At this point, the main application (t452
) can be executed and it will produce the result of f(2.0) which is 0.5 in this case:
$ LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./t452
Result is: 0.500000
The user can then modify the fx.txt
file:
$ vi fx.txt
$ cat fx.txt
y = 5/x
And just re-run the app, and the new functional behavior is used:
$ LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./t452
Result is: 2.500000
This method takes advantage of the fact that upon recompilation/replacement of a shared library, a new linux process will pick up the new shared library. Also note that I've omitted several kinds of error checking for clarity. At a minimum I would check CUDA errors, and I would also probably delete the shared object (.so) library before recompiling it, and then test for its existence after compilation, to do a basic test that the compilation proceeded successfully.
This method entirely uses the runtime API to achieve this goal, so as a result the user would have to have the CUDA toolkit installed on their machine and appropriately set up so that nvcc
is available in the PATH. Using the driver API with PTX code would make this process much cleaner (and not require the toolkit on the user's machine), but AFAIK there is no way to generate PTX from CUDA C without using nvcc
or a user-created toolchain built on the nvidia llvm compiler tools. In the future, there may be a more "integrated" approach available in the "standard" CUDA C toolchain, or perhaps even by the driver.
A similar approach can be arranged using separate compilation and linking of device code, such that the only source code that needs to be exposed to the user is in user_testfunc.cu
(and fx.txt
).
EDIT: There is now a CUDA runtime compilation facility, which should be used in place of the above.