I have several device functions defined before main(), and I would like to fill an array in global memory with them so that any kernel can access them. I know I could declare a static device array and initialize it with the functions directly, but that would hardcode the list, and I want to keep things as simple as possible for the user: the user should be able to add whichever functions he/she needs from the host. I have therefore been experimenting with function pointers on the device and the host, and this is where I got stuck. Here is an example of what I would like to do, but which does not work:
typedef void (*ptrfunction)(int &, int &);
// Example of possible device functions:
__device__ void add(int &type1, int &type2){
// Do something
}
__device__ void multiply(int &type1, int &type2){
// Do something
}
__device__ void divide(int &type1, int &type2){
// Do something
}
// Array to be filled with these functions
__device__ ptrfunction listfunctions[10];
// Function pointers on the device
// IMPORTANT: the following function pointer does not point to anything yet
__device__ ptrfunction devfunc;
// Function pointer that does point to a function:
__device__ ptrfunction devptr = multiply;
int main(){
ptrfunction hostptr;
// What I would like: a function that lets the user register the device functions he/she wants here.
// Something like:
AddFunction(add);
AddFunction(multiply);
// ...
}
AddFunction should somehow fill the array listfunctions, declared on the device, with add, multiply, etc. The question is how to access these device functions from the host and pass them to the device. The first thing I tried was to copy the add or multiply device function directly into the function pointer devfunc using cudaMemcpyToSymbol, thinking that I could then use devfunc from any kernel:
cudaMemcpyToSymbol(devfunc, multiply, sizeof(ptrfunction), 0, cudaMemcpyHostToDevice);
It does not compile: error: no instance of overloaded function "cudaMemcpyToSymbol" matches the argument list
I tried cudaMemcpyToSymbol with an int variable declared on the device, and it works fine. But with function pointers, something goes wrong.
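For reference, the int case that works can be sketched roughly as follows. This is only a minimal illustration: the names devvalue, hostvalue and readback are made up for this example and do not come from my real code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device variable, used only for this illustration.
__device__ int devvalue;

int main() {
    int hostvalue = 5;   // value to send to the device
    int readback = 0;    // filled by copying the symbol back

    // Copy a host int into the device symbol.
    cudaError_t err = cudaMemcpyToSymbol(devvalue, &hostvalue, sizeof(int), 0,
                                         cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        printf("cudaMemcpyToSymbol failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy the device symbol back to the host to check the round trip.
    err = cudaMemcpyFromSymbol(&readback, devvalue, sizeof(int), 0,
                               cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("cudaMemcpyFromSymbol failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("readback = %d\n", readback);
    return 0;
}
```

The difference from the failing call is that here the source argument is the address of a host variable, whereas in the function pointer attempt I pass the device function itself.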
An example that works but that I want to avoid is:
cudaMemcpyFromSymbol(&hostptr, devptr, sizeof(ptrfunction));
kernel<<<...,...>>>(hostptr);
cudaDeviceSynchronize();
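To make the working case concrete, here is a minimal self-contained sketch of it. The body of multiply, the kernel signature, the output buffer d_out and the launch configuration are assumptions made only for this illustration; my real code differs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef void (*ptrfunction)(int &, int &);

// Assumed body: the original only says "Do something".
__device__ void multiply(int &type1, int &type2) { type1 *= type2; }

// Device-side function pointer initialized before main() (the hardcoded part).
__device__ ptrfunction devptr = multiply;

// Hypothetical kernel: applies the received function to two local values.
__global__ void kernel(ptrfunction f, int *out) {
    int a = 6, b = 7;
    f(a, b);     // indirect call through the device function pointer
    *out = a;
}

int main() {
    ptrfunction hostptr;
    // Copy the device address stored in devptr into a host variable.
    cudaError_t err = cudaMemcpyFromSymbol(&hostptr, devptr, sizeof(ptrfunction));
    if (err != cudaSuccess) {
        printf("cudaMemcpyFromSymbol failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int *d_out = NULL;
    int h_out = 0;
    cudaMalloc(&d_out, sizeof(int));

    // The host never calls hostptr; it only forwards the address to the kernel.
    kernel<<<1, 1>>>(hostptr, d_out);
    cudaDeviceSynchronize();

    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("result = %d\n", h_out);
    cudaFree(d_out);
    return 0;
}
```

With the assumed body of multiply, this should print result = 42.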
This example works well. The function pointer devptr, declared on the device and pointing to the function multiply, is copied to a function pointer on the host, which can then be passed as an argument to a kernel. The drawback, as I explained before, is that the function the pointer points to must be hardcoded before main(), which I want to avoid. Building on this working example, I tried to copy a device function directly to the host function pointer, instead of copying the device function pointer. Here I try to copy the device function multiply to the host function pointer:
cudaMemcpyFromSymbol(&hostptr, multiply, sizeof(ptrfunction));
kernel<<<...,...>>>(hostptr);
cudaDeviceSynchronize();
It compiles without complaint, but when I run it under cuda-memcheck this is what I get:
Program hit cudaErrorInvalidSymbol (error 13) due to "invalid device symbol" on CUDA API call to cudaMemcpyFromSymbol. ... Hardware Stack Overflow ========= at 0x00022ca8 Device Frame:kernel......
Copying a device function pointer to a host function pointer works, but copying a function directly does not. Pity.
The question, then, is: how can I obtain, from the host, the address of a device function (add, multiply, ...) so that I can either pass it as an argument to a kernel or use an API call to copy it into a device function pointer?