
I recently tried to use function pointers to dynamically define several processing stages in my application, running on an sm_30 device.

It would be difficult to post the full code here, as there are many different files and functions involved, but basically I started from the sample included in the CUDA Toolkit 5.0.

I allocate a device buffer of function pointers and copy a device function pointer into it, defined just as in the sample, using cudaMemcpyFromSymbolAsync with the cudaMemcpyDeviceToDevice copy kind.

My device function pointer is defined like this in a .cu.h file:

//device function pointer type
typedef void (*func)(structGpuArgument*);

//declaring a device function matching that type
__device__ void gpuFunc1(structGpuArgument* arg1);

Elsewhere I have a .cu file that includes the previous declaration and contains the following code:

//get the actual function pointer
__device__ func gpuFuncPtr = gpuFunc1;

//Buffer to store a list of function pointers
func* pFuncDevBuffer;
cudaMalloc(&pFuncDevBuffer, NB_FUNC*sizeof(func));

//copy the actual function pointer (symbol) into slot i of the list buffer
cudaMemcpyFromSymbolAsync(pFuncDevBuffer+i, gpuFuncPtr, sizeof(func), 0, cudaMemcpyDeviceToDevice, stream);

//Launch the kernel that will use the functions
kernel_test<<<1,10,0,stream>>>(pFuncDevBuffer);
...

//defining the kernel that uses pointer buffer
__global__ void kernel_test(func* pFuncDevBuffer)
{
   printf("func address : %p\n",pFuncDevBuffer[0]);
   pFuncDevBuffer[0](NULL);
}

//defining the function pointed to by the function pointer
__device__ void gpuFunc1(structGpuArgument* arg1)
{
     // do something
}

In fact, everything works fine as long as the global kernel that takes the device function buffer as an argument is defined in the same file as the function and its pointer. The kernel can then print out the address of the function (0x4) and execute its code without any problem. I am not using separate compilation.

When, in the same instance of the program, a second kernel defined elsewhere takes the very same function pointer buffer as an argument, it prints out the very same memory address for the function pointer (0x4), but if it tries to execute it, it fails with an Illegal Instruction at 0x00000000 in cuda-memcheck. Every other CUDA API call freezes afterwards, and I need to reboot my computer (a reset through nvidia-smi isn't supported on my GPU).
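To make the failing case concrete, here is a rough sketch of what the second file looks like (the file and kernel names are hypothetical; the real code is spread over several files). It includes the same .cu.h header as above:

//otherfile.cu : includes the same .cu.h with the func typedef
__global__ void kernel_test2(func* pFuncDevBuffer)
{
   //prints the same address (0x4) as kernel_test...
   printf("func address : %p\n", pFuncDevBuffer[0]);
   //...but this indirect call triggers the Illegal Instruction error
   pFuncDevBuffer[0](NULL);
}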

I would like to know if there is a known issue with using function pointers this way, i.e. with a function pointer buffer defined in another file but sharing the same function pointer definition.

Also, if there is a workaround for resetting the device after a segfault without rebooting the whole system, it would help me save time while debugging my application.

Thank you for your help

Tobbey
  • You're linking device code in 2 separate files together. You need to use the device linker – Robert Crovella May 07 '13 at 11:22
  • @RobertCrovella Thank you for your answer, but could you be more specific? I have no linking errors for now, and I am using CUDA.Cmake to build my project, which hides part of the compiling/linking process from me. As far as I know, there is no process-bound memory on NVIDIA GPUs, so why wouldn't my kernel be able to access and load code from the address it reads from the buffer (0x4)? – Tobbey May 07 '13 at 11:59
  • After reproducing the problem in a simple example, it seems that you are right. In Nsight, I generated a "failing" version compiled in whole-program mode and a "successful" one without re-coding anything, just by setting the "separate compilation" option. My problem now is that, firstly, I don't understand the nvcc documentation: what exactly relocatable device code and separate compilation stand for, and what they have to do with my problem. On the other hand, while trying to use separate compilation and relocatable device code with cuda.cmake I get tons of undefined reference linking errors. – Tobbey May 07 '13 at 13:42
  • Sorry, I was on a plane. It seems like you are on track to figuring it out. If you have device code in 2 files that need to call or reference each other, you need to use the device linker (which is used when you select separate compilation). You may want to read that section of the nvcc manual at docs.nvidia.com. Sorry I can't help with cmake. – Robert Crovella May 07 '13 at 17:49
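For reference, a rough sketch of the separate-compilation build mentioned in the comments above (the file names are hypothetical, and the exact flags may differ depending on the toolkit version and build system):

nvcc -arch=sm_30 -rdc=true -c file1.cu -o file1.o
nvcc -arch=sm_30 -rdc=true -c file2.cu -o file2.o
nvcc -arch=sm_30 file1.o file2.o -o app

Compiling with -rdc=true generates relocatable device code, and the final nvcc link step then runs the device linker, so that device code in one object file can reference device symbols defined in another.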

0 Answers