
I was trying to make something like this (actually I need to write some integration functions) in CUDA:

#include <iostream>
using namespace std;

float f1(float x) {
    return x * x;
}

float f2(float x) {
    return x;
}

void tabulate(float p_f(float)) {
    for (int i = 0; i != 10; ++i) {
        std::cout << p_f(i) << ' ';
    }
    std::cout << std::endl;
}

int main() {
    tabulate(f1);
    tabulate(f2);
    return 0;
}

output:

0 1 4 9 16 25 36 49 64 81
0 1 2 3 4 5 6 7 8 9


I tried the following, but got only this error:

Error: Function pointers and function template parameters are not supported in sm_1x.

float f1(float x) {
    return x;
}

__global__ void tabulate(float lower, float upper, float p_function(float), float* result) {
    for (lower; lower < upper; lower++) {
        *result = *result + p_function(lower);
    }
}

int main() {
    float res;
    float* dev_res;

    cudaMalloc( (void**)&dev_res, sizeof(float) ) ;

    tabulate<<<1,1>>>(0.0, 5.0, f1, dev_res);
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);

    printf("%f\n", res);
    /************************************************************************/
    scanf("%s");

    return 0;
}
DanilGholtsman
    What card are you using? You appear to be compiling your code as compute capability 1.x, and I think function pointers are a compute capability 2.x feature. You can change your nvcc invocation to have -gencode arch=compute_20,code=sm_20 (if your card supports it) – alrikai Mar 26 '13 at 18:30
  • @alrikai GeForce 560Ti – DanilGholtsman Mar 26 '13 at 19:38
  • Then you should change your compilation from 1.x to 2.x, which will get rid of your compilation error. However you may still have some runtime problems... – alrikai Mar 26 '13 at 20:00
  • @alrikai oh, okay, but is there possible way to make something like this in 1.x? – DanilGholtsman Mar 26 '13 at 20:30
  • I don't think so, it appears you need to have a function pointer to a device function, and as per the CUDA Programming Guide: "Function pointers to device functions are only supported in device code compiled for devices of compute capability 2.x and higher." Your 560Ti is compute capability 2.1 though, so it's doable for you if you change to -gencode arch=compute_20,code=sm_20 for compiling – alrikai Mar 26 '13 at 20:37
  • @DanilGholtsman: Not using function pointers. But you could use template arguments and a switch-dispatch table: `enum F{f1, f2, …}; template<F f> __global__ void g(){ switch(f){ case f1: func1(); break; case f2: func2(); break; … }}` and then `void h(F f){ switch(f){ case f1: g<f1><<<…>>>(); break; case f2: g<f2><<<…>>>(); break; … } }`. Beware of combinatorial explosions though; stack this deep enough, and you might end up with hundreds of megabytes of PTX code. BT;DT. – datenwolf Dec 01 '21 at 19:58

3 Answers


To get rid of your compile error, you'll have to pass -gencode arch=compute_20,code=sm_20 to nvcc when compiling your code. But then you'll likely run into some runtime problems:

From the CUDA Programming Guide (http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#functions):

Function pointers to __global__ functions are supported in host code, but not in device code. Function pointers to __device__ functions are only supported in device code compiled for devices of compute capability 2.x and higher.

It is not allowed to take the address of a __device__ function in host code.

so you can have something like this (adapted from the "FunctionPointers" sample):

//your function pointer type - returns unsigned char, takes parameters of type unsigned char and float
typedef unsigned char(*pointFunction_t)(unsigned char, float);

//some device function to be pointed to
__device__ unsigned char
Threshold(unsigned char in, float thresh)
{
   ...
}

//pComputeThreshold is a device-side function pointer to your __device__ function
__device__ pointFunction_t pComputeThreshold = Threshold;
//the host-side function pointer to your __device__ function
pointFunction_t h_pointFunction;

//in host code: copy the function pointers to their host equivalent
cudaMemcpyFromSymbol(&h_pointFunction, pComputeThreshold, sizeof(pointFunction_t));

You can then pass the h_pointFunction as a parameter to your kernel, which can use it to call your __device__ function.

//your kernel taking your __device__ function pointer as a parameter
__global__ void kernel(pointFunction_t pPointOperation)
{
    unsigned char tmp;
    ...
    tmp = (*pPointOperation)(tmp, 150.0);
    ...
}

//invoke the kernel in host code, passing in your host-side __device__ function pointer
kernel<<<...>>>(h_pointFunction);

Hopefully that made some sense. In all, it looks like you would have to change your f1 function to be a __device__ function and follow a similar procedure (the typedefs aren't necessary, but they do make the code nicer) to get it as a valid function pointer on the host-side to pass to your kernel. I'd also advise giving the FunctionPointers CUDA sample a look over
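Putting the pieces together for the integration kernel from the question, it could look roughly like this (a sketch I haven't run; the symbol names p_f1 and h_f1 are my own, and it assumes compilation with -gencode arch=compute_20,code=sm_20):

```cuda
#include <cstdio>

typedef float (*pointFunction_t)(float);

// the function from the question, now a __device__ function
__device__ float f1(float x) { return x; }

// device-side pointer to f1; its value is fetched on the host
// via cudaMemcpyFromSymbol, since the host cannot take f1's address
__device__ pointFunction_t p_f1 = f1;

__global__ void tabulate(float lower, float upper,
                         pointFunction_t p_function, float *result)
{
    for (; lower < upper; lower++) {
        *result += p_function(lower);
    }
}

int main()
{
    float res = 0.0f;
    float *dev_res;
    pointFunction_t h_f1;

    cudaMalloc((void**)&dev_res, sizeof(float));
    cudaMemset(dev_res, 0, sizeof(float));

    // copy the device-side function pointer to its host equivalent,
    // then pass it to the kernel as an ordinary argument
    cudaMemcpyFromSymbol(&h_f1, p_f1, sizeof(pointFunction_t));
    tabulate<<<1, 1>>>(0.0f, 5.0f, h_f1, dev_res);
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);

    printf("%f\n", res);
    return 0;
}
```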

alrikai
  • In addition to the above answer (+1) you may be interested in the very simple example of how to use function pointers in device code (not using templates, though) in this thread in the NVIDIA forums: https://devtalk.nvidia.com/default/topic/457094/how-can-i-use-__device__-function-pointer-in-cuda-/ – njuffa Mar 27 '13 at 00:32
  • @njuffa nice! your example is much cleaner (and complete) – alrikai Mar 27 '13 at 00:57
  • @njuffa In alrikai's answer, the device function pointer is directly accessible in kernel. What is the point of creating host function pointer, copying from symbol then passing that as kernel argument? – zindarod Nov 05 '17 at 20:18
  • @zindarod No idea what you mean. In the example code I pointed to in my post dated 2013/5/27 above the function pointers are on the *device*: `__device__ op_func func[3] = { add_func, mul_func, div_func };` – njuffa Nov 05 '17 at 20:27
  • @njuffa Yes, your example makes perfect sense. But if you look at the answer above, what is the point of `h_pointFunction` when `pComputeThreshold` can be called in kernel directly? – zindarod Nov 05 '17 at 20:33
  • @zindarod I am not sure why you re asking me to explain an answer provided by a third party. Why not direct your question(s) at the answerer? – njuffa Nov 05 '17 at 20:35
  • @njuffa Because your answer deviated from this, I thought you knew why to avoid it. Anyway I'll ask him. Thanks. – zindarod Nov 05 '17 at 20:38
  • @zindarod I provided a comment, not an answer; mostly to point to what I thought was the *simplest* way to use function pointers with device functions. – njuffa Nov 05 '17 at 20:41

Even though you may be able to compile this code (see @Robert Crovella's answer), it will not work as written. You cannot pass function pointers from host code, because the host compiler has no way of determining the address of a __device__ function.

Eugene

Here is a simple wrapper class, based on this question, for function pointers that are callable from within a kernel:

template <typename T>
struct cudaCallableFunctionPointer
{
public:
  cudaCallableFunctionPointer(T* f_)
  {
    T* host_ptr = (T*)malloc(sizeof(T));
    cudaMalloc((void**)&ptr, sizeof(T));

    // read the function pointer out of the __device__ symbol, then copy it
    // into device memory so it can be passed to a kernel
    cudaMemcpyFromSymbol(host_ptr, *f_, sizeof(T));
    cudaMemcpy(ptr, host_ptr, sizeof(T), cudaMemcpyHostToDevice);

    free(host_ptr);  // host_ptr came from malloc, so release it with free
  }

  ~cudaCallableFunctionPointer()
  {
    cudaFree(ptr);
  }

  T* ptr;
};

You could use it like this:

__device__ double func1(double x)
{
    return x + 1.0;
}

typedef double (*func)(double x);
__device__ func f_ = func1;

__global__ void test_kernel(func* f)
{
    double x = (*f)(2.0);
    printf("%g\n", x);
}

int main()
{
    cudaCallableFunctionPointer<func> f(&f_);

    test_kernel<<<1, 1>>>(f.ptr);
    cudaDeviceSynchronize();  // make sure the kernel's printf output is flushed before exit
}

output:

3
jakob
    Isn't there a memory leak on `host_ptr`. You never called `free`. Why use malloc instead of just placing the object on the stack? – Russell Trahan Aug 17 '21 at 01:05