How to call a host function in a CUDA kernel?

Question

As the following error shows, calling a host function ('rand') from a kernel is not allowed. Is there a solution if I really do need to do that?

error: calling a host function("rand") from a __device__/__global__ function("xS_v1_cuda") is not allowed

score 33 · Accepted Answer · edited Mar 30 '12 at 04:22

Unfortunately, you cannot call a function from device code unless it is declared with the __device__ qualifier. If you need random numbers in device code, look at cuRAND, the CUDA random number generation library: http://developer.nvidia.com/curand
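As a rough sketch of what the cuRAND route can look like, the kernel below uses the cuRAND device API (curand_init and curand_uniform from curand_kernel.h) so that each thread draws one uniform float. The names fill_uniform and sample_uniform, the block size of 128, and the one-state-per-thread layout are assumptions made for illustration, not something from the question.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Each thread initializes its own generator state and draws one value.
__global__ void fill_uniform(float *out, int n, unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    curandState state;
    // Same seed for all threads, but a distinct subsequence per thread.
    curand_init(seed, i, 0, &state);
    out[i] = curand_uniform(&state);   // uniform float in (0, 1]
}

// Hypothetical host helper: fills host_out[0..n) with device-generated
// random numbers. Returns false if any CUDA call fails.
bool sample_uniform(float *host_out, int n, unsigned long long seed)
{
    float *dev_out = nullptr;
    if (cudaMalloc(&dev_out, n * sizeof(float)) != cudaSuccess) return false;

    int threads = 128;                          // assumed block size
    int blocks = (n + threads - 1) / threads;   // round up to cover n
    fill_uniform<<<blocks, threads>>>(dev_out, n, seed);

    cudaError_t err = cudaGetLastError();       // catches launch errors
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(host_out, dev_out, n * sizeof(float),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_out);
    return err == cudaSuccess;
}
```

Calling curand_init in every thread on every launch is simple but not cheap. Code that draws many numbers would more likely keep an array of states in device memory and reuse it across kernel launches.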

If you have your own host function that you want to call from a kernel, put both the __host__ and __device__ qualifiers on it:

__host__ __device__ int add( int a, int b )
{
    return a + b;
}

When this file is compiled by the NVCC compiler driver, two versions of the function are compiled: one callable by host code and another callable by device code. That is why the function can now be called from both host and device code. Note that this only works if the function body itself uses nothing that is host-only.
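To see both compiled versions in use, here is a minimal sketch that calls add from a kernel and also directly from the host. The kernel add_kernel and the helper add_on_device are hypothetical names introduced only for this example.

```cuda
#include <cuda_runtime.h>

// Compiled twice by nvcc: once for the host, once for the device.
__host__ __device__ int add(int a, int b)
{
    return a + b;
}

// The kernel calls the device-compiled version of add().
__global__ void add_kernel(int a, int b, int *result)
{
    *result = add(a, b);
}

// Hypothetical host helper: computes a + b on the GPU and stores it in *out.
// Returns false if any CUDA call fails.
bool add_on_device(int a, int b, int *out)
{
    int *dev_result = nullptr;
    if (cudaMalloc(&dev_result, sizeof(int)) != cudaSuccess) return false;

    add_kernel<<<1, 1>>>(a, b, dev_result);

    cudaError_t err = cudaGetLastError();       // catches launch errors
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(out, dev_result, sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_result);
    return err == cudaSuccess;
}
```

On the host, add(2, 3) runs the host-compiled version directly, while add_on_device(2, 3, &r) goes through the kernel and the device-compiled version. Both should produce 5.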

answered Mar 30 '12 at 04:12 by geek, edited Mar 30 '12 at 04:22 by Ashwin Nanjappa

How bad is the performance penalty of calling a host function rather than using a CUDA built-in one? – Mattia Apr 20 '16 at 15:04
Does this imply that the host and the device will only execute their own respective copies of the functions? – avgvstvs Nov 14 '16 at 16:16

score 7 · Answer 2 · edited May 23 '17 at 12:25

The short answer is that there is no solution to that issue.

Everything that normally runs on a CPU must be adapted for the CUDA environment, with no guarantee that this is even possible. "Host function" is just CUDA's name for an ordinary C function, that is, a function running on a CPU with main memory in a von Neumann architecture, as all C/C++ code on PCs has done up to this point. GPUs give you tremendous amounts of computing power, but the cost is that they are not nearly as flexible or compatible. Most importantly, device functions normally work on the GPU's own memory rather than on host main memory, and the memory they can access is limited.

If what you are after is a random number generator, you are in luck: Nvidia went to the trouble of implementing a highly efficient Mersenne Twister that can support up to 256 threads per SMP. It is callable from inside a device function and is described in an earlier post of mine here. If anyone finds a better link describing this functionality, please remove mine and replace the appropriate text here along with the link.

One thing that continually surprises me is how many programmers seem unaware of how standardized high-quality pseudo-random number generators are. "Rolling your own" is really not a good idea, considering how much of an art pseudo-random number generation is. Verifying that a generator provides acceptably unpredictable numbers takes a lot of work and academic talent...

score 5 · Answer 3 · answered Apr 01 '12 at 22:16

This does not apply to 'rand()', but a few functions such as "printf" are available in device code when compiling for compute capability >= 2.0.

For example, compiling for compute capability 1.0 fails:

nvcc.exe -gencode=arch=compute_10,code=\sm_10,compute_10\...
error : calling a host function("printf") from a __device__/__global__ function("myKernel") is not allowed

The same code compiles and works with sm_20,compute_20.
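For reference, a kernel that uses printf can be as small as the sketch below. The names hello_kernel and run_hello are hypothetical. The synchronization call matters because device-side printf output is buffered and only shown once the host synchronizes with the device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device-side printf: each thread prints its own index.
__global__ void hello_kernel()
{
    printf("hello from thread %u\n", threadIdx.x);
}

// Hypothetical host helper: launches one block of `threads` threads.
// Returns false if the launch or the synchronization fails.
bool run_hello(int threads)
{
    hello_kernel<<<1, threads>>>();
    if (cudaGetLastError() != cudaSuccess) return false;

    // Flushes the device printf buffer to stdout.
    return cudaDeviceSynchronize() == cudaSuccess;
}
```

The order in which the threads' lines appear is not guaranteed.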

score -1 · Answer 4 · answered May 10 '16 at 10:42

I have to disagree with some of the other answers in the following sense:

OP does not describe a problem: it is not unfortunate that you cannot call __host__ functions from device code - it is entirely impossible for it to be any other way, and that's not a bad thing.

To explain: think of the host (CPU) code as a CD which you put into a CD player, and of the device code as, say, an SD card which you put into a miniature music player. OP's question is "how can I shove a disc into my miniature music player?" You can't, and it makes no sense to want to. It might essentially be the same music (code with the same functionality, although usually host code and device code don't perform quite the same computational task), but the media are not interchangeable.
