
The following attempt shows my intention, but it fails to compile:

__host__ __device__ void f(){}

int main()
{
    f<<<1,1>>>();
}

The compiler complains:

a.cu(5): error: a __device__ function call cannot be configured

1 error detected in the compilation of "/tmp/tmpxft_00001537_00000000-6_a.cpp1.ii".

I hope my statement is clear; thanks for any advice.

Hailiang Zhang

2 Answers


You need to create a CUDA kernel entry point, i.e. a `__global__` function. Something like:

#include <stdio.h>

__host__ __device__ void f() {
#ifdef __CUDA_ARCH__
    printf("Device Thread %d\n", threadIdx.x);
#else
    printf("Host code!\n");
#endif
}

__global__ void kernel() {
    f();
}

int main() {
    kernel<<<1,1>>>();
    if (cudaDeviceSynchronize() != cudaSuccess) {
        fprintf(stderr, "Cuda call failed\n");
    }
    f();
    return 0;
}
Eugene

The tutorial you are looking at is quite old (2008?); it might not be compatible with the version of CUDA you are using.

You can use `__global__`, and that means `__host__ __device__`; this works:

__global__ void f()
{
    // compute this thread's global index
    const int tid = threadIdx.x + blockIdx.x * blockDim.x;
}

int main()
{
    f<<<1,1>>>();
}
Adam
  • `__global__` specifies a kernel entry point, i.e. a function that will auto-parallelize into GPU code when called with launch parameters. `__host__` and `__device__` are not used to decorate kernel functions. The only sense in which you could say `__global__` means `__host__ __device__` is in the case of [cuda dynamic parallelism](http://docs.nvidia.com/cuda/cuda-dynamic-parallelism/index.html), which is only available on cc 3.5 devices. Even in that case, I think it's sloppy to say `__global__` means `__host__ __device__` – Robert Crovella Jun 12 '13 at 03:03
  • @RobertCrovella I agree, I only meant they are equivalent in his context, as my code cannot be called from the host anyway as it has kernel variables. – Adam Jun 12 '13 at 03:11