149

Can anyone describe the differences between __global__ and __device__ ?

When should I use __device__, and when should I use __global__?

albusshin
Mehdi Saman Booy

9 Answers

179

Global functions are also called "kernels". They are the functions that you can call from the host side using the CUDA kernel launch syntax (<<<...>>>).

Device functions can only be called from other device or global functions. __device__ functions cannot be called from host code.
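To make the call rules above concrete, here is a minimal sketch of a kernel that is launched from the host and calls a device helper. The names `square` and `squareAll` are made up for illustration, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device function: callable only from code that runs on the GPU.
__device__ int square(int x) { return x * x; }

// Kernel: launched from the host with <<<...>>>.
__global__ void squareAll(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = square(data[i]);   // device-to-device call, no launch syntax
}

int main()
{
    const int n = 8;
    int host[n] = {0, 1, 2, 3, 4, 5, 6, 7};
    int *dev = nullptr;

    cudaMalloc(&dev, n * sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    squareAll<<<1, n>>>(dev, n);     // kernel launch from host code
    // square(3);                    // would not compile: __device__ function called from the host

    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i)
        printf("%d ", host[i]);
    printf("\n");
    return 0;
}
```

Uncommenting the `square(3)` line in `main` should make the compiler reject the program, which is the restriction described above.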

Angie Quijano
Eugene
  • 20
    Just as an addendum, `__global__` functions can also be called from the device using CUDA kernel semantics (<<<...>>>) if you are using dynamic parallelism - that requires CUDA 5.0 and compute capability 3.5 or higher. – Tom Sep 11 '12 at 17:44
51
  1. __global__ - Runs on the GPU, called from the CPU or the GPU*. Launched with an execution configuration, <<<gridDim, blockDim>>>.
  2. __device__ - Runs on the GPU, called from the GPU. Can be applied to variables too.
  3. __host__ - Runs on the CPU, called from the CPU.

*) __global__ functions can be called from other __global__ functions starting with compute capability 3.5.
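The remark about variables in item 2 can be sketched as follows. The variable `counter` and the kernel `countThreads` are hypothetical names; the example only assumes the standard runtime call `cudaMemcpyFromSymbol` and the built-in `atomicAdd`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __device__ on a variable places it in GPU global memory.
__device__ int counter = 0;

__global__ void countThreads()
{
    atomicAdd(&counter, 1);   // every thread increments the device variable once
}

int main()
{
    countThreads<<<2, 32>>>();   // 2 blocks of 32 threads each
    cudaDeviceSynchronize();

    int hostCopy = 0;
    // Host code cannot read `counter` directly; copy it back by symbol instead.
    cudaMemcpyFromSymbol(&hostCopy, counter, sizeof(int));
    printf("counter = %d\n", hostCopy);
    return 0;
}
```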

Johan
  • 7
    This answer is a bit too late - it was correct at the time the question was asked, but it is not correct anymore since the invention of [dynamic parallelism](https://devblogs.nvidia.com/parallelforall/introduction-cuda-dynamic-parallelism/). – tera Sep 12 '16 at 11:47
49

The differences between __device__ and __global__ functions are:

__device__ functions can be called only from the device, and they are executed only on the device.

__global__ functions can be called from the host, and they are executed on the device.

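Therefore, you call __device__ functions from kernel functions, and you don't have to specify a launch configuration for them. You can also provide a function for both sides. Marking it __host__ __device__, e.g. __host__ __device__ void foo(void), compiles it once for the host and once for the device. Declaring a separate void foo(void) and __device__ void foo(void) with the same signature is a different matter: Clang's CUDA mode accepts this kind of overloading, but nvcc does not support overloading on the execution space alone. Where such an overload pair is accepted, one version runs on the host and can only be called from host functions, while the other runs on the device and can only be called from device or kernel functions.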
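One common way to get a function that is usable on both sides is the combined `__host__ __device__` specifier. The sketch below assumes nvcc; `scale` and `scaleOnDevice` are illustrative names, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Compiled twice: once for the CPU and once for the GPU.
__host__ __device__ float scale(float x) { return 2.0f * x; }

__global__ void scaleOnDevice(float *out, float x)
{
    *out = scale(x);   // device-side call
}

int main()
{
    float *devOut = nullptr;
    float fromDevice = 0.0f;

    cudaMalloc(&devOut, sizeof(float));
    scaleOnDevice<<<1, 1>>>(devOut, 1.5f);
    cudaMemcpy(&fromDevice, devOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(devOut);

    float fromHost = scale(1.5f);   // host-side call of the same function
    printf("host: %f, device: %f\n", fromHost, fromDevice);
    return 0;
}
```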

You can also visit the following link: http://code.google.com/p/stanford-cs193g-sp2010/wiki/TutorialDeviceFunctions, it was useful for me.

Angie Quijano
FacundoGFlores
19

I will explain it with an example:

int main()
{
    // Your main function. Executed by the CPU.
}

__global__ void calledFromCpuForGPU(...)
{
    // This function is called by the CPU and is supposed to be executed on the GPU.
}

__device__ void calledFromGPUforGPU(...)
{
    // This function is called by the GPU and is supposed to be executed on the GPU.
}

That is, when we want a host (CPU) function to launch a function that runs on the device (GPU), we declare that function '__global__'. Read this: "https://code.google.com/p/stanford-cs193g-sp2010/wiki/TutorialGlobalFunctions"

And when we want code that is already running on the device (a kernel or another device function) to call a helper function that also runs on the device, we declare the helper '__device__'. Read this: "https://code.google.com/p/stanford-cs193g-sp2010/wiki/TutorialDeviceFunctions"

This should be enough to understand the difference.

sandeep.ganage
16

__global__ is for CUDA kernels, i.e. functions that can be launched directly from the host. __device__ functions can be called from __global__ and __device__ functions, but not from the host.

perreal
8

A __global__ function is the definition of a kernel. Whenever it is called from the CPU, that kernel is launched on the GPU.

However, each thread executing that kernel might need to run some piece of code again and again, for example swapping two integers. For this we can write a helper function, just like we do in a C program. For threads executing on the GPU, such a helper function should be declared as __device__.

Thus, a device function is called from the threads of a kernel, with one instance per calling thread, while a global function is called from a CPU thread.
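The swap helper mentioned above could look roughly like this. It is only a sketch: `swapInts` and `swapPairs` are invented names, and each thread is assumed to handle one adjacent pair of elements.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Helper that runs in whichever GPU thread calls it.
__device__ void swapInts(int &a, int &b)
{
    int tmp = a;
    a = b;
    b = tmp;
}

// Each thread swaps one adjacent pair: (0,1), (2,3), ...
__global__ void swapPairs(int *data, int n)
{
    int i = 2 * (blockIdx.x * blockDim.x + threadIdx.x);
    if (i + 1 < n)
        swapInts(data[i], data[i + 1]);
}

int main()
{
    const int n = 8;
    int host[n] = {0, 1, 2, 3, 4, 5, 6, 7};
    int *dev = nullptr;

    cudaMalloc(&dev, n * sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    swapPairs<<<1, n / 2>>>(dev, n);   // one thread per pair

    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i)
        printf("%d ", host[i]);
    printf("\n");
    return 0;
}
```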

talonmies
Lorin Ahmed
7

I am recording some unfounded speculations here for the time being (I will substantiate these later when I come across some authoritative source)...

  1. __device__ functions can have a return type other than void but __global__ functions must always return void.

  2. __global__ functions can be launched from within other kernels running on the GPU to start additional GPU threads (as part of the CUDA dynamic parallelism model, aka CNP), while __device__ functions run on the same thread as the calling kernel.
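Both points can be tried out with a small sketch like the one below. The names `twice`, `child` and `parent` are hypothetical. Launching a kernel from device code needs a GPU of compute capability 3.5 or higher and relocatable device code, typically something like `nvcc -rdc=true` plus linking the device runtime (`-lcudadevrt`); the exact flags may depend on your toolkit version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A __device__ function may return a value; it runs in the calling thread.
__device__ int twice(int x) { return 2 * x; }

// Child kernel: like every __global__ function, it returns void.
__global__ void child(int parentValue)
{
    printf("child thread %d got %d\n", threadIdx.x, parentValue);
}

// Parent kernel: launches a new grid from the device (dynamic parallelism).
__global__ void parent()
{
    int v = twice(21);    // ordinary call, executed by this same thread
    child<<<1, 4>>>(v);   // starts 4 additional GPU threads
}

int main()
{
    parent<<<1, 1>>>();
    cudaDeviceSynchronize();   // waits for the parent grid and its child grid
    return 0;
}
```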

Sandeep Datta
7

__global__ is a CUDA C keyword (declaration specifier) which says that the function:

  1. Executes on the device (GPU)
  2. Is called from host (CPU) code.

__global__ functions (kernels) are launched by the host code using <<<no_of_blocks, no_of_threads_per_block>>>. Every thread executes the kernel and uses its unique thread ID to decide which data to work on.

However, __device__ functions cannot be called from host code. If you need a function that is callable from both host and device code, declare it with both specifiers: __host__ __device__.
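As an illustration of the launch configuration and the per-thread ID, here is a hedged sketch. The kernel name `fillWithIndex` and the sizes are arbitrary choices, and the index formula assumes a one-dimensional grid of one-dimensional blocks.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fillWithIndex(int *out, int n)
{
    // Unique global thread ID for a 1D grid of 1D blocks.
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)          // the last block may contain surplus threads
        out[tid] = tid;
}

int main()
{
    const int n = 1000;
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;   // round up

    int *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));

    fillWithIndex<<<blocks, threadsPerBlock>>>(dev, n);

    int last = 0;
    cudaMemcpy(&last, dev + (n - 1), sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("last element = %d\n", last);
    return 0;
}
```

Rounding the block count up and guarding with `tid < n` keeps the kernel correct when `n` is not a multiple of the block size.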

BY0B
2

__global__ functions are launched from the host (or, with dynamic parallelism, from the device) and must return void, while __device__ functions can only be called from a kernel function or another __device__ function, and hence don't require a launch configuration.