
Is it possible to invoke a `__global__` function (i.e. a kernel) from within another `__global__` function in CUDA?

for example:

    __global__ void func()
    {
            // ...
    }

    __global__ void foo()
    {
            // ...
            func<<<1, 1>>>(); // calling "func", which is itself defined as a kernel
    }

    int main(void)
    {
            // ...
            func<<<1, 1>>>();
            foo<<<1, 1>>>();
    }

And is it possible to use functions from the Thrust library inside a `__global__` function?

talonmies
Maria
    Yes, you can call a `__global__` function from within another `__global__` function using [CUDA Dynamic Parallelism](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-dynamic-parallelism). Thrust functions can be called from `__global__` functions as well, and there are a variety of options, see [here](http://stackoverflow.com/questions/5510715/thrust-inside-user-written-kernels) and [here](http://stackoverflow.com/questions/28150098/how-to-use-thrust-to-sort-the-rows-of-a-matrix/28254765#28254765) and [here](http://devblogs.nvidia.com/parallelforall/power-cpp11-cuda-7/) – Robert Crovella Oct 22 '15 at 22:18
  • @RobertCrovella could you please clarify if the answer below is right about thrust not supporting dynamic parallelism in contrast to [this blog post](http://devblogs.nvidia.com/parallelforall/power-cpp11-cuda-7/)? – m.s. Oct 24 '15 at 06:59
  • 2
    Thrust should use dynamic parallelism when an execution policy of `thrust::device` is specified in a nested algorithm case on a device of cc3.5 or higher, if appropriate compile switches are used. I discussed this and gave a worked example at the second link in my previous comment ([here](http://stackoverflow.com/questions/28150098)). The use of dynamic parallelism there can be confirmed with `nvprof`. (I have just done this again now.) If this behavior is not desired, the use of dynamic parallelism can be explicitly avoided in that case by specifying `thrust::seq` instead of `thrust::device`. – Robert Crovella Oct 24 '15 at 09:20

1 Answer


Compute capability 3.5 and newer hardware supports what is called Dynamic Parallelism, which gives running kernels the ability to launch other kernels on the GPU without requiring any host API calls.
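As a minimal sketch of how your example would look with dynamic parallelism (names are illustrative), the parent kernel launches the child with the usual `<<<>>>` syntax. It must be compiled with relocatable device code enabled, e.g. `nvcc -arch=sm_35 -rdc=true -lcudadevrt dp.cu`. Note that device-side `cudaDeviceSynchronize()` worked at the time of writing but was later removed in CUDA 12:

```cuda
#include <cstdio>

__global__ void func()
{
    printf("child: block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

__global__ void foo()
{
    // Launch a child grid from device code
    // (requires cc 3.5+, -rdc=true, and linking cudadevrt)
    func<<<1, 4>>>();
    cudaDeviceSynchronize(); // wait for the child grid (removed in CUDA 12+)
}

int main(void)
{
    func<<<1, 1>>>();
    foo<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```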

Older hardware supports functions which can be called from kernels (these are denoted as __device__ instead of __global__) and are executed at thread scope only, so no new kernel is launched.
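For comparison, a plain `__device__` function call (the only option on pre-cc-3.5 hardware) is executed inline by the calling thread; no new grid is created. A minimal sketch:

```cuda
__device__ float square(float x)  // callable from device code only
{
    return x * x;
}

__global__ void kernel(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = square(in[i]); // ordinary call, executed by this thread alone
}
```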

Since the Thrust 1.8 release, a serial execution policy (`thrust::seq`) has been available, which allows Thrust algorithms to be called by individual threads within a running kernel, much like `__device__` functions. On supported hardware, Thrust can also use dynamic parallelism via the `thrust::device` execution policy.
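As a sketch of the sequential in-kernel usage (the matrix-row example is hypothetical), each thread runs a Thrust algorithm serially over its own slice of the data; `thrust::seq` guarantees no child kernel is launched:

```cuda
#include <thrust/execution_policy.h>
#include <thrust/sort.h>

// Each thread sorts one row of a row-major matrix, sequentially and in place.
__global__ void sort_rows(float *data, int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows)
        thrust::sort(thrust::seq,
                     data + row * cols,
                     data + (row + 1) * cols);
}
```

Substituting `thrust::device` for `thrust::seq` on cc 3.5+ hardware (with the appropriate compile switches) would instead allow Thrust to launch a child kernel via dynamic parallelism.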

The Vivandiere
  • thank you, and what about from thrust library in a global function ? – Maria Oct 22 '15 at 21:56
  • @FirstJens: The alternative to deleting it is to edit it to (a) make it answer the question and (b) make what it contains correct. – talonmies Oct 23 '15 at 05:22
  • @FirstJens: I took the liberty of doing it myself. Feel free to roll back the edit if you don't like it or think you can do better yourself. – talonmies Oct 23 '15 at 05:39
  • @talonmies thank you very much. your answer was useful! – Maria Oct 23 '15 at 06:37
  • @talonmies you edited to "Thrust presently does not support dynamic parallelism"; but [this blog post](http://devblogs.nvidia.com/parallelforall/power-cpp11-cuda-7/) states: "Alternatively, you can use thrust::device to have the algorithm launch a child kernel (using Dynamic Parallelism)"; which one is correct? – m.s. Oct 23 '15 at 09:59