Questions tagged [dynamic-parallelism]

dynamic parallelism refers to the CUDA capability to launch a device kernel from within another device kernel

This tag should be used for questions pertaining to CUDA dynamic parallelism, i.e. the ability of CUDA devices of compute capability 3.5 or higher to launch a device kernel from within a device kernel. Using this functionality also requires specific CUDA compilation switches, such as -rdc=true to enable relocatable device code and -lcudadevrt to link in the device runtime library.
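For reference, a minimal sketch of what the capability looks like; the file name and launch dimensions are arbitrary, and the build line in the comment uses the switches mentioned above:

    // Minimal dynamic-parallelism sketch: a kernel launched from a kernel.
    // Build (cc >= 3.5): nvcc -arch=sm_35 -rdc=true dp.cu -o dp -lcudadevrt
    #include <cstdio>

    __global__ void child(int parentBlock) {
        printf("child thread %d of parent block %d\n", threadIdx.x, parentBlock);
    }

    __global__ void parent() {
        // One thread per parent block launches a small child grid.
        if (threadIdx.x == 0)
            child<<<1, 4>>>(blockIdx.x);
    }

    int main() {
        parent<<<2, 32>>>();
        cudaDeviceSynchronize();   // host waits for parent and all children
        return 0;
    }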

50 questions
9
votes
1 answer

Compiling .cu files with Dynamic Parallelism (CUDA)

I switched to a new GeForce GTX 980 GPU with compute capability 5.2, so it must support dynamic parallelism. However, I was not able to compile even a simple example (from the programming guide). I will not provide it here (not necessary; there is just a global kernel…
Mikhail Genkin
  • 3,247
  • 4
  • 27
  • 47
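Failures like this often come down to the build line itself: dynamic parallelism needs a real architecture of at least sm_35, relocatable device code, and the device runtime library. A sketch for a cc 5.2 card (the file name is hypothetical):

    nvcc -arch=sm_52 -rdc=true simple_dp.cu -o simple_dp -lcudadevrt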
7
votes
0 answers

AleaGPU Dynamic Parallelism in F#? How?

This might be a simple question, but I have not been able to find any references to this topic: how do I launch a kernel from within another kernel? The only relevant example I came across is the post (Does Alea GPU support dynamic parallelism?),…
7
votes
2 answers

CUDA Dynamic Parallelism, bad performance

We are having performance issues when using CUDA dynamic parallelism. At the moment, CDP is performing at least 3x slower than a traditional approach. We made the simplest reproducible code to show this issue, which is to increment the value of…
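A hedged sketch of the kind of pattern that typically produces such slowdowns, with names and sizes that are illustrative rather than taken from the question: one tiny child grid per parent thread, so fixed launch overhead dwarfs the single increment each child performs:

    __global__ void incrementChild(int *data, int idx) {
        data[idx] += 1;                        // almost no work per child grid
    }

    __global__ void parent(int *data) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        incrementChild<<<1, 1>>>(data, idx);   // one launch per thread: slow
    }

    // The traditional approach does the same work in a single grid:
    __global__ void incrementAll(int *data) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        data[idx] += 1;
    }

    int main() {
        int *d;
        cudaMalloc(&d, 1024 * sizeof(int));
        cudaMemset(d, 0, 1024 * sizeof(int));
        parent<<<1024 / 256, 256>>>(d);        // CDP version, 1024 child launches
        cudaDeviceSynchronize();
        cudaFree(d);
        return 0;
    }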
7
votes
1 answer

CUDA device runtime API cudaMemsetAsync doesn't work

I am trying to call cudaMemsetAsync from a kernel (so-called "dynamic parallelism"), but no matter what value I use, it always sets memory to 0. Here is my test code: #include "cuda_runtime.h" #include "device_launch_parameters.h" #include…
Xiang Zhang
  • 2,831
  • 20
  • 40
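For context, a minimal sketch of a device-side cudaMemsetAsync call, assuming a toolkit whose device runtime exposes it to kernels. Note that, like any memset, it fills bytes, so a value of 0x01 turns each int into 0x01010101 rather than 1:

    #include <cstdio>

    __global__ void parent(int *data, size_t n) {
        if (threadIdx.x == 0 && blockIdx.x == 0)
            // Device-side memset: every BYTE becomes (unsigned char)0x01.
            cudaMemsetAsync(data, 0x01, n * sizeof(int), 0);
    }

    int main() {
        const size_t n = 8;
        int *d;
        cudaMalloc(&d, n * sizeof(int));
        parent<<<1, 32>>>(d, n);
        cudaDeviceSynchronize();
        int h[n];
        cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
        printf("%#x\n", h[0]);   // expect 0x1010101
        cudaFree(d);
        return 0;
    }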
3
votes
1 answer

Dynamic Parallelism on GTX 980 Ti: Unknown Error

I am attempting dynamic parallelism on a GTX 980 Ti card. All attempts at running code return "unknown error". Simple code is shown below with compilation options. I can execute kernels at depth=0 with no issues. The first time a child is called,…
AshleyG
  • 31
  • 2
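When chasing an "unknown error" from nested launches, checking the status inside the parent kernel usually narrows things down; a minimal sketch (the device runtime does expose cudaGetLastError and cudaGetErrorString to kernels):

    #include <cstdio>

    __global__ void child() { }

    __global__ void parent() {
        child<<<1, 1>>>();
        cudaError_t err = cudaGetLastError();   // status of the nested launch
        if (err != cudaSuccess)
            printf("child launch failed: %s\n", cudaGetErrorString(err));
    }

    int main() {
        parent<<<1, 1>>>();
        cudaDeviceSynchronize();
        printf("host status: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 0;
    }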
3
votes
1 answer

What factors affect the overhead of dynamic parallelism kernel launches?

When you launch a secondary kernel from within a primary one on a GPU, there's some overhead. What are the factors contributing to or affecting the amount of this overhead? E.g. the size of the kernel code, occupancy of the SM where the kernel is being…
einpoklum
  • 118,144
  • 57
  • 340
  • 684
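One way to put numbers on this is a micro-benchmark whose parent grid does nothing but launch empty children; a hedged sketch, with the launch count kept below the device runtime's default pending-launch limit of 2048:

    #include <cstdio>

    __global__ void emptyChild() { }

    __global__ void parent(int launches) {
        // One thread per block fires 'launches' fire-and-forget children.
        if (threadIdx.x == 0)
            for (int i = 0; i < launches; ++i)
                emptyChild<<<1, 1>>>();
    }

    int main() {
        cudaEvent_t beg, end;
        cudaEventCreate(&beg);
        cudaEventCreate(&end);
        cudaEventRecord(beg);
        parent<<<16, 32>>>(100);            // 1600 device-side launches
        cudaEventRecord(end);
        cudaEventSynchronize(end);
        float ms;
        cudaEventElapsedTime(&ms, beg, end);
        printf("%.3f ms total, %.3f us per launch\n", ms, ms * 1000.0f / 1600);
        return 0;
    }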
3
votes
1 answer

How to perform relational join on two data containers on GPU (preferably CUDA)?

What I'm trying to do: on the GPU, I'm trying to mimic the conventions used by SQL in relational algebra to perform joins on tables (e.g. Inner Join, Outer Join, Cross Join). In the code below, I want to perform an Inner Join. Imagine two…
aiwyn
  • 268
  • 2
  • 9
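One common shape for this with dynamic parallelism, sketched with illustrative names and sizes (none taken from the question): one parent thread per row of the left table launches a child grid that scans the right table and appends matching index pairs through an atomic cursor:

    #include <cstdio>

    // Child: scan all of S for one key of R, recording (left, right) matches.
    __global__ void scanRight(const int *sKeys, int nS, int rKey, int rIdx,
                              int2 *out, int *outCount) {
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if (j < nS && sKeys[j] == rKey) {
            int slot = atomicAdd(outCount, 1);   // reserve an output slot
            out[slot] = make_int2(rIdx, j);
        }
    }

    // Parent: one thread per row of R launches one child scan of S.
    __global__ void joinParent(const int *rKeys, int nR,
                               const int *sKeys, int nS,
                               int2 *out, int *outCount) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < nR) {
            int threads = 256;
            int blocks = (nS + threads - 1) / threads;
            scanRight<<<blocks, threads>>>(sKeys, nS, rKeys[i], i,
                                           out, outCount);
        }
    }

    int main() {
        const int nR = 4, nS = 6;
        int hR[nR] = {1, 2, 3, 4}, hS[nS] = {2, 4, 4, 5, 1, 2};
        int *dR, *dS, *dCount;
        int2 *dOut;
        cudaMalloc(&dR, sizeof(hR));
        cudaMalloc(&dS, sizeof(hS));
        cudaMalloc(&dOut, nR * nS * sizeof(int2));
        cudaMalloc(&dCount, sizeof(int));
        cudaMemcpy(dR, hR, sizeof(hR), cudaMemcpyHostToDevice);
        cudaMemcpy(dS, hS, sizeof(hS), cudaMemcpyHostToDevice);
        cudaMemset(dCount, 0, sizeof(int));
        joinParent<<<1, nR>>>(dR, nR, dS, nS, dOut, dCount);
        cudaDeviceSynchronize();
        int matches;
        cudaMemcpy(&matches, dCount, sizeof(int), cudaMemcpyDeviceToHost);
        printf("%d matching pairs\n", matches);
        return 0;
    }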
3
votes
1 answer

CUDA dynamic parallelism with Driver API

I'm trying to compile and link a dynamic kernel and use it with the CUDA driver API on a GK110. I compile the .cu source file in Visual Studio with the relocatable device code flag and compute_35, sm_35 into a PTX file, and then the CUDA linker adds…
FHoenig
  • 349
  • 1
  • 10
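For reference, the driver-API side of this as a hedged sketch: the relocatable PTX has to be JIT-linked together with the cudadevrt library before the module is loaded. File, library, and kernel names are hypothetical, and error checking is elided:

    #include <cuda.h>

    int main() {
        cuInit(0);
        CUdevice dev;
        cuDeviceGet(&dev, 0);
        CUcontext ctx;
        cuCtxCreate(&ctx, 0, dev);

        CUlinkState link;
        cuLinkCreate(0, NULL, NULL, &link);
        // Device runtime library plus the relocatable PTX produced by nvcc.
        cuLinkAddFile(link, CU_JIT_INPUT_LIBRARY, "cudadevrt.lib",
                      0, NULL, NULL);
        cuLinkAddFile(link, CU_JIT_INPUT_PTX, "dyn_kernel.ptx",
                      0, NULL, NULL);

        void *cubin;
        size_t cubinSize;
        cuLinkComplete(link, &cubin, &cubinSize);

        CUmodule mod;
        cuModuleLoadData(&mod, cubin);           // load the linked image
        CUfunction fn;
        cuModuleGetFunction(&fn, mod, "parentKernel");
        cuLaunchKernel(fn, 1, 1, 1, 32, 1, 1, 0, 0, NULL, NULL);
        cuCtxSynchronize();

        cuLinkDestroy(link);
        cuCtxDestroy(ctx);
        return 0;
    }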
2
votes
1 answer

"unknown error" on first cudaMalloc if CUBLAS is present in kernel

I have the following minimal .cu file #include #include #include __global__ void test() { cublasHandle_t handle = nullptr; cublasCreate(&handle); } int main(int, char**) { void * data =…
Joe
  • 6,497
  • 4
  • 29
  • 55
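A note on this one: kernel-side cuBLAS was only ever provided by the separate cublas_device library, which required relocatable device code and was removed in CUDA 10. A historical sketch, assuming such an older toolkit (build line in the comment):

    // Kernel-side cuBLAS, CUDA 9.x and earlier only.
    // Assumed build: nvcc -arch=sm_35 -rdc=true t.cu -lcublas_device -lcudadevrt
    #include <cublas_v2.h>

    __global__ void test() {
        cublasHandle_t handle = nullptr;
        if (cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS)
            cublasDestroy(handle);   // clean up the device-side handle
    }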
2
votes
1 answer

CUDA - How to make a thread in a kernel wait for its children

I'm trying to implement a really simple merge sort using CUDA's recursive kernel launches (compute capability 3.5 and up), but I cannot find a way to tell the parent thread to launch its children concurrently and then wait for its children's computation, since…
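For the record, the classic answer here was device-side cudaDeviceSynchronize(); a minimal sketch. Note this call was deprecated in device code in CUDA 11.6 and removed in the CUDA 12 dynamic-parallelism model, where tail launches replace it; kernel names and sizes below are illustrative:

    __global__ void sortHalf(int *data, int n) {
        // placeholder for the recursive half-sort
    }

    __global__ void mergeSortParent(int *data, int n) {
        if (threadIdx.x == 0) {
            // Children launched into the same (default) stream would
            // serialize; separate device-side streams let them overlap.
            cudaStream_t s1, s2;
            cudaStreamCreateWithFlags(&s1, cudaStreamNonBlocking);
            cudaStreamCreateWithFlags(&s2, cudaStreamNonBlocking);
            sortHalf<<<1, 32, 0, s1>>>(data, n / 2);
            sortHalf<<<1, 32, 0, s2>>>(data + n / 2, n - n / 2);
            cudaDeviceSynchronize();   // legacy CDP1: wait for all children
            cudaStreamDestroy(s1);
            cudaStreamDestroy(s2);
            // ... merge the two sorted halves here ...
        }
    }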
2
votes
0 answers

Does nvcc support tail call optimization in dynamic parallelism?

Under the CUDA Programming Guide section C.4.3.1.2, "Nesting and Synchronization Depth", it is mentioned: "An optimization is permitted where the system detects that it need not reserve space for the parent's state in cases where the parent kernel…
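The optimization the quote describes hinges on whether the parent ever synchronizes on its children; a hedged illustration of the two shapes in legacy (pre-CUDA-12) dynamic parallelism:

    __global__ void child() { }

    // Fire-and-forget: the parent never synchronizes on the child, so the
    // system need not reserve backing store for suspending the parent.
    __global__ void parentNoSync() {
        child<<<1, 1>>>();
    }

    // Explicit sync: the parent may be suspended at the sync point, so
    // saved-state space up to the configured sync depth must be reserved.
    __global__ void parentWithSync() {
        child<<<1, 1>>>();
        cudaDeviceSynchronize();   // legacy CDP1 device-side sync
    }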
2
votes
1 answer

Dynamic parallelism - launching many small kernels is very slow

I am trying to use dynamic parallelism to improve an algorithm I have in CUDA. In my original CUDA solution, every thread computes a number that is common to its block. What I want to do is to first launch a coarse (or low-resolution) kernel,…
labotsirc
  • 722
  • 7
  • 21
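The coarse-to-fine pattern the question describes, as a hedged sketch with a stand-in refinement test: each coarse block makes one launch decision and fires at most one reasonably sized child, rather than one tiny child per thread:

    __global__ void fineKernel(float *data, int offset, int len) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < len)
            data[offset + i] *= 2.0f;   // stand-in high-resolution work
    }

    __global__ void coarseKernel(float *data, const int *needsRefine,
                                 int blockLen) {
        // One launch decision per coarse block, not per thread.
        if (threadIdx.x == 0 && needsRefine[blockIdx.x])
            fineKernel<<<(blockLen + 255) / 256, 256>>>(
                data, blockIdx.x * blockLen, blockLen);
    }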
2
votes
3 answers

Kepler CUDA dynamic parallelism and thread divergence

There is very little information on Kepler's dynamic parallelism. From the description of this new technology, does it mean the issue of thread control-flow divergence within the same warp is solved? It allows recursion and launching kernels from device…
HooYao
  • 554
  • 5
  • 19
1
vote
1 answer

Why can't I link to my CUDA static library that uses Dynamic Parallelism and Separable Compilation?

I'm trying to create the most basic CUDA application demonstrating Dynamic Parallelism, Separate Compilation and Linking, and a CUDA kernel in a static library, and I'm trying to use CMake to generate a Visual Studio solution. I'm using CMake 3.21.3,…
Justin
  • 1,881
  • 4
  • 20
  • 40
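For comparison, a hedged sketch of the CMake shape that typically makes this link succeed with modern CMake; target and file names are hypothetical, and the key property is CUDA_RESOLVE_DEVICE_SYMBOLS, which forces the device-link step that dynamic parallelism needs:

    # Sketch: static CUDA library with dynamic parallelism, plus a consumer.
    cmake_minimum_required(VERSION 3.18)
    project(dp_demo LANGUAGES CXX CUDA)
    find_package(CUDAToolkit REQUIRED)

    add_library(dp_kernels STATIC kernels.cu)
    set_target_properties(dp_kernels PROPERTIES
        CUDA_SEPARABLE_COMPILATION ON
        CUDA_ARCHITECTURES 52)

    add_executable(dp_app main.cu)
    set_target_properties(dp_app PROPERTIES
        CUDA_SEPARABLE_COMPILATION ON
        CUDA_RESOLVE_DEVICE_SYMBOLS ON
        CUDA_ARCHITECTURES 52)
    target_link_libraries(dp_app PRIVATE dp_kernels CUDA::cudadevrt)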
1
vote
1 answer

CL_OUT_OF_RESOURCES error is returned by clEnqueueNDRangeKernel() with dynamic parallelism

Kernel code that produces the error: __kernel void testDynamic(__global int *data) { int id=get_global_id(0); atomic_add(&data[1],2); } __kernel void test(__global int * data) { int id=get_global_id(0); atomic_add(&data[0],2); …
huseyin tugrul buyukisik
  • 11,469
  • 4
  • 45
  • 97