Proper communication from parent to child kernel in CUDA Dynamic Parallelism

Question

I'm experimenting with CUDA Dynamic Parallelism and I'm having trouble getting child kernel launches from a parent kernel to work reliably. I've written some simple test code:

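#include <cstdio>

// Child kernel: receives the parent's index by value.
__global__ void child(int id) {
    if (id % 10000 == 0)
        printf("hello\n");
}

// Parent kernel: every 10000th thread launches one child.
__global__ void parent(int nop) {
    unsigned int indX = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int indY = blockIdx.y * blockDim.y + threadIdx.y;
    // Flatten the 2D index; the row width is (int)sqrtf(nop) + 1.
    unsigned int ind = indX * ((int)sqrtf((float)nop) + 1) + indY;
    if (ind < (unsigned int)nop)
    {
        if (ind % 10000 == 0) {
            child<<<1, 1>>>(ind);
            printf("world!");
        }
    }
}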

Here nop is a value larger than 1,000,000. I want to pass a variable created in the parent kernel to the child kernel, but every time I get unspecified failures or a BSOD during the call. I'm slowly running out of ideas for how to do this properly.

I couldn't find any useful examples for such a case.

I suggest you provide a [mcve]. A kernel, by itself, is not a [mcve]. You can do [proper CUDA error checking](https://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) in device code identically to how it is done in host code. There are a variety of issues you may be running into here, such as hitting a WDDM TDR kernel timeout (it looks like you are on Windows) and/or running into a [pending launch limit](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#configuration-options), and probably other possibilities as well. - Robert Crovella, Jan 22 '19 at 19:30
It sounds like you might be hitting a Windows WDDM TDR timeout. See [here](https://docs.nvidia.com/gameworks/content/developertools/desktop/timeout_detection_recovery.htm). - Robert Crovella, Jan 22 '19 at 19:30
Thanks! I've found my problem: it was a WDDM TDR timeout caused by the debugging printf calls. Thank you very much! - KamCho, Jan 22 '19 at 20:05
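
For anyone landing here later, the suggestions in the comments can be put together as a small complete program. This is only a sketch under some assumptions, not the original poster's code: it uses a 1D index instead of the 2D one from the question, and the `CUDA_CHECK` macro, the `STRIDE` constant, and the `out` and `launchErrors` buffers are hypothetical names introduced for illustration. The child writes the value it received into global memory instead of calling printf, and the parent checks `cudaGetLastError()` after each child launch, just as host code would.

```cuda
#include <cstdio>
#include <cstdlib>

#define STRIDE 10000u  // hypothetical: one child launch per STRIDE parent threads

// Host-side check (hypothetical helper): report the error and abort.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Child: receives the parent's index by value and records it in global memory.
__global__ void child(unsigned int id, unsigned int *out) {
    out[id / STRIDE] = id;
}

// Parent: every STRIDE-th thread launches one child and checks the launch.
__global__ void parent(unsigned int nop, unsigned int *out, int *launchErrors) {
    unsigned int ind = blockIdx.x * blockDim.x + threadIdx.x;
    if (ind < nop && ind % STRIDE == 0) {
        child<<<1, 1>>>(ind, out);
        // Device-side error check, same call as on the host.
        if (cudaGetLastError() != cudaSuccess)
            atomicAdd(launchErrors, 1);
    }
}

int main() {
    const unsigned int nop = 1000001;
    const unsigned int slots = (nop + STRIDE - 1) / STRIDE;  // one slot per child

    unsigned int *out = nullptr;
    int *launchErrors = nullptr;
    CUDA_CHECK(cudaMalloc(&out, slots * sizeof(unsigned int)));
    CUDA_CHECK(cudaMemset(out, 0xFF, slots * sizeof(unsigned int)));
    CUDA_CHECK(cudaMalloc(&launchErrors, sizeof(int)));
    CUDA_CHECK(cudaMemset(launchErrors, 0, sizeof(int)));

    // Optional: raise the pending launch limit (value chosen arbitrarily here).
    CUDA_CHECK(cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, 4096));

    const unsigned int threads = 256;
    const unsigned int blocks = (nop + threads - 1) / threads;
    parent<<<blocks, threads>>>(nop, out, launchErrors);
    CUDA_CHECK(cudaGetLastError());       // launch configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while running

    // Copy results back and verify that each child saw its parent's value.
    unsigned int *hostOut = (unsigned int *)malloc(slots * sizeof(unsigned int));
    int hostErrors = 0;
    CUDA_CHECK(cudaMemcpy(hostOut, out, slots * sizeof(unsigned int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaMemcpy(&hostErrors, launchErrors, sizeof(int),
                          cudaMemcpyDeviceToHost));
    unsigned int ok = 0;
    for (unsigned int i = 0; i < slots; ++i)
        if (hostOut[i] == i * STRIDE) ++ok;
    printf("children OK: %u of %u, failed launches: %d\n", ok, slots, hostErrors);

    free(hostOut);
    CUDA_CHECK(cudaFree(out));
    CUDA_CHECK(cudaFree(launchErrors));
    return 0;
}
```

Dynamic parallelism needs relocatable device code and the device runtime library, so a build line along the lines of `nvcc -rdc=true example.cu -lcudadevrt` (file name assumed) on a GPU of compute capability 3.5 or higher should work. Passing `ind` by value, as the question does, is fine. What a child cannot be given is a pointer into the parent's local or shared memory, so anything shared by pointer has to live in global memory, as `out` does here.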
