I'm experimenting with CUDA Dynamic Parallelism and I'm having trouble getting parallel kernel launches to work properly. I've written a simple test:
    #include <cstdio>

    __global__ void child(int id) {
        if (id % 10000 == 0)
            printf("hello\n");
    }

    __global__ void parent(int nop) {
        unsigned int indX = blockIdx.x * blockDim.x + threadIdx.x;
        unsigned int indY = blockIdx.y * blockDim.y + threadIdx.y;
        unsigned int ind = indX * ((int)sqrtf(nop) + 1) + indY;
        if (ind < nop)
        {
            if (ind % 10000 == 0) {
                child<<<1, 1>>>(ind);
                printf("world!");
            }
        }
    }
Here nop is a value larger than 1,000,000. I want to pass a variable created in the parent kernel to the child kernel, but every time I get unspecified failures or a BSOD during the call. I'm slowly running out of ideas on how to do this properly.
I couldn't find any useful examples for this case.
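For reference, here is a minimal complete sketch of the same pattern. It assumes a GPU of compute capability 3.5 or newer and compilation with relocatable device code (`-rdc=true`) linked against `cudadevrt`. Compared with the code above, it adds host-side and device-side error checking. It also passes `side` from the host and guards both grid dimensions so each `ind` comes from exactly one thread; those are changes made here, not part of the original. Finally, it hands the parent's value to the child both by value and through a global-memory buffer (`scratch`, a hypothetical name), since pointers to a parent's local variables or `__shared__` memory must not be passed to a child kernel.

```cuda
// Build (assumption: adjust -arch to your GPU; dynamic parallelism needs
// compute capability 3.5+ and relocatable device code):
//   nvcc -arch=sm_60 -rdc=true dp_test.cu -o dp_test -lcudadevrt
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Host-side helper: number of multiples of 10000 in [0, nop), i.e. how many
// parent threads should launch a child.
int expectedChildren(int nop) { return (nop + 9999) / 10000; }

// The child receives the parent's value twice: copied by value, and through
// a pointer into global memory (never a pointer to local/shared memory).
__global__ void child(unsigned int byValue, const unsigned int *viaGlobal) {
    printf("hello from child %u (global copy %u)\n", byValue, *viaGlobal);
}

__global__ void parent(int nop, unsigned int side, unsigned int *scratch) {
    unsigned int indX = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int indY = blockIdx.y * blockDim.y + threadIdx.y;
    // Guard both dimensions so each ind is produced by exactly one thread.
    if (indX >= side || indY >= side) return;
    unsigned int ind = indX * side + indY;
    if (ind >= (unsigned int)nop || ind % 10000 != 0) return;

    unsigned int slot = ind / 10000;  // < expectedChildren(nop)
    // Global-memory writes made before the launch are visible to the child.
    scratch[slot] = ind;
    child<<<1, 1>>>(ind, &scratch[slot]);
    cudaError_t err = cudaGetLastError();  // device-side launch status
    if (err != cudaSuccess)
        printf("child launch for %u failed: error %d\n", ind, (int)err);
}

// Print the failing call's location and error string, then exit main with 1.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                    cudaGetErrorString(err_));                       \
            return 1;                                                \
        }                                                            \
    } while (0)

int main() {
    const int nop = 1000000;
    const unsigned int side = (unsigned int)std::sqrt((float)nop) + 1;
    const int children = expectedChildren(nop);

    unsigned int *scratch = nullptr;
    CHECK(cudaMalloc(&scratch, children * sizeof(unsigned int)));

    dim3 block(16, 16);
    dim3 grid((side + block.x - 1) / block.x, (side + block.y - 1) / block.y);
    parent<<<grid, block>>>(nop, side, scratch);
    CHECK(cudaGetLastError());       // bad launch configuration
    CHECK(cudaDeviceSynchronize());  // errors from the parent or child grids

    printf("expected %d child launches\n", children);
    CHECK(cudaFree(scratch));
    return 0;
}
```

With `nop = 1000000` only about 100 children are launched, well under the device runtime's default pending-launch limit of 2048. If many more parent threads launched children, that limit could be raised with `cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, n)` before the first kernel launch.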