I am using a GTX 1050 (compute capability 6.1) with CUDA 11.0. I need grid synchronization in my program, so the kernel has to be launched with cudaLaunchCooperativeKernel().
According to the device query output, the GPU does support cooperative launch. However, I am unable to execute the following kernel:
extern "C" __global__ void test(int x) {
    if (x) {
        printf("%d", x);
        if (threadIdx.x == 0)
            test<<<1, 1>>>(--x);
    }
}
I launch it with:
cudaLaunchCooperativeKernel((void *)test, 1, 1, (void **) (&x));
The call fails with the error 'operation not permitted' (code 800). As far as I know, this error is returned when the device does not support cooperative launch, which is not the case here. What could cause this problem?
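For comparison, here is a minimal sketch of a cooperative launch that prints both the support attribute and the launch status. The kernel `sketch_kernel` and its `out` parameter are hypothetical and are not the kernel above. The sketch assumes the runtime API convention that `args` is an array holding one pointer per kernel parameter, and that the file is compiled for the device's architecture (for example `nvcc -arch=sm_61`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical kernel for illustration only (not the kernel from the question):
// thread 0 stores a value, the whole grid synchronizes, then thread 0 prints it.
__global__ void sketch_kernel(int x, int *out) {
    cg::grid_group grid = cg::this_grid();
    if (grid.thread_rank() == 0) *out = x;
    grid.sync();  // grid-wide barrier; requires a cooperative launch
    if (grid.thread_rank() == 0) printf("x = %d\n", *out);
}

int main() {
    // Query the cooperative-launch attribute instead of relying on a tool's output.
    int device = 0, supported = 0;
    cudaGetDevice(&device);
    cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, device);
    printf("cooperative launch supported: %d\n", supported);
    if (!supported) return 1;

    int x = 3;
    int *out = nullptr;
    cudaMalloc(&out, sizeof(int));

    // One pointer per kernel parameter, in declaration order.
    void *args[] = { &x, &out };
    cudaError_t err = cudaLaunchCooperativeKernel(
        (void *)sketch_kernel, dim3(1), dim3(1), args, 0, 0);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    // Report the status as both text and numeric code (e.g. 800).
    printf("status: %s (%d)\n", cudaGetErrorString(err), (int)err);

    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

If this sketch launches successfully on the same machine, the difference is more likely to lie in the kernel body or in how the arguments are passed than in device support.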