I ran into a problem in a much larger kernel, but it seems to distil down to the following code, whose kernel never returns. Can someone please explain why there is an infinite loop?
__global__ void infinite_while_kernel(void)
{
    int index = 0;
    while (index >= threadIdx.x) {
        index--;
    }
    return;
}

int main(void) {
    infinite_while_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
In addition, the kernel below also gets stuck:
__global__ void not_infinite_while_kernel(void)
{
    int index = 0;
    while (index >= (unsigned int) 0u*threadIdx.x) {
        index--;
    }
    return;
}
Replacing threadIdx.x with 0 in the original kernel makes it return, as expected. I'm using the v5.5 toolkit and compiling with the -arch=sm_20 -O0 flags, running on a Tesla M2090. I do not currently have access to any other hardware or toolkit versions (it's not my system).