With CUDA, is there a limit to the amount of computation or number of blocks allowed?

Question

I am exploring CUDA 8.0 with Visual Studio 2015 (running on a GeForce GTX 1060).

I tried launching 2000 blocks of 1024 threads each (both values are within the supported limits), but cudaDeviceSynchronize returns error code 4 after the kernel launch. The blocks are not doing anything exotic; in fact, I'm not even using shared memory. What am I doing wrong?

My code is as follows:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>
#include <stdlib.h>

__global__
void addKernel()
{
    unsigned int i, ans = 0;
    for (i = 0; i < 100000; i++)
    {
        ans += i;
    }
}

int main()
{
    addKernel<<<2000, 1024>>>();

    cudaError_t cudaStatus = cudaDeviceSynchronize();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSynchronize returned error code %d after launching addKernel!\n", cudaStatus);
    }

    cudaDeviceReset();
    getchar();
    return 0;
}

Output:

cudaDeviceSynchronize returned error code 4 after launching addKernel!
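
A bare numeric code like the 4 in this output is hard to interpret, and the numbering is not guaranteed to stay the same between CUDA versions. The following is a minimal sketch of decoding the status with the runtime calls cudaGetErrorName and cudaGetErrorString. It also checks cudaGetLastError right after the launch, so that launch-configuration errors can be told apart from errors raised while the kernel executes. The helper name report is hypothetical and introduced only for illustration; the kernel mirrors the one above.

```cuda
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

// Same busy loop as the kernel in the question.
__global__
void addKernel()
{
    unsigned int i, ans = 0;
    for (i = 0; i < 100000; i++)
    {
        ans += i;
    }
}

// Hypothetical helper: prints the symbolic name and the description of a
// CUDA status. Returns 1 if the status is an error, 0 otherwise.
static int report(const char *where, cudaError_t status)
{
    if (status == cudaSuccess) {
        return 0;
    }
    fprintf(stderr, "%s: %s (%d): %s\n", where,
            cudaGetErrorName(status), (int)status, cudaGetErrorString(status));
    return 1;
}

int main()
{
    addKernel<<<2000, 1024>>>();

    // Problems with the launch configuration are reported here.
    report("launch", cudaGetLastError());

    // Problems that occur while the kernel runs are reported here.
    report("execution", cudaDeviceSynchronize());

    cudaDeviceReset();
    return 0;
}
```

If the launch check passes and only the synchronize call fails, the grid and block sizes themselves were accepted and the failure happened during execution.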

When I cut the number of blocks in half, the error goes away. Interestingly, I can also eliminate the error by reducing the loop in the kernel from 100,000 iterations to 1,000.

@RobertCrovella That was it, thanks! I disabled TDR in Nsight options and rebooted. — Josh, Sep 20 '17 at 04:06
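
The fix in the comment above points to TDR, the Windows display driver watchdog, which resets the GPU when a kernel runs for too long on a device that also drives a display. The sketch below shows one way to check whether a run-time limit applies to device 0, using the kernelExecTimeoutEnabled field of cudaDeviceProp. It also shows a possible alternative to disabling TDR: covering the same 2000 blocks with several shorter launches. The blockOffset kernel parameter, the launchCount helper, and the batch size of 500 blocks are assumptions made for illustration, not values from this thread; a suitable batch size depends on the device and on the watchdog timeout.

```cuda
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

// Variant of the question's kernel. blockOffset (an assumed parameter) lets
// each batch recover the block index it would have had in the full grid.
__global__
void addKernel(unsigned int blockOffset)
{
    unsigned int logicalBlock = blockOffset + blockIdx.x;
    unsigned int i, ans = logicalBlock;
    for (i = 0; i < 100000; i++)
    {
        ans += i;
    }
}

// Hypothetical helper: number of launches needed to cover totalBlocks
// when each launch handles at most blocksPerLaunch blocks.
static unsigned int launchCount(unsigned int totalBlocks, unsigned int blocksPerLaunch)
{
    return (totalBlocks + blocksPerLaunch - 1) / blocksPerLaunch;
}

int main()
{
    const unsigned int totalBlocks = 2000;
    const unsigned int blocksPerLaunch = 500; // assumed batch size

    // Ask the runtime whether kernels on device 0 are subject to a time limit.
    cudaDeviceProp prop;
    cudaError_t status = cudaGetDeviceProperties(&prop, 0);
    if (status != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(status));
        return 1;
    }
    printf("Run-time limit on kernels: %s\n", prop.kernelExecTimeoutEnabled ? "yes" : "no");

    // Launch the grid in batches, waiting for each batch to finish
    // before the next one is queued.
    unsigned int launches = launchCount(totalBlocks, blocksPerLaunch);
    for (unsigned int n = 0; n < launches; n++)
    {
        unsigned int offset = n * blocksPerLaunch;
        unsigned int remaining = totalBlocks - offset;
        unsigned int blocks = remaining < blocksPerLaunch ? remaining : blocksPerLaunch;

        addKernel<<<blocks, 1024>>>(offset);

        status = cudaDeviceSynchronize();
        if (status != cudaSuccess) {
            fprintf(stderr, "batch %u failed: %s\n", n, cudaGetErrorString(status));
            break;
        }
    }

    cudaDeviceReset();
    return 0;
}
```

This matches the observation in the question that halving the block count or shortening the loop makes the error disappear: both changes reduce how long a single kernel launch runs, not how many blocks the device can accept.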

