I have a relatively simple CUDA kernel that I call directly from the main function of my program, as follows:
    #include <cstdio>

    // Prints 20 a's from a single device thread.
    __global__ void block() {
        for (int i = 0; i < 20; i++) {
            printf("a");
        }
    }

    int main(int argc, char** argv) {
        block<<<1, 1>>>();
        // Wait for the kernel to finish; this also flushes device-side printf output.
        cudaError_t cudaerr = cudaDeviceSynchronize();
        printf("Kernel executed!\n");
        if (cudaerr != cudaSuccess)
            printf("kernel launch failed with error \"%s\".\n",
                   cudaGetErrorString(cudaerr));
    }
The program is compiled and run from Visual Studio 2015. The project was generated with CMake from the following CMakeLists.txt file:
    project (Comparison)
    cmake_minimum_required (VERSION 2.6)
    find_package(CUDA REQUIRED)
    set(
        CUDA_NVCC_FLAGS
        ${CUDA_NVCC_FLAGS};
        -arch=compute_30 -code=sm_30 -g -G
    )
    cuda_add_executable(Comparison kernel.cu)
I would expect this program to print 20 a's to the console, followed by "Kernel executed!". However, the a's are never printed and "Kernel executed!" appears immediately, even if I replace the for loop with a while(true) loop.
Even when I run the code with the Nsight debugger attached and a breakpoint set in the kernel's for loop, nothing happens. This leads me to believe that the kernel is never actually launched. Does anyone know how to make this kernel behave as expected?
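One way to test the hypothesis that the kernel never launches is to check for errors immediately after the launch, and to compare the GPU's compute capability with the sm_30 target set in CMakeLists.txt. The following is a minimal diagnostic sketch, not a confirmed fix. It assumes the GPU of interest is device 0; the property query and the extra cudaGetLastError check are not part of the original program and are added only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same kernel as in the question.
__global__ void block() {
    for (int i = 0; i < 20; i++) {
        printf("a");
    }
}

int main() {
    // Assumption: the GPU in use is device 0.
    // Print its compute capability so it can be compared with -code=sm_30.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Device 0: %s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);

    block<<<1, 1>>>();

    // Errors detected at launch time (for example, no compatible device
    // code in the binary) are reported by cudaGetLastError.
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("Kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Errors raised while the kernel runs are reported here; synchronizing
    // also flushes the device-side printf buffer.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("Kernel execution failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("\nKernel executed!\n");
    return 0;
}
```

If the launch check reports an error such as "invalid device function" or "no kernel image is available for execution on the device", the binary probably contains no code that this GPU can run. In that case the printed compute capability would be worth comparing against the -arch=compute_30 -code=sm_30 flags.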