I am working with a GeForce 210 (compute capability 1.2) and CUDA 6.5.
I want to print float values from my CUDA kernel. I have placed "cuPrintf.cu" and "cuPrintf.cuh" in my project directory and included them in my code. It compiles fine and runs without errors, but prints nothing. This is how I compile my code:
$ nvcc -arch=compute_12 test.cu
I read a similar question and then surrounded my kernel launch with cudaPrintfInit() and cudaPrintfDisplay():
if (cudaPrintfInit() != cudaSuccess)
    printf("cudaPrintfInit failed\n");
test_kernel<<<grid, block>>>(val);
if (cudaPrintfDisplay(stdout, true) != cudaSuccess)
    printf("cudaPrintfDisplay failed\n");
cudaPrintfEnd();
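The checks above only say that a call failed, not why. One way to narrow it down is to test a few plain runtime calls on their own and print the text from cudaGetErrorString. The following is a minimal sketch rather than my actual code: the helper report_cuda and the empty kernel dummy_kernel are hypothetical names introduced for illustration, and the 1024-byte allocation is an arbitrary size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print a readable message for a CUDA return code.
// Returns true when the call succeeded.
static bool report_cuda(const char *what, cudaError_t err)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical empty kernel, used only to test that a launch works at all.
__global__ void dummy_kernel() {}

int main()
{
    // Check that the runtime can see a device.
    int count = 0;
    report_cuda("cudaGetDeviceCount", cudaGetDeviceCount(&count));
    printf("CUDA devices found: %d\n", count);

    // A small allocation (arbitrary size) exercises context creation.
    void *buf = NULL;
    if (report_cuda("cudaMalloc", cudaMalloc(&buf, 1024)))
        report_cuda("cudaFree", cudaFree(buf));

    // Launch errors are reported by cudaGetLastError(); errors raised
    // while the kernel runs show up at the next synchronization.
    dummy_kernel<<<1, 1>>>();
    report_cuda("kernel launch", cudaGetLastError());
    report_cuda("cudaDeviceSynchronize", cudaDeviceSynchronize());
    return 0;
}
```

If these basic calls already fail, the problem is probably not specific to cuPrintf. The same helper could also wrap the cuPrintf calls, for example report_cuda("cudaPrintfInit", cudaPrintfInit()).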
My kernel looks like this:
__global__ void test_kernel(float val) {
    // Global thread indices; BLOCK_X and BLOCK_Y are the block dimensions
    int i = blockIdx.x * BLOCK_X + threadIdx.x;
    int j = blockIdx.y * BLOCK_Y + threadIdx.y;
    if (j == 20)
        cuPrintf("%f is value, %d is j", val, j);
}
On compiling and running, the output is:
cudaPrintfInit failed
cudaPrintfDisplay failed
I suspect the problem is either in the way I am compiling, or that cuPrintf does not allow float values to be printed. In the similar question I linked, the cause was that the number of threads per block exceeded the maximum, but my block size is 16 x 16, so that should not be the problem here. Still, both cudaPrintfInit() and cudaPrintfDisplay() report failure.
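The block-size assumption can be checked against what the device actually reports instead of taking it on trust. Below is a minimal sketch, assuming device 0 is the GeForce 210 and that the 16 x 16 block is the only launch dimension that matters; the local constants BLOCK_X and BLOCK_Y simply mirror the block size I described.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Block size from the question (16 x 16 = 256 threads per block).
    const int BLOCK_X = 16;
    const int BLOCK_Y = 16;

    // Query device 0 (assumed to be the GeForce 210).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("%s: compute capability %d.%d, max threads per block %d\n",
           prop.name, prop.major, prop.minor, prop.maxThreadsPerBlock);

    // Compare the requested block size with the reported limits.
    int threads = BLOCK_X * BLOCK_Y;
    bool fits = threads <= prop.maxThreadsPerBlock &&
                BLOCK_X <= prop.maxThreadsDim[0] &&
                BLOCK_Y <= prop.maxThreadsDim[1];
    printf("requested %d threads per block: %s\n",
           threads, fits ? "within limits" : "exceeds limits");
    return 0;
}
```

If this reports that the block fits, the threads-per-block explanation from the other question can be ruled out for my case.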
I have also run the CUDA sample "simplePrintf" that comes with the CUDA installation, and it works perfectly. Help!