cudaPrintfInit and cudaPrintfDisplay failed

Question

I am working with GeForce 210, compute capability 1.2 and CUDA 6.5.

I want to print float values from my CUDA kernel. I have placed "cuPrintf.cu" and "cuPrintf.cuh" in my project directory and included them in my code. It compiles fine and runs without errors, but prints nothing. This is how I compile my code:

$ nvcc -arch=compute_12 test.cu

I read a similar question and then surrounded my kernel launch with cudaPrintfInit() and cudaPrintfDisplay():

if(cudaPrintfInit() != cudaSuccess)
    printf("cudaPrintfInit failed\n");

test_kernel<<<grid, block>>>(val);

if(cudaPrintfDisplay(stdout, true) != cudaSuccess)
    printf("cudaPrintfDisplay failed\n");
cudaPrintfEnd();

My kernel looks like this:

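__global__ void test_kernel (float val){
    // Global 2D thread indices; BLOCK_X and BLOCK_Y are the block dimensions (16 x 16 here).
    int i = blockIdx.x*BLOCK_X + threadIdx.x;
    int j = blockIdx.y*BLOCK_Y + threadIdx.y;
    // Every thread in row 20 prints (i is not used in the condition).
    if( j == 20 )
        cuPrintf("%f is value, %d is j", val, j);
}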

On compiling and running, the output is:

cudaPrintfInit failed
cudaPrintfDisplay failed

I guess there could be a problem with the way I am compiling, or perhaps cuPrintf does not allow floats to be printed? In the similar question I read, the problem was that the number of threads per block exceeded the maximum, but my block size is 16 x 16, so that should not be the problem. Still, both cudaPrintfInit and cudaPrintfDisplay report failure.

I have also run the CUDA sample code "simplePrintf" which comes with the CUDA installation. That works perfectly. Help!

The number of blocks (the `grid` value) is missing from your question. If I guess that it is one, j will always be in the range `[0..15]` and therefore you will never print the value. — pQB, Mar 13 '15 at 12:52
@pQB that doesn't explain why `cudaPrintfInit failed` message is printed. For questions of this type, SO [expects](http://stackoverflow.com/help/on-topic) that you provide a [MCVE](http://stackoverflow.com/help/mcve). Provide a short complete code that shows what you are doing. Furthermore, make sure you are doing [proper cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) — Robert Crovella, Mar 13 '15 at 14:19
No, the total grid was 256 x 256, so 16 x 16 blocks cover the whole grid. Anyway, I solved this problem; I have added another answer below. — roynalnaruto, Mar 19 '15 at 20:34

DanRo · Answer 1 · 2015-03-18T01:31:57.527

0

Formatted output is only supported by devices of compute capability 2.x and higher.

int printf(const char *format[, arg, ...]);

prints formatted output from a kernel to a host-side output stream.

Reference: CUDA C Programming Guide 2015, page 119.

See this link: https://code.google.com/p/stanford-cs193g-sp2010/wiki/TutorialHelloWorld
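
For comparison, on a device that does meet that requirement, the in-kernel `printf` quoted above can be called directly, without cuPrintf. This is a minimal sketch, assuming a GPU of compute capability 2.0 or higher (so not the GeForce 210 from the question). The 16 x 16 grid of 16 x 16 blocks follows the asker's comment; the value `1.5f` and the extra `i == 0` filter are illustrative assumptions.

```cuda
#include <cstdio>

// Each thread computes its global 2D index; only one thread in row 20 prints.
__global__ void test_kernel(float val)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i == 0 && j == 20)
        printf("%f is value, %d is j\n", val, j);
}

int main()
{
    dim3 block(16, 16);   // 256 threads per block
    dim3 grid(16, 16);    // covers a 256 x 256 domain

    test_kernel<<<grid, block>>>(1.5f);   // 1.5f is an arbitrary example value

    // Report launch errors with the runtime's own message.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Device-side printf output is flushed at synchronization points.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

With the CUDA 6.5 toolkit this would be built for a Fermi-or-later target, for example `nvcc -arch=sm_20 test.cu`. A kernel that calls `printf` will not build for a compute capability 1.x target.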

edited Mar 18 '15 at 01:31

answered Mar 18 '15 at 01:26


score 0 · Accepted Answer · answered Mar 19 '15 at 20:37

I solved the problem by running the program under `cuda-memcheck`. cuPrintf was not working because NaN values were being generated in the kernel: the denominator in some computations was becoming zero. Once I avoided those cases, cudaPrintfInit and cudaPrintfDisplay started working.
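
The two ideas here, checking every runtime call and guarding the denominator, could be sketched as follows. This is not the original kernel: `CUDA_CHECK`, `safe_ratio_kernel`, the array size, and the choice to write `0.0f` when the denominator is zero are all hypothetical and only illustrate the pattern.

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: stop with the runtime's error string if a call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Hypothetical kernel: out[i] = num[i] / den[i], with zero denominators skipped.
__global__ void safe_ratio_kernel(const float *num, const float *den,
                                  float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;                      // stay inside the arrays
    out[i] = (den[i] != 0.0f) ? num[i] / den[i] : 0.0f;
}

int main()
{
    const int n = 256;
    const size_t bytes = n * sizeof(float);
    float h_num[n], h_den[n], h_out[n];
    for (int i = 0; i < n; ++i) {
        h_num[i] = 0.0f;                     // 0/0 would give NaN if unguarded
        h_den[i] = (float)(i % 4);           // every fourth denominator is zero
    }

    float *d_num, *d_den, *d_out;
    CUDA_CHECK(cudaMalloc((void **)&d_num, bytes));
    CUDA_CHECK(cudaMalloc((void **)&d_den, bytes));
    CUDA_CHECK(cudaMalloc((void **)&d_out, bytes));
    CUDA_CHECK(cudaMemcpy(d_num, h_num, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_den, h_den, bytes, cudaMemcpyHostToDevice));

    safe_ratio_kernel<<<(n + 63) / 64, 64>>>(d_num, d_den, d_out, n);
    CUDA_CHECK(cudaGetLastError());          // launch configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());     // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost));

    int nan_count = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_out[i]) ++nan_count;   // NaN is the only value != itself
    printf("NaN count: %d\n", nan_count);

    CUDA_CHECK(cudaFree(d_num));
    CUDA_CHECK(cudaFree(d_den));
    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

Running the resulting binary under `cuda-memcheck`, as described above, would additionally report out-of-bounds accesses inside the kernel, which plain return-code checks only surface as a generic failure.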
