I wrote a simple CUDA kernel that uses printf to log a value after a computation. It works for types such as int, but with fp16 (half) it prints zero instead of the stored value. I suspect the format specifier is wrong. Which format should I use for half, without casting it to another data type such as float or double?
#include <stdio.h>
#include <cuda_fp16.h>

#define DTYPE half //int
#define PRINT_FMT "output %f\n" //"output %d\n"

__global__ void __launch_bounds__(1024) test_print_kernel(DTYPE *__restrict__ O)
{
    // printf("test kernel\n");
    if (((int)blockIdx.x == 0) && ((int)threadIdx.x == 0))
    {
        O[0] = ((DTYPE)(2));
        __syncthreads();
        printf(PRINT_FMT, O[0]);
    }
}

int main(int argc, char **argv)
{
    DTYPE *h_O;
    cudaStream_t stream;
    cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
    cudaMallocHost(&h_O, 1 * sizeof(DTYPE));
    test_print_kernel<<<dim3(1, 1, 1), dim3(1, 1, 1), 0, (cudaStream_t)stream>>>(h_O);
    cudaDeviceSynchronize();
}
The warnings I got during compilation:
test_printf.cu(16): warning #1290-D: a class type that is not trivially copyable passed through ellipsis
test_printf.cu(16): warning #181-D: argument is incompatible with corresponding format string conversion
And it prints to the terminal:
output 0.000000
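For comparison, the cast-based version that I would like to avoid looks roughly like the sketch below. The names `print_half_kernel` and `run_print_half` are made up for this example, and the extra `as_float` buffer is only an assumption added so the host can inspect the value. `__float2half` and `__half2float` are the conversion intrinsics from cuda_fp16.h.

```cuda
#include <stdio.h>
#include <cuda_fp16.h>

// Hypothetical kernel: stores 2 as half, then converts it to float for printf.
__global__ void print_half_kernel(half *O, float *as_float)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
    {
        O[0] = __float2half(2.0f);
        float v = __half2float(O[0]); // explicit half -> float conversion
        printf("output %f\n", v);     // %f now matches the argument type
        as_float[0] = v;              // hand the converted value back to the host
    }
}

// Hypothetical host helper: launches the kernel and returns the converted value.
float run_print_half()
{
    half *h_O = nullptr;
    float *h_F = nullptr;
    // Pinned host memory, accessed directly by the kernel as in the code above.
    cudaMallocHost(&h_O, sizeof(half));
    cudaMallocHost(&h_F, sizeof(float));
    print_half_kernel<<<1, 1>>>(h_O, h_F);
    cudaDeviceSynchronize(); // also flushes the device-side printf buffer
    float result = h_F[0];
    cudaFreeHost(h_O);
    cudaFreeHost(h_F);
    return result;
}
```

Calling `run_print_half()` from `main` should print `output 2.000000`, which suggests the half value itself is stored correctly and only the printf argument is the problem. It still goes through a float, though, which is exactly what I am asking how to avoid.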