I wrote a simple kernel that uses printf to log some values after a computation. It works for types such as int, but for fp16 (half) it incorrectly prints zero. I suspect the printf format specifier is wrong. Which format should I use for half, without casting the value to another data type such as float or double?


#include <stdio.h>
#include <cuda_fp16.h>

#define DTYPE half //int
#define PRINT_FMT "output %f\n" //"output %d\n"

__global__ void __launch_bounds__(1024) test_print_kernel(DTYPE *__restrict__ O)
{
    // printf("test kernel\n");
    if (((int)blockIdx.x == 0) && ((int)threadIdx.x == 0))
    {
        O[0] = ((DTYPE)(2));
        __syncthreads();

        printf(PRINT_FMT, O[0]);

    }
}

int main(int argc, char **argv)
{
    DTYPE *h_O;
    cudaStream_t stream;
    cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
    cudaMallocHost(&h_O, 1 * sizeof(DTYPE));
    test_print_kernel<<<dim3(1, 1, 1), dim3(1, 1, 1), 0, (cudaStream_t)stream>>>(h_O);
    cudaDeviceSynchronize();
}

The warnings I got during compilation:

test_printf.cu(16): warning #1290-D: a class type that is not trivially copyable passed through ellipsis

test_printf.cu(16): warning #181-D: argument is incompatible with corresponding format string conversion

And it prints to the terminal: output 0.000000

  • There [isn't](https://stackoverflow.com/a/70673073/1695960) a `printf` format specifically for fp16. [Convert](https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH____HALF__MISC.html#group__CUDA__MATH____HALF__MISC_1g0b79d92cb1fd7012b9c4416e9f4a03ba) it to `float` before passing it to `printf`. Since `printf` in a kernel is not part of any high-performance code path (it is usually used for debugging), this shouldn't present an onerous problem. – Robert Crovella Jun 14 '23 at 17:56
  • @RobertCrovella Got it. Thanks for answering the question – Chun Ting Li Jun 15 '23 at 17:16

1 Answer

There is no output format specifier for half precision floating point values in either the C++ standard definition of printf, or in the CUDA implementation.

Your only real choice is to convert the half value to a float and use the standard float format specifier. The CUDA Math API has an intrinsic function for this, so something like:

printf("output %f\n", __half2float(O[0]));

should work at a minimum. As pointed out in comments, there is a performance penalty for the conversion, but this is printf, which has a huge performance penalty anyway.
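For reference, here is a minimal sketch of the question's program with that conversion applied. It assumes a device buffer allocated with cudaMalloc instead of the pinned host buffer used in the question, and it drops the stream and launch bounds for brevity; those changes are simplifications for illustration, not part of the fix.

```cuda
#include <cstdio>
#include <cuda_fp16.h>

__global__ void test_print_kernel(half *O)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
    {
        O[0] = __float2half(2.0f);
        // Widen explicitly to float; printf then receives a value that matches %f.
        printf("output %f\n", __half2float(O[0]));
    }
}

int main()
{
    half *d_O = nullptr;
    cudaMalloc(&d_O, sizeof(half));   // assumption: plain device memory
    test_print_kernel<<<1, 1>>>(d_O);
    cudaDeviceSynchronize();          // also flushes the device-side printf buffer
    cudaFree(d_O);
    return 0;
}
```

Since 2.0 is exactly representable in half precision, this should print `output 2.000000` rather than zero.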

Note that this will make the kernel specific to the half case. If you genuinely need a generic kernel for different types with output, there are variadic template solutions which you could use to generalise this to different types, for example as discussed here.
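As a simpler alternative to a variadic template, one possible sketch uses plain function overloading: a small set of device helpers, each pairing a type with its own format string, called from a templated kernel. The helper name print_value is hypothetical and chosen only for illustration, and the set of overloads would need extending for any other element types.

```cuda
#include <cstdio>
#include <cuda_fp16.h>

// Hypothetical helpers: one overload per element type, each with a matching format.
__device__ inline void print_value(half v)  { printf("output %f\n", __half2float(v)); }
__device__ inline void print_value(float v) { printf("output %f\n", v); }
__device__ inline void print_value(int v)   { printf("output %d\n", v); }

template <typename T>
__global__ void test_print_kernel(T *O)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
    {
        O[0] = static_cast<T>(2);
        print_value(O[0]);   // overload resolution selects the right format
    }
}

template <typename T>
void run_test()
{
    T *d_O = nullptr;
    cudaMalloc(&d_O, sizeof(T));
    test_print_kernel<T><<<1, 1>>>(d_O);
    cudaDeviceSynchronize();  // flush device-side printf output
    cudaFree(d_O);
}

int main()
{
    run_test<half>();
    run_test<int>();
    return 0;
}
```

The kernel body stays type-agnostic, and the half-to-float conversion is confined to the one overload that needs it.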

talonmies