Since CUDA 7.5/8.0 and devices with Pascal GPUs, CUDA supports the half precision (FP16) datatype out of the box. Additionally, many of the BLAS calls inside CUBLAS support half precision types, e.g. the GEMM operation is available as cublasHgemm. My problem is that the host does not support half precision types. Is there an already implemented solution like cublasSetMatrix which does the conversion during the upload to the device? Or is it necessary to create a tricky implementation by composing a float upload with a CUDA kernel that does the truncation to half?
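What I have in mind as the fallback is roughly the following (just a sketch; buffer names and launch configuration are placeholders):

```c
// Sketch of the fallback: upload the float data unchanged, then narrow it
// to half on the device with a small kernel.
#include <cuda_fp16.h>

__global__ void float_to_half(const float *in, __half *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __float2half(in[i]);   // per-element float -> FP16 conversion
}

// host side (error checking omitted):
//   cudaMemcpy(d_float, h_float, n * sizeof(float), cudaMemcpyHostToDevice);
//   float_to_half<<<(n + 255) / 256, 256>>>(d_float, d_half, n);
```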
asked by M.K. aka Grisu
1 Answer
There is no function currently provided by the CUDA toolkit which converts float quantities to half quantities in the process of copying data from host to device.

It is possible to convert from float to half either in host code or device code. There would be advantages and disadvantages of doing it in either place.
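For instance, a host-side conversion could be built on the half library linked below. This is only a sketch; it assumes half_float::half uses the same 16-bit IEEE 754 binary16 layout as CUDA's __half, and the function and buffer names are my own:

```cpp
// Sketch: convert on the host with half_float::half (Christian Rau's library),
// then copy the 16-bit values into a __half buffer on the device.
#include <vector>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include "half.hpp"          // half_float::half

void upload_as_half(const float *h_src, __half *d_dst, size_t n)
{
    std::vector<half_float::half> tmp(n);
    for (size_t i = 0; i < n; ++i)
        tmp[i] = half_float::half(h_src[i]);   // float -> binary16 on the host
    // assumes half_float::half and __half share the binary16 representation
    cudaMemcpy(d_dst, tmp.data(), n * sizeof(__half), cudaMemcpyHostToDevice);
}
```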
Furthermore, there is a cublas<t>gemmEx function available that may be of interest, which can have differing datatypes for input and output (and computation).
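As a rough sketch (buffer names and dimensions are placeholders), cublasSgemmEx can take half-precision A, B and C while accumulating in single precision:

```c
// Sketch: GEMM on half-precision matrices already resident on the device,
// with single-precision computation, via cublasSgemmEx.
#include <cublas_v2.h>
#include <cuda_fp16.h>

void gemm_fp16_inputs(cublasHandle_t handle,
                      const __half *d_A, const __half *d_B, __half *d_C,
                      int m, int n, int k)
{
    const float alpha = 1.0f, beta = 0.0f;   // scalars stay in float
    // A is m x k, B is k x n, C is m x n, column-major as cuBLAS expects
    cublasSgemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                  m, n, k, &alpha,
                  d_A, CUDA_R_16F, m,
                  d_B, CUDA_R_16F, k,
                  &beta,
                  d_C, CUDA_R_16F, m);
}
```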
Some other half resources that may be of interest:
- half conversion library by Christian Rau mentioned by @talonmies below
- CUDA half intrinsics.
- previous SO question, with links to some other resources


answered by Robert Crovella
- Christian Rau has developed a very high quality host IEEE 754 half precision library that might be useful here – talonmies Mar 30 '17 at 16:17