Since CUDA 7.5/8.0 and devices with Pascal GPUs, CUDA supports the half precision (FP16) datatype out of the box. Additionally, many of the BLAS calls inside CUBLAS support half precision types, e.g. the GEMM operation is available as cublasHgemm. My problem is that the host does not support half precision types. Is there an already implemented solution like cublasSetMatrix which does the conversion during the upload to the device? Or is it necessary to create a tricky implementation by composing a float upload with a CUDA kernel that does the truncation to half?
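What I have in mind as the fallback is roughly the following (just a sketch; buffer names and launch configuration are placeholders):

```c
// Sketch of the fallback: upload the float data unchanged, then narrow it
// to half on the device with a small kernel.
#include <cuda_fp16.h>

__global__ void float_to_half(const float *in, __half *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __float2half(in[i]);   // per-element float -> FP16 conversion
}

// host side (error checking omitted):
//   cudaMemcpy(d_float, h_float, n * sizeof(float), cudaMemcpyHostToDevice);
//   float_to_half<<<(n + 255) / 256, 256>>>(d_float, d_half, n);
```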
asked by M.K. aka Grisu
1 Answer
There is no function currently provided by the CUDA toolkit which converts float quantities to half quantities in the process of copying data from host to device.

It is possible to convert from float to half either in host code or device code. There would be advantages and disadvantages of doing it in either place.
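For instance, a host-side conversion could be built on the half library linked below. This is only a sketch; it assumes half_float::half uses the same 16-bit IEEE 754 binary16 layout as CUDA's __half, and the function and buffer names are my own:

```cpp
// Sketch: convert on the host with half_float::half (Christian Rau's library),
// then copy the 16-bit values into a __half buffer on the device.
#include <vector>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include "half.hpp"          // half_float::half

void upload_as_half(const float *h_src, __half *d_dst, size_t n)
{
    std::vector<half_float::half> tmp(n);
    for (size_t i = 0; i < n; ++i)
        tmp[i] = half_float::half(h_src[i]);   // float -> binary16 on the host
    // assumes half_float::half and __half share the binary16 representation
    cudaMemcpy(d_dst, tmp.data(), n * sizeof(__half), cudaMemcpyHostToDevice);
}
```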
Furthermore, there is a cublas<t>gemmEx function available that may be of interest, which can have differing datatypes for input and output (and computation).
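As a rough sketch (buffer names and dimensions are placeholders), cublasSgemmEx can take half-precision A, B and C while accumulating in single precision:

```c
// Sketch: GEMM on half-precision matrices already resident on the device,
// with single-precision computation, via cublasSgemmEx.
#include <cublas_v2.h>
#include <cuda_fp16.h>

void gemm_fp16_inputs(cublasHandle_t handle,
                      const __half *d_A, const __half *d_B, __half *d_C,
                      int m, int n, int k)
{
    const float alpha = 1.0f, beta = 0.0f;   // scalars stay in float
    // A is m x k, B is k x n, C is m x n, column-major as cuBLAS expects
    cublasSgemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                  m, n, k, &alpha,
                  d_A, CUDA_R_16F, m,
                  d_B, CUDA_R_16F, k,
                  &beta,
                  d_C, CUDA_R_16F, m);
}
```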
Some other half resources that may be of interest:
- half conversion library by Christian Rau mentioned by @talonmies below
- CUDA half intrinsics.
- previous SO question, with links to some other resources


answered by Robert Crovella
- Christian Rau has developed a very high quality host IEEE 754 half precision library that might be useful here – talonmies Mar 30 '17 at 16:17