I am doing computations in CUDA using floats. Because we do not have enough memory on the GPU, we store the raw data as uint16_t and int16_t on the GPU. Thus, before I use this data I have to convert it to floats.
The number of ints is not that large (approximately 12k uint16_t values and the same number of int16_t values). Profiling showed that converting the numbers takes a considerable amount of time (approx. 5-10%). The rest of the calculation cannot be optimized any further.
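To make the setup concrete, here is a minimal sketch of the kind of conversion step I mean. It is not my actual code: the kernel name `widen_to_float`, the helper `count_mismatches`, the test data, the launch configuration and the use of managed memory are all assumptions made for illustration.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread widens one uint16_t and one int16_t to float.
// Every 16-bit integer is exactly representable in a 32-bit float.
__global__ void widen_to_float(const uint16_t* u_in, const int16_t* s_in,
                               float* u_out, float* s_out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        u_out[i] = static_cast<float>(u_in[i]);  // unsigned 16-bit -> float
        s_out[i] = static_cast<float>(s_in[i]);  // signed 16-bit -> float
    }
}

// Hypothetical host helper: runs the kernel on n elements and returns the
// number of results that differ from the host-side conversion (-1 on CUDA error).
int count_mismatches(int n)
{
    uint16_t* u_in = nullptr;
    int16_t* s_in = nullptr;
    float* u_out = nullptr;
    float* s_out = nullptr;

    // Managed memory keeps the sketch short; real code may use cudaMalloc/cudaMemcpy.
    bool ok = cudaMallocManaged(&u_in, n * sizeof(uint16_t)) == cudaSuccess
           && cudaMallocManaged(&s_in, n * sizeof(int16_t)) == cudaSuccess
           && cudaMallocManaged(&u_out, n * sizeof(float)) == cudaSuccess
           && cudaMallocManaged(&s_out, n * sizeof(float)) == cudaSuccess;

    int bad = -1;
    if (ok) {
        // Made-up input data covering positive and negative values.
        for (int i = 0; i < n; ++i) {
            u_in[i] = static_cast<uint16_t>(i % 65536);
            s_in[i] = static_cast<int16_t>((i % 65536) - 32768);
        }

        int threads = 256;                          // assumed block size
        int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
        widen_to_float<<<blocks, threads>>>(u_in, s_in, u_out, s_out, n);

        if (cudaDeviceSynchronize() == cudaSuccess) {
            bad = 0;
            for (int i = 0; i < n; ++i) {
                if (u_out[i] != static_cast<float>(u_in[i]) ||
                    s_out[i] != static_cast<float>(s_in[i])) {
                    ++bad;
                }
            }
        }
    }

    // cudaFree accepts null pointers, so this is safe after a failed allocation.
    cudaFree(u_in);
    cudaFree(s_in);
    cudaFree(u_out);
    cudaFree(s_out);
    return bad;
}

int main()
{
    const int n = 12000;  // roughly the element count mentioned above
    std::printf("mismatches: %d\n", count_mismatches(n));
    return 0;
}
```

In the real code the conversion would presumably happen inside the compute kernel rather than in a separate pass, so this only isolates the cast that the profiler points at.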
Thus my 3+1 questions are:
- What is the fastest way to convert ints to floats?
- Is there a substantial difference when converting int16_t or uint16_t?
- Is there a substantial difference when converting larger int types, e.g. int32 or int64?
- Why are all questions on SO about converting floats to ints? Is this something one usually does not do?