
I am doing computations in CUDA using floats. Because we do not have enough memory on the GPU, we store the raw data as uint16_t and int16_t on the GPU. Thus, before I use this data I have to convert it to floats. The number of ints is not that large (approximately 12k uint16_t and the same number of int16_t). Profiling showed that converting the numbers takes a considerable amount of time (approx. 5-10%). The rest of the calculation cannot be optimized further. Thus my 3+1 questions are:

  • What is the fastest way to convert ints to floats?
  • Is there a substantial difference when converting int16_t or uint16_t?
  • Is there a substantial difference when converting larger int types, e.g. int32 or int64?
  • Why are all questions on SO about converting floats to ints? Is this something one usually does not do?
tommsch
  • What exactly do you mean with "convert"? `int16_t x =...; float f = float(x);`? – Lukas-T Feb 07 '21 at 10:48
  • @churill Yes, exactly. – tommsch Feb 07 '21 at 10:50
    Is this conversion happening in CUDA or at the CUDA/nonCUDA edge? (I assume the first). What does your existing conversion look like? What does the remaining calculation look like, at least in pseudo code? Are you using all the floats you produce? How separable are these values? 16 bit int to 32 bit float should be bit shifting; signed a tiny bit more complex. – Yakk - Adam Nevraumont Feb 07 '21 at 11:02
  • [This answer](https://stackoverflow.com/a/20308114/11527076) relates to your problem, and a 16-bit integer will largely fit into the fractional part, but I'm afraid the `while` loop to determine the required shifting will slow down even more... – prog-fh Feb 07 '21 at 11:08
    The GPU compiler will emit a hardware instruction for a simple conversion. The documentation notes the throughput is either 16 or 32 instructions per clock cycle per multiprocessor. On that basis I guess your microbenchmarking is wrong and what you are attributing to conversion cost is something else – talonmies Feb 07 '21 at 11:11
  • @Yakk At CUDA. Currently I use `static_cast`s. Pseudocode: *cast int to float; Loop over all floats to compute a main value; Loop again over all floats and use the main value to compute the result;* I am quite sure this algorithm cannot be optimized. – tommsch Feb 08 '21 at 09:35
  • Thanks everybody. You've been a great help. – tommsch Feb 08 '21 at 09:39

1 Answer

5
  • What is the fastest way to convert ints to floats.

Simple assignment. There are hardware type conversion instructions which the CUDA compiler will emit automatically without you doing anything. Hardware conversion includes the correct IEEE rounding modes.

  • Is there a substantial difference when converting int16_t or uint16_t.

No.

  • Is there a substantial difference when converting larger int types, e.g. int32 or int64.

Yes. The instruction throughput for type conversion instructions is documented. 32-bit and 16-bit integer-to-float conversion instructions have the same throughput. 64-bit conversion instructions are considerably slower than 16- and 32-bit conversion instructions on most architectures.

  • Why are all questions on SO about converting floats to ints. Is this something one usually does not do?

Because many people don't understand the difference between float and int types, and why they lose precision when they convert their float or double type to an int type.
That's nothing you have to worry about in your situation.

talonmies