How to convert an unsigned integer to floating-point in x86 (32-bit) assembly?

Question

I need to convert both 32-bit and 64-bit unsigned integers into floating-point values in xmm registers. There are x86 instructions to convert signed integers into single and double precision floating-point values, but nothing for unsigned integers.

Bonus: How to convert floating-point values in xmm registers to 32-bit and 64-bit unsigned integers?

This is easy for 32-bit unsigned integers. But 64-bit signed and unsigned is hard. — Mysticial, Jul 10 '12 at 04:41
Likewise for `float->int` conversions, there are very fast methods if you are willing to cut corners with `NaN`, `INF`, overflow, etc... — Mysticial, Jul 10 '12 at 04:56
I suppose the only way is to decompose it into lower-32 and upper-32 bits. For the `float->int` conversions, you're gonna need to branch to catch all the corner cases. (or hack around with conditional moves) — Mysticial, Jul 10 '12 at 05:24

score 4 · Accepted Answer · answered Jul 30 '12 at 16:26

Shamelessly using Janus's answer as a template (after all, I really like C++):

Generated with gcc -march=native -O3 on an i7, so this uses instruction sets up to and including AVX (-mavx). The uint conversions are as expected; the long conversions just have a special case for numbers greater than 2⁶³-1.

0000000000000000 <ulong2double>:
   0:   48 85 ff                test   %rdi,%rdi
   3:   78 0b                   js     10 <ulong2double+0x10>
   5:   c4 e1 fb 2a c7          vcvtsi2sd %rdi,%xmm0,%xmm0
   a:   c3                      retq   
   b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  10:   48 89 f8                mov    %rdi,%rax
  13:   83 e7 01                and    $0x1,%edi
  16:   48 d1 e8                shr    %rax
  19:   48 09 f8                or     %rdi,%rax
  1c:   c4 e1 fb 2a c0          vcvtsi2sd %rax,%xmm0,%xmm0
  21:   c5 fb 58 c0             vaddsd %xmm0,%xmm0,%xmm0
  25:   c3                      retq   

0000000000000030 <ulong2float>:
  30:   48 85 ff                test   %rdi,%rdi
  33:   78 0b                   js     40 <ulong2float+0x10>
  35:   c4 e1 fa 2a c7          vcvtsi2ss %rdi,%xmm0,%xmm0
  3a:   c3                      retq   
  3b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  40:   48 89 f8                mov    %rdi,%rax
  43:   83 e7 01                and    $0x1,%edi
  46:   48 d1 e8                shr    %rax
  49:   48 09 f8                or     %rdi,%rax
  4c:   c4 e1 fa 2a c0          vcvtsi2ss %rax,%xmm0,%xmm0
  51:   c5 fa 58 c0             vaddss %xmm0,%xmm0,%xmm0
  55:   c3                      retq   

0000000000000060 <uint2double>:
  60:   89 ff                   mov    %edi,%edi
  62:   c4 e1 fb 2a c7          vcvtsi2sd %rdi,%xmm0,%xmm0
  67:   c3                      retq   

0000000000000070 <uint2float>:
  70:   89 ff                   mov    %edi,%edi
  72:   c4 e1 fa 2a c7          vcvtsi2ss %rdi,%xmm0,%xmm0
  77:   c3                      retq
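
For readers who would rather see the special case in C, here is a minimal sketch of what the `js` branch in `ulong2double` and `ulong2float` appears to do. The function names are made up for illustration, and the comments map each step onto the disassembly above. When the top bit is set, the value is halved, the shifted-out bit is OR-ed back in as a "sticky" bit so the rounding still sees it, the signed conversion is used, and the result is doubled (which is exact).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; a sketch of the compiler's strategy, not its exact output. */
double ulong2double_sketch(uint64_t a)
{
    if ((a >> 63) == 0)                    /* test %rdi,%rdi ; js not taken */
        return (double)(int64_t)a;         /* cvtsi2sd on a non-negative value */

    uint64_t half = (a >> 1) | (a & 1);    /* shr / and $1 / or: halve, keep sticky bit */
    assert((half >> 63) == 0);             /* now safe to treat as signed */
    double d = (double)(int64_t)half;      /* cvtsi2sd */
    return d + d;                          /* vaddsd: doubling is exact */
}

float ulong2float_sketch(uint64_t a)
{
    if ((a >> 63) == 0)
        return (float)(int64_t)a;          /* cvtsi2ss */

    uint64_t half = (a >> 1) | (a & 1);
    assert((half >> 63) == 0);
    float f = (float)(int64_t)half;        /* cvtsi2ss */
    return f + f;                          /* vaddss */
}
```

Without the sticky bit, an input such as 2⁶³ + 2¹⁰ + 1 would be seen as an exact tie after the shift and could round the wrong way.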

You only need `-march=core2` and `-m64` (maybe implicit, as in your case) to get this result. All the AVX instructions here are available in legacy SSE2 variants. For example the last `vcvtsi2ss %rdi,%xmm0,%xmm0` could be `cvtsi2ss %rdi,%xmm0`. Interestingly, that also works in SSE1, but the `cvtsi2sd` in `uint2double` requires SSE2. — Janus Troelsen, Jul 30 '12 at 17:29
@tgiphil: Add `-m32` to the compile options to make 32-bit code, if your GCC defaults to `-m64`. — Peter Cordes, Jul 28 '23 at 18:21

score 3 · Answer 2 · edited Jun 20 '20 at 09:12

Here's what GCC generates. I wrapped them in functions, but you can easily remove the stack handling. Not all of them use SSE to do the actual work (the ulonglong conversions don't); if you find the corresponding instructions, please tell me. Clang generates almost the same.

% cat tofloats.c 
double ulonglong2double(unsigned long long a) {
    return a;
}
float ulonglong2float(unsigned long long a) {
    return a;
}
double uint2double(unsigned int a) {
    return a;
}
float uint2float(unsigned int a) {
    return a;
}
% gcc -msse4.2 -g -Os -c tofloats.c && objdump -d tofloats.o
00000000 <ulonglong2double>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 10                sub    $0x10,%esp
   6:   8b 55 0c                mov    0xc(%ebp),%edx
   9:   8b 45 08                mov    0x8(%ebp),%eax
   c:   89 55 f4                mov    %edx,-0xc(%ebp)
   f:   85 d2                   test   %edx,%edx
  11:   89 45 f0                mov    %eax,-0x10(%ebp)
  14:   df 6d f0                fildll -0x10(%ebp)
  17:   79 06                   jns    1f <ulonglong2double+0x1f>
  19:   d8 05 00 00 00 00       fadds  0x0
  1f:   dd 5d f8                fstpl  -0x8(%ebp)
  22:   dd 45 f8                fldl   -0x8(%ebp)
  25:   c9                      leave  
  26:   c3                      ret    

00000027 <ulonglong2float>:
  27:   55                      push   %ebp
  28:   89 e5                   mov    %esp,%ebp
  2a:   83 ec 10                sub    $0x10,%esp
  2d:   8b 55 0c                mov    0xc(%ebp),%edx
  30:   8b 45 08                mov    0x8(%ebp),%eax
  33:   89 55 f4                mov    %edx,-0xc(%ebp)
  36:   85 d2                   test   %edx,%edx
  38:   89 45 f0                mov    %eax,-0x10(%ebp)
  3b:   df 6d f0                fildll -0x10(%ebp)
  3e:   79 06                   jns    46 <ulonglong2float+0x1f>
  40:   d8 05 00 00 00 00       fadds  0x0
  46:   d9 5d fc                fstps  -0x4(%ebp)
  49:   d9 45 fc                flds   -0x4(%ebp)
  4c:   c9                      leave  
  4d:   c3                      ret    

0000004e <uint2double>:
  4e:   55                      push   %ebp
  4f:   89 e5                   mov    %esp,%ebp
  51:   83 ec 08                sub    $0x8,%esp
  54:   66 0f 6e 45 08          movd   0x8(%ebp),%xmm0
  59:   66 0f d6 45 f8          movq   %xmm0,-0x8(%ebp)
  5e:   df 6d f8                fildll -0x8(%ebp)
  61:   c9                      leave  
  62:   c3                      ret    

00000063 <uint2float>:
  63:   55                      push   %ebp
  64:   89 e5                   mov    %esp,%ebp
  66:   83 ec 08                sub    $0x8,%esp
  69:   66 0f 6e 45 08          movd   0x8(%ebp),%xmm0
  6e:   66 0f d6 45 f8          movq   %xmm0,-0x8(%ebp)
  73:   df 6d f8                fildll -0x8(%ebp)
  76:   c9                      leave  
  77:   c3                      ret
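
As a rough C rendering of the `ulonglong2double` listing above (a sketch, not the compiler's definition): `fildll` loads the 64 bits as a signed integer, and if the sign bit was set, `fadds` adds a constant to undo the wrap-around. The constant's value is not visible in the disassembly (the operand is an unrelocated `0x0`); 2⁶⁴ is assumed here because that is what makes the arithmetic work. The function name is hypothetical, and the sketch assumes `long double` is the 80-bit x87 format (as with GCC on x86), so that the intermediate value is exact and only the final store rounds.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Hypothetical name; assumes long double has at least a 64-bit significand. */
double ulonglong2double_sketch(uint64_t a)
{
    assert(LDBL_MANT_DIG >= 64);               /* otherwise the steps below can double-round */

    /* The unsigned-to-signed cast wraps on GCC/Clang (implementation-defined in C). */
    long double t = (long double)(int64_t)a;   /* fildll: bits interpreted as signed, exact */
    if ((int64_t)a < 0)                        /* test %edx,%edx ; jns skips the fix-up */
        t += 0x1p64L;                          /* fadds: add the assumed constant 2^64 */
    return (double)t;                          /* fstpl: the only rounding step */
}
```

With a plain `double` intermediate instead of the extended-precision one, the signed conversion and the addition would each round, and some inputs near a rounding tie could come out one unit in the last place off.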

Here are the bonus points (conversion into ints):

% cat toints.c                                      
unsigned long long float2ulonglong(float a) {
    return a;
}
unsigned long long double2ulonglong(double a) {
    return a;
}
unsigned int float2uint(float a) {
    return a;
}
unsigned int double2uint(double a) {
    return a;
}
% gcc -msse4.2 -g -Os -c toints.c && objdump -d toints.o  
00000000 <float2ulonglong>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   53                      push   %ebx
   4:   83 ec 0c                sub    $0xc,%esp
   7:   d9 45 08                flds   0x8(%ebp)
   a:   d9 05 00 00 00 00       flds   0x0
  10:   d9 c9                   fxch   %st(1)
  12:   db e9                   fucomi %st(1),%st
  14:   73 0d                   jae    23 <float2ulonglong+0x23>
  16:   dd d9                   fstp   %st(1)
  18:   dd 4d f0                fisttpll -0x10(%ebp)
  1b:   8b 45 f0                mov    -0x10(%ebp),%eax
  1e:   8b 55 f4                mov    -0xc(%ebp),%edx
  21:   eb 13                   jmp    36 <float2ulonglong+0x36>
  23:   de e1                   fsubp  %st,%st(1)
  25:   dd 4d f0                fisttpll -0x10(%ebp)
  28:   8b 55 f4                mov    -0xc(%ebp),%edx
  2b:   8b 45 f0                mov    -0x10(%ebp),%eax
  2e:   8d 8a 00 00 00 80       lea    -0x80000000(%edx),%ecx
  34:   89 ca                   mov    %ecx,%edx
  36:   83 c4 0c                add    $0xc,%esp
  39:   5b                      pop    %ebx
  3a:   5d                      pop    %ebp
  3b:   c3                      ret    

0000003c <double2ulonglong>:
  3c:   55                      push   %ebp
  3d:   89 e5                   mov    %esp,%ebp
  3f:   53                      push   %ebx
  40:   83 ec 0c                sub    $0xc,%esp
  43:   dd 45 08                fldl   0x8(%ebp)
  46:   d9 05 00 00 00 00       flds   0x0
  4c:   d9 c9                   fxch   %st(1)
  4e:   db e9                   fucomi %st(1),%st
  50:   73 0d                   jae    5f <double2ulonglong+0x23>
  52:   dd d9                   fstp   %st(1)
  54:   dd 4d f0                fisttpll -0x10(%ebp)
  57:   8b 45 f0                mov    -0x10(%ebp),%eax
  5a:   8b 55 f4                mov    -0xc(%ebp),%edx
  5d:   eb 13                   jmp    72 <double2ulonglong+0x36>
  5f:   de e1                   fsubp  %st,%st(1)
  61:   dd 4d f0                fisttpll -0x10(%ebp)
  64:   8b 55 f4                mov    -0xc(%ebp),%edx
  67:   8b 45 f0                mov    -0x10(%ebp),%eax
  6a:   8d 8a 00 00 00 80       lea    -0x80000000(%edx),%ecx
  70:   89 ca                   mov    %ecx,%edx
  72:   83 c4 0c                add    $0xc,%esp
  75:   5b                      pop    %ebx
  76:   5d                      pop    %ebp
  77:   c3                      ret    

00000078 <float2uint>:
  78:   55                      push   %ebp
  79:   89 e5                   mov    %esp,%ebp
  7b:   83 ec 08                sub    $0x8,%esp
  7e:   d9 45 08                flds   0x8(%ebp)
  81:   dd 4d f8                fisttpll -0x8(%ebp)
  84:   8b 45 f8                mov    -0x8(%ebp),%eax
  87:   c9                      leave  
  88:   c3                      ret    

00000089 <double2uint>:
  89:   55                      push   %ebp
  8a:   89 e5                   mov    %esp,%ebp
  8c:   83 ec 08                sub    $0x8,%esp
  8f:   dd 45 08                fldl   0x8(%ebp)
  92:   dd 4d f8                fisttpll -0x8(%ebp)
  95:   8b 45 f8                mov    -0x8(%ebp),%eax
  98:   c9                      leave  
  99:   c3                      ret

These functions take their input from the stack. Floating-point results are returned in the x87 register st(0), and integer results in eax (edx:eax for 64-bit values). If you need the result in an XMM register by the end of the function, you can use movd/movq to load it from the stack into the XMM register. In the ulonglong functions above, a double result has been stored at -0x8(%ebp) and a float result at -0x4(%ebp). Ulonglongs are the same size as doubles, and ints are the same size as floats.

fisttpll: Store Integer with Truncation

FISTTP converts the value in ST into a signed integer using truncation (chop) as rounding mode, transfers the result to the destination, and pop ST. FISTTP accepts word, short integer, and long integer destinations.

fucomi: Compare Floating Point Values and Set EFLAGS

Performs an unordered comparison of the contents of registers ST(0) and ST(i) and sets the status flags ZF, PF, and CF in the EFLAGS register according to the results (see the table below). The sign of zero is ignored for comparisons, so that –0.0 is equal to +0.0.
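
Putting the two instructions together, the `double2ulonglong` listing above can be read roughly as the following C. This is a sketch with a made-up function name. The constant loaded by `flds 0x0` is not visible in the disassembly; 2⁶³ is assumed here, since that is the threshold above which a truncating signed conversion no longer fits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; mirrors the branch structure of the listing. */
uint64_t double2ulonglong_sketch(double x)
{
    const double two63 = 0x1p63;       /* assumed value of the flds 0x0 constant */

    assert(x > -1.0 && x < 0x1p64);    /* outside this range the C casts are undefined */

    if (!(x >= two63))                 /* fucomi ; jae not taken */
        return (uint64_t)(int64_t)x;   /* fisttpll: truncating signed conversion */

    int64_t t = (int64_t)(x - two63);  /* fsubp, then fisttpll on the reduced value */
    return (uint64_t)t + 0x8000000000000000ULL;  /* lea -0x80000000(%edx): set bit 63 */
}
```

The subtraction is exact for any double in [2⁶³, 2⁶⁴), so the only rounding is the truncation itself.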

Interesting approach to the answer; however the question is to load an unsigned integer into the XMM register. — tgiphil, Jul 30 '12 at 06:19
I'll accept that answer; but is there a way to do this without any x87 FP registers? — tgiphil, Jul 30 '12 at 16:09
Note that `fisttp` was new in SSE3, even though it operates on x87 registers. https://www.felixcloutier.com/x86/fisttp. Without SSE3, you can use the current FP rounding mode (usually nearest) with `fistp`, instead of truncation. Or change the FP rounding mode and back like compilers had to do in the bad old days before SSE1/2 scalar math introduced truncating conversions like `cvttsd2si`. (To signed integer of a width limited to the mode, so no 64-bit integers in 32-bit mode, unlike x87) — Peter Cordes, Aug 09 '22 at 07:43
@tgiphil: Software floating point is possible but would be much slower. Perhaps also possible to FP multiply by `1.0 / 2^32` and use XMM FP -> integer conversion in two 32-bit halves, but that would also be slower than bouncing your data to x87 registers. Unless you have **AVX-512 for packed conversion to uint64_t** (https://www.felixcloutier.com/x86/vcvtpd2uqq), so `VCVTPD2UQQ xmm1, xmm1` / `vmovq [esp], xmm1` (or `movd` / `pextrd` two 32-bit halves directly to integer registers). — Peter Cordes, Aug 09 '22 at 07:45
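
The "two 32-bit halves" idea from the comment above can be sketched in C as follows. This is only an illustration of the arithmetic, with a hypothetical function name; it does not show which instructions to use. In particular, each `(uint32_t)` cast is itself an unsigned 32-bit conversion, which in 32-bit mode without AVX-512 still needs its own workaround because `cvttsd2si` only produces a signed 32-bit result there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; valid for 0 <= x < 2^64. */
uint64_t double_to_u64_halves(double x)
{
    assert(x >= 0.0 && x < 0x1p64);

    /* Scaling by a power of two is exact, so truncation yields floor(x / 2^32). */
    uint32_t hi = (uint32_t)(x * 0x1p-32);

    /* hi * 2^32 and the difference are both exactly representable,
       so the remainder is in [0, 2^32) and truncates to the low half. */
    uint32_t lo = (uint32_t)(x - (double)hi * 0x1p32);

    return ((uint64_t)hi << 32) | lo;
}
```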

Peter Cordes · Answer 3 · 2022-08-09T08:55:01.207

See also "How to efficiently perform double/int64 conversions with SSE/AVX?" for packed conversion tricks for limited-range or full-range. See also "Strange uint32_t to float array conversion" for analysis of compiler strategies for implementing it. Other answers on this question just show compiler-generated code without talking about why it works.

There are pretty much equivalent instructions for float/double to or from unsigned 64-bit integers with AVX-512. This answer will mostly look at double to uint64_t, but vcvtuqq2ps (packed uint64_t to packed single-precision) and similar instructions do exist. Scalar vcvtusi2sd xmm1, xmm2, r/m64{er} is only available in 64-bit mode, and has the same inconvenient merge-into-some-register semantics from SSE1 instead of zero-extending into a fresh xmm.

`float` or `double` to `uint64_t` with AVX-512F

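AVX-512F added support for FP to/from unsigned integers (scalar vcvttsd2usi or packed). AVX-512 also brought packed conversions to/from 64-bit integers (signed or unsigned, e.g. vcvttpd2uqq for packed double, or vcvttps2uqq converting float32 to uint64_t); note that Intel lists those 64-bit-element conversions under the AVX-512DQ extension, with AVX-512VL needed for the XMM/YMM forms.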

Before AVX-512, scalar conversion with 32-bit unsigned integers was easy in 64-bit mode or with x87, using zero-extension to a non-negative 64-bit signed integer. But 64-bit unsigned was a problem even in 64-bit mode.
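
In C terms, the zero-extension trick for 32-bit unsigned values looks like the sketch below (function names are made up for illustration). Every `uint32_t` fits in the non-negative range of `int64_t`, so the signed 64-bit conversion gives the right answer in both directions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; each body corresponds to one signed 64-bit conversion. */
double uint2double_sketch(uint32_t a)
{
    return (double)(int64_t)a;      /* mov %edi,%edi (zero-extend) ; cvtsi2sd */
}

float uint2float_sketch(uint32_t a)
{
    return (float)(int64_t)a;       /* zero-extend ; cvtsi2ss */
}

uint32_t double2uint_sketch(double x)
{
    assert(x > -1.0 && x < 0x1p32); /* otherwise the result does not fit in uint32_t */
    return (uint32_t)(int64_t)x;    /* cvttsd2si with a 64-bit destination, keep low half */
}
```

The same idea does not extend to `uint64_t`, because there is no wider signed integer conversion to borrow.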

vcvttsd2usi eax, xmm0 works in 32 or 64-bit mode, with AVX-512. (Or the ss version; float vs. double)

vcvttsd2usi rax, xmm0 of course works only in 64-bit mode with AVX-512. So instead we can use packed conversion, because being in 32-bit mode doesn't stop 64-bit SIMD integer elements from working.

I'm not sure if garbage in the high half of the source register that happens to represent a subnormal float/double could slow this down. I'd guess probably not, since rounding it to an integer isn't different from rounding a tiny normalized value.

;;; 32-bit mode can use packed 64-bit conversion then get the 2 halves
 vcvttpd2uqq  xmm1, xmm0    ; 2x truncating uint64_t from double conversions

 vmovd       eax, xmm1       ; extract the halves to integer registers
 vpextrd     edx, xmm1, 1    ; edx:eax = (uint64_t)xmm0[0]

Or store directly to memory with vmovq [esp], xmm1.
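
For C code, the same sequence can be written with intrinsics. The following is a sketch assuming GCC or Clang (for the `target` attribute and `__builtin_cpu_supports`), with hypothetical function names. The 128-bit form of the conversion is exposed as `_mm_cvttpd_epu64`, which needs AVX-512DQ and AVX-512VL, so the wrapper checks for both at run time and otherwise falls back to the plain C cast.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Compiled for AVX-512DQ + AVX-512VL regardless of the global -march setting. */
__attribute__((target("avx512dq,avx512vl")))
static uint64_t double_to_u64_avx512(double x)
{
    __m128d v = _mm_set_sd(x);                        /* x in element 0, +0.0 in element 1 */
    __m128i q = _mm_cvttpd_epu64(v);                  /* vcvttpd2uqq xmm, xmm */
    uint32_t lo = (uint32_t)_mm_cvtsi128_si32(q);     /* vmovd   eax, xmm */
    uint32_t hi = (uint32_t)_mm_extract_epi32(q, 1);  /* vpextrd edx, xmm, 1 */
    return ((uint64_t)hi << 32) | lo;                 /* edx:eax */
}

/* Hypothetical wrapper: use the AVX-512 path only when the CPU reports support. */
uint64_t double_to_u64(double x)
{
    if (__builtin_cpu_supports("avx512dq") && __builtin_cpu_supports("avx512vl"))
        return double_to_u64_avx512(x);

    assert(x > -1.0 && x < 0x1p64);   /* the plain C cast is undefined outside this range */
    return (uint64_t)x;               /* fallback: let the compiler pick a sequence */
}
```

Using `_mm_set_sd` zeroes the upper element, so the second (unused) conversion never operates on leftover data. In 64-bit builds the compiler may well combine the two 32-bit extracts into a single move.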

To use the current rounding mode instead of truncation, leave out the extra t in the mnemonic. (The default rounding mode is nearest with even as a tie-break, if you haven't changed MXCSR.)

vcvtpd2uqq  xmm1, xmm0       ; (uint64_t)nearbyint(xmm0)

Truncating and current-rounding-mode versions are available for all the new AVX-512 FP conversion instructions. This makes sense for the packed conversions: the EVEX rounding-mode override is only available for ZMM vectors, not XMM/YMM.

I'm a bit surprised they bothered to make vcvttsd2usi with an opcode separate from vcvtsd2usi, instead of just an alias for VCVTSD2USI r64, xmm1/m64{rz-sae} to override the rounding mode to truncation toward Zero. (You can also override to Nearest, Upward, or Downward). That has the side effect of suppressing FP exceptions for that instruction, so perhaps they wanted to support code that detects inexact or overflowing conversions by checking MXCSR flags.

Unfortunately "trending sort" puts this answer ahead of the older answers that work with SSE or AVX, without AVX-512 which is unfortunately still rarely available, because Intel didn't bother to define a way for CPUs to provide the useful new operations without also implementing 512-bit vector width. Anyway, see the other answers for how to get compilers to generate working code for those cases. — Peter Cordes, Aug 10 '22 at 07:39
