
I'm currently working on updating a large codebase from VS2013 to VS2019. One of the compiler errors I've run into is as follows:

intrinsics.h(348): error C3861: '_mm_cvtpd_pi32': identifier not found

This intrinsic function is declared in Visual Studio's "emmintrin.h". I only get this error when targeting 64-bit builds. On closer inspection, I see that between 2013 and 2019 the emmintrin.h declaration changed from this:

```c
extern __m64 _mm_cvtpd_pi32(__m128d _A);
extern __m64 _mm_cvttpd_pi32(__m128d _A);
extern __m128d _mm_cvtpi32_pd(__m64 _A);
```

To this:

```c
#if defined(_M_IX86)
extern __m64 _mm_cvtpd_pi32(__m128d _A);
extern __m64 _mm_cvttpd_pi32(__m128d _A);
extern __m128d _mm_cvtpi32_pd(__m64 _A);
#endif
```

i.e. the preprocessor directive makes these functions available only for 32-bit targets. The 3rd-party header file the error comes from uses these functions regardless of the target (64-bit or 32-bit). Presumably the best course of action here is to edit this header file so that these functions are only called for 32-bit targets. However, what I'm more curious about is why this was changed between 2013 and 2019. I see a description of this function here:

https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cvtpd_pi32&expand=1705

Was it never applicable to 64-bit targets to begin with? Or has it been replaced with a 64-bit version that I need to consider?
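A minimal sketch of the header edit described above, assuming the third-party code only needs the two converted int32 values. `cvtpd_to_i32x2` is a hypothetical wrapper name, not something from the original header. `_M_IX86` is the same MSVC macro the 2019 `emmintrin.h` uses. On other targets the sketch falls back to the SSE2 intrinsic `_mm_cvtpd_epi32`, which, like the MMX form, rounds according to the MXCSR rounding mode (round-to-nearest-even by default).

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the MMX declarations */
#include <stdint.h>
#include <string.h>     /* memcpy, used only by the 32-bit MMX path */

/* Hypothetical wrapper (not from the third-party header): converts the two
 * doubles in v to int32 using the current MXCSR rounding mode and writes
 * them to out[0..1], element 0 first. */
static inline void cvtpd_to_i32x2(__m128d v, int32_t out[2])
{
    assert(out);
#if defined(_M_IX86)
    /* 32-bit MSVC: keep the original MMX intrinsic (CVTPD2PI). */
    __m64 r = _mm_cvtpd_pi32(v);
    memcpy(out, &r, sizeof r);
    _mm_empty();  /* EMMS: leave MMX state so later x87 code works */
#else
    /* x64 and non-MSVC compilers: SSE2 CVTPD2DQ puts the two int32
     * results in the low 64 bits of an XMM register. */
    __m128i r = _mm_cvtpd_epi32(v);
    _mm_storel_epi64((__m128i *)out, r);  /* store only the low 64 bits */
#endif
}
```

For example, `cvtpd_to_i32x2(_mm_set_pd(2.5, -1.5), r)` should leave `r[0] == -2` and `r[1] == 2` under the default rounding mode. Whether the header can be rewritten this way depends on how it uses the `__m64` result afterwards, which isn't shown here.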

Peter Cordes
Nimo
    They decided to no longer support x87 registers for 64-bit code. Formally documented [here](https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/using-floating-point-or-mmx-in-a-wdm-driver). So far it is only a hard requirement for WDM code, but it is worth not ignoring. Also [look here](https://stackoverflow.com/questions/39503088/how-to-efficiently-convert-from-two-m128d-to-one-m128i-in-msvc). – Hans Passant Mar 30 '20 at 15:36
    @HansPassant: No reason to expect that those restrictions on kernel code would ever apply to user-space. It's totally normal for kernels to restrict themselves to only integer code in general, or require special stuff before you can use it. (e.g. like Linux's `kernel_fpu_begin` so the kernel doesn't have to save/restore FPU/SIMD state for interrupt handlers / system calls.) But for exactly those reasons (kernels being careful to save/restore *user-space* FPU/SIMD state including x87), user-space machine code can expect it to continue being possible to use MMX / x87 like current Windows. – Peter Cordes Mar 31 '20 at 11:39
    Basically my point is that there are good reasons for not supporting x87/MMX *in the kernel*, even if you have no plans to eventually drop user-space support. So that inference you seem to be making doesn't follow. Whether certain compilers choose to support MMX intrinsics or not is a separate matter. Other windows compilers like GCC and clang still support MMX intrinsics, and have options that make it possible to use `long double` = 80-bit x87. Some popular projects like x264 and FFmpeg still (unfortunately) make some use of MMX in hand-written asm. – Peter Cordes Mar 31 '20 at 11:43

1 Answer


I don't know if there's a way to get MSVC 2019 to compile this legacy MMX intrinsic.

It is safe to use MMX instructions in 64-bit code on Windows, but MS doesn't make it easy to build such code with MS compilers. Newer MSVC versions might simply not support the intrinsic; if there's no workaround for MSVC and you need to compile old code that uses MMX intrinsics, use a better compiler (like clang).

(Early in the history of x86-64 and 64-bit Windows, the fact that MS removed some compiler or assembler support for MMX got some people worried that maybe the Windows kernel wouldn't properly do context-switching for the x87/MMX state. That doubt was unfounded. If you can get MMX code to compile/assemble, e.g. with other tools, it will still run perfectly fine. Windows supports it, and x86-64 CPUs in long mode do still have full support for MMX. I don't use Windows and I don't remember exactly what kind of MMX support was removed.)


Of course, it's normally better to use SSE2 instead of MMX, i.e. the epi32 intrinsics instead of pi32 (or the equivalents for other integer element widths). SSE2 is baseline for x86-64, and it is also required for double-precision SIMD (including this conversion intrinsic).
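As a sketch of that substitution for the other two legacy intrinsics in the header, the function below maps `_mm_cvttpd_pi32` to `_mm_cvttpd_epi32` and `_mm_cvtpi32_pd` to `_mm_cvtepi32_pd`. The task itself (truncating each double toward zero via int32) and the name `trunc_via_i32` are illustrative assumptions, not taken from the question's code.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: out[i] = (double)(int32_t)in[i], two elements per
 * SSE2 step. Assumes every input fits in int32: out-of-range values give
 * INT32_MIN in the vector loop but are undefined behaviour in the scalar
 * C cast used for the tail. */
static void trunc_via_i32(const double *in, double *out, size_t n)
{
    assert(in && out);
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(in + i);  /* unaligned load of 2 doubles */
        __m128i t = _mm_cvttpd_epi32(v);   /* SSE2 replacement for _mm_cvttpd_pi32 */
        __m128d d = _mm_cvtepi32_pd(t);    /* SSE2 replacement for _mm_cvtpi32_pd */
        _mm_storeu_pd(out + i, d);
    }
    for (; i < n; ++i)                      /* scalar tail for odd n */
        out[i] = (double)(int32_t)in[i];
}
```

Nothing here touches MMX registers, so no `_mm_empty()` is needed, and the same code compiles for 32-bit and 64-bit targets.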

The use-case for that conversion is (I think) mainly to get MMX integer vectors for use with existing legacy MMX-vectorized code.

But in this specific case, cvtpd2pi is actually not slower than cvtpd2dq (the normal SSE2 _mm_cvtpd_epi32): both are 2 uops, I think because even within the XMM register domain the instruction has to shuffle the 32-bit integers down to the bottom of the register (see https://www.uops.info/table.html). This is unlike the ps version, where FP->int conversion between XMM registers is a single uop.
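To illustrate the layout point: `_mm_cvtpd_epi32` leaves its two int32 results in the low 64 bits and zeroes the upper 64 bits, so filling a whole `__m128i` from two `__m128d` inputs takes one more shuffle. Using `_mm_unpacklo_epi64` for that is my own choice for this sketch, not necessarily the fastest option; `cvt_4pd_to_epi32` and `store_epi32x4` are hypothetical helper names.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>

/* Convert four doubles (two __m128d) into one vector of four int32,
 * rounding per MXCSR (round-to-nearest-even by default). */
static inline __m128i cvt_4pd_to_epi32(__m128d lo, __m128d hi)
{
    __m128i a = _mm_cvtpd_epi32(lo);  /* [lo0, lo1, 0, 0] */
    __m128i b = _mm_cvtpd_epi32(hi);  /* [hi0, hi1, 0, 0] */
    return _mm_unpacklo_epi64(a, b);  /* [lo0, lo1, hi0, hi1] */
}

/* Store all four int32 lanes, element 0 first (unaligned store). */
static inline void store_epi32x4(__m128i v, int32_t out[4])
{
    assert(out);
    _mm_storeu_si128((__m128i *)out, v);
}
```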

MMX instructions have worse throughput than the equivalent SSE2/3 instructions on recent CPUs (running on fewer ports), and mov-elimination doesn't work on them.

Peter Cordes