I was examining some code which uses the /fp:precise and /fp:fast flags.
According to the MSDN documentation for /fp:precise:
With /fp:precise on x86 processors, the compiler will perform rounding on variables of type float to the proper precision for assignments and casts and when passing parameters to a function. This rounding guarantees that the data does not retain any significance greater than the capacity of its type. A program compiled with /fp:precise can be slower and larger than one compiled without /fp:precise. /fp:precise disables intrinsics; the standard run-time library routines are used instead. For more information, see /Oi (Generate Intrinsic Functions).
Looking at the disassembly of a call to sqrtf (compiled with /arch:SSE2, targeting the x86/Win32 platform):
0033185D cvtss2sd xmm0,xmm1
00331861 call __libm_sse2_sqrt_precise (0333370h)
00331866 cvtsd2ss xmm0,xmm0
From this question I believe modern x86/x64 processors don't use 80-bit registers (or at least discourage their use), so the compiler does what I assume is the next best thing and performs the calculation with 64-bit doubles. And because intrinsics are disabled, there's a call to a library square-root routine (__libm_sse2_sqrt_precise) instead of an inline instruction.
OK, fair enough; this seems to comply with what the documentation says.
However, when I compile for the x64 arch, something strange happens:
000000013F2B199E movups xmm0,xmm1
000000013F2B19A1 sqrtps xmm1,xmm1
000000013F2B19A4 movups xmmword ptr [rcx+rax],xmm1
The calculations are not performed with 64-bit doubles, and intrinsics are being used. As far as I can tell, the results are exactly the same as if the /fp:fast flag was used.
Why is there a discrepancy between the two? Does /fp:precise simply not work with the x64 platform?
Now, as a sanity check I tested the same code in VS2010 x86 with /fp:precise and /arch:SSE2. Surprisingly, an SSE2 square-root instruction (sqrtsd) was emitted inline instead of a library call:
00AF14C7 cvtps2pd xmm0,xmm0
00AF14CA sqrtsd xmm0,xmm0
00AF14CE cvtpd2ps xmm0,xmm0
What's going on here? Why does VS2010 use intrinsics while VS2012 calls a system library?
Testing VS2010 targeting the x64 platform gives similar results to VS2012 (/fp:precise appears to be ignored).
I don't have access to any older versions of VS, so I can't do any testing with those.
For reference, I'm testing on Windows 7 64-bit with an Intel i5-m430 processor.