Even though at the source level everything is Single (at least, once you make the constants 2000.0F instead of 2000), I'm pretty sure the issue is that the division is being done at a higher precision (double or extended) inside the FPU. The code sequences are almost identical, but the difference is crucial:
If a / b > 2000.0F Then
fld dword ptr [a]             ; push a onto the FPU stack (widened to the internal format)
fdiv dword ptr [b]            ; st(0) = a / b, computed at the FPU's internal precision
fld dword ptr ds:[const 2000] ; push the 2000.0F literal
fcomip st, st(1)              ; compare against the unrounded quotient and pop
(actual addresses swapped out for readability; the locals are coming off the stack, the literal is coming out of the data segment.)
Compare with:
c = a / b
If c > 2000.0F Then
fld dword ptr [a]
fdiv dword ptr [b]
fstp dword ptr [c]            ; store: the quotient gets rounded to Single here
fld dword ptr [c]             ; reload the now-rounded value
fld dword ptr ds:[const 2000]
fcomip st, st(1)
Note the round trip through memory there. One of the fun features of x87 floating point is that (depending on the precision control setting in the x87 control word) it only rounds to the destination precision when values are stored; it works at full internal precision up until that point. I think that's what's happening here: the first sequence doesn't have an intervening store, so the comparison takes place at extended precision, while the second sequence forces a rounding to Single when c gets written to memory.
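To put rough numbers on it: a Single has a 24-bit significand, so the spacing between adjacent representable values near 2000 is 2^10 * 2^-23 = 2^-13, about 0.000122. Suppose the true quotient is 2000 + 2^-20 (a value of my choosing, for illustration; I don't know the actual a and b here). The FPU represents that exactly on its stack, even at 53-bit double precision, so the first sequence's comparison reports it as greater than 2000. But the nearest Single is exactly 2000.0, since 2^-20 is well under the half-spacing threshold of 2^-14, so the fstp in the second sequence rounds it down, and the reloaded value no longer compares greater.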
It gets even more fun... this is a debug build, which keeps the code sequence of the second snippet intact, store and all, so that there's an exact correspondence with the original source to aid in debugging. I'd lay very good odds that a release build with optimization turned up would compile both down to the first sequence, and you'd get the same result for both versions.
Addendum: I confirmed my supposition; in a release build, I get "2000 > 2000" twice.
Second addendum: I missed the part about CSng the first time. That's even more fun: in a release build, the CSng version is the only one that ends up with "2000 <= 2000". In a debug build, the behavior is clear from the disassembly: the CSng call results in an intermediate fstp instruction, so it's the same as the assignment to c (the stack location just doesn't have an association with a variable name).
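In other words, the debug sequence for the CSng test presumably looks something like this ([temp] is my placeholder for that anonymous stack slot; I'm reconstructing from the description above rather than pasting disassembly):
fld dword ptr [a]
fdiv dword ptr [b]
fstp dword ptr [temp]         ; CSng forces the rounding store
fld dword ptr [temp]          ; reload the rounded value
fld dword ptr ds:[const 2000]
fcomip st, st(1)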
It's more interesting in a release build. In the first and third tests, the compiler/JITter is quite smart and does the comparison at compile time: it doesn't write the comparison into the output at all, only the branch it knows will be taken. The second test, however, comes out identical to the debug build, apparently because of the CSng call.
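For reference, here's a minimal sketch of the three tests in VB.NET (the values of a and b are placeholders I picked, not the asker's, so this exact program won't necessarily reproduce the split):
Imports System

Module FloatCompareDemo
    Sub Main()
        ' Placeholder inputs; the interesting case needs a quotient that
        ' lands just above 2000 at FPU precision but rounds to 2000.0F.
        Dim a As Single = 1000000.0F
        Dim b As Single = 500.0F

        ' Test 1: compare the quotient directly (no intervening store).
        If a / b > 2000.0F Then
            Console.WriteLine("test 1: 2000 > 2000")
        Else
            Console.WriteLine("test 1: 2000 <= 2000")
        End If

        ' Test 2: force a Single conversion with CSng before comparing.
        If CSng(a / b) > 2000.0F Then
            Console.WriteLine("test 2: 2000 > 2000")
        Else
            Console.WriteLine("test 2: 2000 <= 2000")
        End If

        ' Test 3: round-trip the quotient through a Single variable.
        Dim c As Single = a / b
        If c > 2000.0F Then
            Console.WriteLine("test 3: 2000 > 2000")
        Else
            Console.WriteLine("test 3: 2000 <= 2000")
        End If
    End Sub
End Module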
Bruce Dawson has written about floating point in some detail. I would highly recommend reading his article about the internal precision in x87 floating point calculations, linked here: https://randomascii.wordpress.com/2012/03/21/intermediate-floating-point-precision/
All of the above is the story for a 32-bit program, which will generally prefer to compile to x87 floating point. In 64-bit, everything goes through SSE/SSE2, which rounds every operation to the declared IEEE single precision, so all three versions give identical results, as you'd expect. The disassembled instruction sequence looks like this for the first two:
vmovss xmm0, dword ptr [a]            ; load a
vdivss xmm0, xmm0, dword ptr [b]      ; xmm0 = a / b, rounded to Single immediately
vucomiss xmm0, dword ptr [const 2000] ; compare against 2000.0F
For the third, it just adds a single vmovss dword ptr [c], xmm0 to the instruction sequence, but it's otherwise identical (in particular, unlike the x87 sequence, it doesn't do a store/load pair; it just stores and keeps working with the enregistered value).
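That is, the third sequence presumably comes out as:
vmovss xmm0, dword ptr [a]
vdivss xmm0, xmm0, dword ptr [b]
vmovss dword ptr [c], xmm0            ; store c, but keep using the register copy
vucomiss xmm0, dword ptr [const 2000]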