The same header that provides the _MM_ROUND_UP constants (xmmintrin.h) also defines the _mm_setcsr(unsigned int i) and _mm_getcsr(void) intrinsics, which are wrappers around the relevant instructions (LDMXCSR and STMXCSR).
You should normally read the old value, OR in or AND-NOT out the bits you want to change, then write back the new value (e.g. mxcsr &= ~SOME_BITS;). You won't find many examples that just use LDMXCSR without doing an STMXCSR first.
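Here is a minimal sketch of that read-modify-write sequence using the intrinsics. It assumes an x86/x86-64 compiler that ships xmmintrin.h, such as GCC, clang or MSVC. The helper name set_sse_rounding is hypothetical. It returns the old MXCSR value so the caller can restore it later.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_ROUND_MASK, _MM_ROUND_* */

/* Hypothetical helper: change only the 2-bit rounding-control (RC) field of
 * MXCSR, leaving the exception masks, sticky exception flags, FTZ and DAZ
 * bits untouched. `mode` should be one of _MM_ROUND_NEAREST,
 * _MM_ROUND_DOWN, _MM_ROUND_UP or _MM_ROUND_TOWARD_ZERO.
 * Returns the previous MXCSR value so the caller can restore it. */
static unsigned int set_sse_rounding(unsigned int mode)
{
    unsigned int old = _mm_getcsr();            /* STMXCSR: read current value */
    unsigned int csr = (old & ~_MM_ROUND_MASK)  /* clear the RC field (ANDN) */
                     | (mode & _MM_ROUND_MASK); /* OR in the new mode */
    _mm_setcsr(csr);                            /* LDMXCSR: write it back */
    return old;
}
```

xmmintrin.h also provides _MM_SET_ROUNDING_MODE(mode) and _MM_GET_ROUNDING_MODE(), which wrap the same masking.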
Oh, I think you're actually doing that part wrong in your code. You're ORing _MM_MASK_MASK with various other constants. As its name says, though, it's a mask, meant to be ANDed (or ANDNed to clear bits), not ORed in. In xmmintrin.h it is 0x1f80, the six exception-mask bits. The mask for the two rounding-control bits is _MM_ROUND_MASK (0x6000). If you OR a mask that covers the rounding field into the value, you set both RC bits and get round-toward-zero every time, no matter which mode you meant to select.
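To make the OR-vs-AND difference concrete, here is a pair of hypothetical functions (the names are made up for illustration). Given a starting MXCSR value csr, each one computes only the resulting rounding-control bits. They rely on nothing but the _MM_ROUND_* constants from xmmintrin.h.

```c
#include <xmmintrin.h>  /* _MM_ROUND_MASK and the _MM_ROUND_* mode constants */

/* Buggy pattern (hypothetical): ORing the field mask into the value sets
 * both RC bits, so the result is round-toward-zero whatever `mode` is. */
static unsigned int rc_after_or_mask(unsigned int csr, unsigned int mode)
{
    unsigned int bad = csr | _MM_ROUND_MASK | mode;
    return bad & _MM_ROUND_MASK;   /* extract just the RC field */
}

/* Correct pattern: AND-NOT the mask to clear the field, then OR in the mode. */
static unsigned int rc_after_andn_mask(unsigned int csr, unsigned int mode)
{
    unsigned int good = (csr & ~_MM_ROUND_MASK) | mode;
    return good & _MM_ROUND_MASK;
}
```

0x1F80 is the power-on default MXCSR: all exceptions masked, round to nearest. rc_after_or_mask(0x1F80, _MM_ROUND_UP) yields _MM_ROUND_TOWARD_ZERO, while rc_after_andn_mask yields _MM_ROUND_UP.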
As @StoryTeller points out, you don't need inline asm or intrinsics to change rounding modes. The four rounding modes provided by x86 hardware match the four defined by C99's fenv.h: FE_TONEAREST (the default), FE_DOWNWARD, FE_TOWARDZERO and FE_UPWARD. You can select one with e.g. fesetround(FE_DOWNWARD);.
If you want to change rounding modes on the fly and make sure the optimizer doesn't move any FP operations to a place where a different rounding mode is in effect, you need #pragma STDC FENV_ACCESS ON, but gcc doesn't support it. See also this gcc bug from 2008, which is still open: Optimization generates incorrect code with -frounding-math option (#pragma STDC FENV_ACCESS not implemented).
Doing it manually with asm volatile still won't stop CSE from assuming that x/y computed earlier is the same value, so the compiler won't recompute it after the asm statement. The exception is if you use x or y as a read-write operand of the asm statement, one that the asm template never actually uses. For example:
asm volatile("" : "+g"(x)); // optimizer must not make any assumptions about x's value.
You could put the LDMXCSR inside that same inline-asm statement, to guarantee that the point where the rounding mode changes is also the point where the compiler treats x as having changed.
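Here is one way to do that, as a sketch assuming x86-64 with GCC or clang (GNU C inline asm). The function name is hypothetical. The first asm loads the new MXCSR and also uses x as a read-write "+x" (SSE register) operand, so the division can't be computed before the mode change. The second asm restores the old MXCSR with the quotient as its read-write operand, so the division must be finished first. This only affects SSE math, which x86-64 uses for double; the x87 control word is separate.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _MM_ROUND_MASK, _MM_ROUND_* */

/* Hypothetical helper: x / y computed with SSE rounding mode `mode`
 * (one of the _MM_ROUND_* constants), leaving MXCSR as it was afterwards. */
static double div_with_rounding(double x, double y, unsigned int mode)
{
    unsigned int old = _mm_getcsr();
    unsigned int csr = (old & ~_MM_ROUND_MASK) | (mode & _MM_ROUND_MASK);

    /* Change the mode and "modify" x in the same statement: the compiler
     * can't use x's old value (or any x / y computed earlier) after this. */
    __asm__ volatile("ldmxcsr %1" : "+x"(x) : "m"(csr));

    double q = x / y;

    /* Restore MXCSR with q as a read-write operand: the division has to be
     * finished before this asm runs, i.e. while `mode` is still in effect. */
    __asm__ volatile("ldmxcsr %1" : "+x"(q) : "m"(old));
    return q;
}
```

Other FP operations that don't depend on x or q are still not ordered relative to these asm statements, so any code that needs a specific mode has to be tied to the asm operands the same way.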