Intel application: how to transition from integer to floating point mode

Question

I know that when an application wants to execute a floating-point operation, an Intel processor has to be configured to work in floating-point mode. Isn't it too expensive to change mode every time I need to perform a floating-point operation? Who is responsible for this "mode change": the compiler or the OS? Why is the FPU not always ready to work, considering that the floating-point registers are separate from the integer ones?

This is not generally the case on modern Intel processors. There is no separate mode for floating-point work. Why do you think there is? Where did you read this? Exactly what did it say? There was exclusion between FPU and MMX operations, but that is dated now. Is that what you were thinking of? — Eric Postpischil, Oct 21 '21 at 11:07
I read it in the book "Linux kernel development 3rd edition": "When a user-space process uses floating-point instructions, the kernel manages the transition from integer to floating point mode. What the kernel has to do when using floating point instructions varies by architecture, but the kernel normally catches a trap and then initiates the transition from integer to floating point mode." — a.dibacco, Oct 21 '21 at 14:37
Possibly this helps: https://stackoverflow.com/questions/13886338/use-of-floating-point-in-the-linux-kernel — chtz, Oct 21 '21 at 15:24
In the technological dark ages, people worried about wasted time when a purely integer-register using process was switched out when process switching. To avoid that overhead, the floating-point units of many processor architectures have a bit to memorize whether the FPU has been used since the last time the bit was explicitly cleared, so the OS can check whether it needs to preserve the FPU state. This feature hasn't been used in modern OSs, since it's not worth worrying about the overhead anymore. Also, at least on some Intel microarchitectures, it's a speculative execution vulnerability. — EOF, Oct 21 '21 at 19:40
@EOF: What would be an example of "floating-point units of many processor architectures have a bit to memorize whether the FPU has been used since the last time the bit was explicitly cleared"? I have used x86, Arm, SPARC, and PowerPC, and have no recollection of such a status bit in their respective FPUs. — njuffa, Oct 21 '21 at 20:40
Okay, so the book is talking about a flag enabling floating-point operations for the current process. I suppose that could be called a mode, but the book states it inelegantly. The idea is that, if a process does not use floating-point operations (including loads and stores), it has no data in the floating-point registers that needs to be saved and restored. So, when a process first starts, the kernel marks it as “not allowed” to use the floating-point registers. If a process never uses floating-point operations, it goes on its merry way… — Eric Postpischil, Oct 21 '21 at 22:29
… The first time it uses a floating-point operation, the processor generates a trap, the kernel says, oh, you are using floating-point, and it makes a note to itself about that and enables floating-point operations for that process. Also, it saves any data currently in those registers from a previous process. Every time a process that is using the floating-point registers is scheduled to run and it is not the process that most recently used the floating-point registers, the kernel saves the current contents of the registers and restores the new process’s data. — Eric Postpischil, Oct 21 '21 at 22:31
Note that one consequence of that is that the kernel can interrupt a process using floating-point and run a dozen other processes that are not without saving and restoring the floating-point registers. So switching between processes can be faster than if the registers have to be saved for all processes. — Eric Postpischil, Oct 21 '21 at 22:32
Are you talking about `emms` after MMX integer-SIMD? That's one of several reasons why MMX is almost never used, instead using SSE2 that doesn't have that problem. (Also x87 itself is almost never used, again in favour of scalar operations in SSE2 registers, which don't need a mode switch.) — Peter Cordes, Oct 22 '21 at 05:01
Now that most processes use XMM regs all over the place, e.g. even for memcpy, an "eager" FPU save strategy on context switches makes more sense. (Especially as load/store throughput has improved relative to interrupt overhead.) Previously Linux used "lazy" FPU saving, setting a bit in the control registers for x87 and SSE units so any such instruction would fault. That lets the kernel step in and save the old FP regs before resuming the current task with its restored FP registers. @njuffa, I'm pretty sure that's what @EOF was talking about. — Peter Cordes, Oct 22 '21 at 05:02

score 1 · Answer 1 · answered Oct 22 '21 at 05:17

There is no integer vs. FP mode; there are just control-register bits for the x87 and SSE units that make any FP instruction fault (as well as any MMX instruction or integer SIMD instruction on XMM regs).

Now that most processes use XMM regs all over the place, e.g. even for memcpy, an "eager" FPU save strategy on context switches makes more sense: the FP/SIMD registers are always saved/restored before switching to a new user-space task. (Especially as load/store throughput has improved relative to interrupt overhead.) Entering the kernel for a system call or interrupt still saves only the GP integer registers, so the kernel can't use FP instructions itself without first triggering a save of possibly-dirty user-space state. (That makes it expensive, so only a few cases like RAID5 / RAID6 and crypto drivers do it.)
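To make the eager strategy concrete, here is a toy Python model of a single core. It is only a sketch: `EagerFpuCpu`, `kernel_fp_section`, and the single float standing in for the whole FP/SIMD register file are made up for illustration and are not kernel code. (In the real Linux x86 kernel, code that wants FP/SIMD brackets it with `kernel_fpu_begin()` / `kernel_fpu_end()`; the method below only imitates the "save user state before clobbering it" idea.)

```python
class EagerFpuCpu:
    """Toy model of one CPU core with eager FP/SIMD context switching."""

    def __init__(self):
        self.fp_regs = 0.0   # stands in for the whole FP/SIMD register file
        self.current = None  # task currently running
        self.saved = {}      # per-task saved FP state
        self.saves = 0       # how many times the register file was saved

    def context_switch(self, task):
        # Eager: always save the outgoing task's registers and load the
        # incoming task's, whether or not either one uses FP at all.
        if self.current is not None:
            self.saved[self.current] = self.fp_regs
            self.saves += 1
        self.fp_regs = self.saved.get(task, 0.0)
        self.current = task

    def kernel_fp_section(self, x):
        # A system call saved only the integer registers, so fp_regs still
        # holds user data. Save it before clobbering and put it back after.
        user_state = self.fp_regs
        self.saves += 1
        self.fp_regs = x * 2.0   # the kernel's own FP work clobbers the regs
        result = self.fp_regs
        self.fp_regs = user_state
        return result


cpu = EagerFpuCpu()
cpu.context_switch("A")
cpu.fp_regs = 3.0                 # task A computes something in FP registers
cpu.context_switch("B")           # saves A's state even if B is integer-only
cpu.context_switch("A")           # saves B's state, restores A's
kernel_result = cpu.kernel_fp_section(5.0)
```

In this model every switch away from a task costs one save, and the kernel's own FP use costs an extra one, which is the trade-off described above: more saves, but never a fault.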

Older Linux kernels used "lazy" FPU saving, setting those bits on context switch so any such instruction would fault, in the hope that the new task being switched to would end its timeslice without ever having faulted. In that case, the FP/SIMD registers could stay untouched, with the values of whichever user-space task actually last ran any FP/SIMD instructions.

That fault mechanism lets the kernel step in and save the old FP regs before resuming the current task with its restored FP registers. Once that FP/SIMD context switch has been done, later FP instructions execute without faulting for the rest of the task's timeslice.
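The lazy scheme can be sketched as a toy Python model. Everything here is an assumption made for illustration: `LazyFpuCpu`, `fault_armed` (standing in for the control-register bit), and a single float standing in for the whole register file. It is not how the kernel is written, only the bookkeeping it implies.

```python
class LazyFpuCpu:
    """Toy model of one CPU core with lazy FP/SIMD context switching."""

    def __init__(self):
        self.fp_regs = 0.0        # stands in for the whole FP/SIMD register file
        self.fpu_owner = None     # task whose values are live in fp_regs
        self.current = None       # task currently running
        self.fault_armed = False  # models the "FP instructions fault" bit
        self.saved = {}           # per-task saved FP state
        self.faults = 0
        self.saves = 0

    def context_switch(self, task):
        self.current = task
        # Lazy: leave fp_regs alone; arm the fault unless the incoming
        # task already owns the live register contents.
        self.fault_armed = task != self.fpu_owner

    def _handle_fault(self):
        # Kernel fault handler: save the old owner's registers, load the
        # current task's, and let FP instructions run freely again.
        self.faults += 1
        if self.fpu_owner is not None:
            self.saved[self.fpu_owner] = self.fp_regs
            self.saves += 1
        self.fp_regs = self.saved.get(self.current, 0.0)
        self.fpu_owner = self.current
        self.fault_armed = False

    def fp_write(self, value):
        if self.fault_armed:
            self._handle_fault()
        self.fp_regs = value

    def fp_read(self):
        if self.fault_armed:
            self._handle_fault()
        return self.fp_regs


cpu = LazyFpuCpu()
cpu.context_switch("A")
cpu.fp_write(1.5)                 # A's first FP use: one fault, nothing to save
for name in ["B", "C", "D"]:
    cpu.context_switch(name)      # integer-only tasks never touch fp_regs
cpu.context_switch("A")
a_value = cpu.fp_read()           # A still owns the registers: no fault
faults_so_far, saves_so_far = cpu.faults, cpu.saves

cpu.context_switch("B")
cpu.fp_write(2.5)                 # B now uses FP: fault, A's state is saved
cpu.context_switch("A")
a_again = cpu.fp_read()           # fault again: save B's state, restore A's
```

The first half shows the case the lazy scheme hoped for: three integer-only tasks run between A's timeslices and the registers are never saved. The second half shows the cost once a second task uses FP: every hand-over takes a fault plus a save and restore.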

Eager save/restore became the default several years ago, and "lazy" mode has since been entirely removed from the Linux source tree, again because interrupts and exceptions are quite expensive on deeply pipelined OoO exec CPUs, and because saving/restoring the FP regs is not that expensive given current cache/memory bandwidth.

BTW, your description in the question (other than your comment) sounded like it might be about needing `emms` after MMX integer-SIMD before using x87 instructions.

That's one of several reasons why MMX is almost never used, instead using SSE2 that doesn't have that problem. (Also x87 itself is almost never used in 64-bit code, again in favour of scalar operations in SSE2 registers with a flat register file instead of a stack.)
