Using x87 from a kernel module will "work", but it silently corrupts user-space x87 / MMX state. See: Why am I able to perform floating point operations inside a Linux kernel module?
You need kernel_fpu_begin() / kernel_fpu_end() around the floating point code to make this safe.
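A minimal sketch of that call pattern, under some assumptions: the helper name int_sqrt_fpu is hypothetical, the header <asm/fpu/api.h> is where recent x86 kernels declare these functions (older kernels may differ), and the no-op stand-ins exist only so the snippet can be compiled and tried in user space. It takes and returns integers so that no floating point value lives outside the guarded region. It shows the shape of the code only; it does not by itself stop the compiler from scheduling FP instructions outside the begin/end pair.

```c
#ifdef __KERNEL__
#include <asm/fpu/api.h>   /* assumed location of kernel_fpu_begin()/kernel_fpu_end() on x86 */
#else
/* User-space stand-ins (assumption: only for experimenting outside the kernel). */
static inline void kernel_fpu_begin(void) {}
static inline void kernel_fpu_end(void) {}
#endif

/* Hypothetical helper: integer in, integer out; x86 only because of fsqrt. */
static inline unsigned int int_sqrt_fpu(unsigned int x)
{
    float v;
    unsigned int result;

    kernel_fpu_begin();                 /* make it safe to clobber FPU registers */
    v = (float)x;
    __asm__("fsqrt" : "+t"(v));         /* v is read and written in st(0) */
    result = (unsigned int)v;           /* truncate toward zero */
    kernel_fpu_end();                   /* done with the FPU */

    return result;
}
```

For example, int_sqrt_fpu(17) gives 4. Since float has only 24 bits of precision, results for very large inputs should be treated as approximate.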
Instead of loading/storing from inside the inline asm, ask for the input and produce the output on the top of the x87 register stack, and let the compiler emit load/store instructions if needed. The compiler already knows how to do that; you only need inline asm for the sqrt instruction itself, which you can describe to the compiler this way:
    static inline
    float sqroot(float arg) {
        asm("fsqrt" : "+t"(arg) );
        return arg;
    }
(See the compiler-generated asm for this on the Godbolt compiler explorer)
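The register constraint is what tells the compiler where the operand has to live: "+t" means arg is a read-write operand in st(0), the top of the x87 register stack, so the block operates on a floating point register directly.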
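For comparison, here is a sketch of the same idea with an SSE register class instead of the x87 stack. The name sqroot_sse is hypothetical, and the snippet assumes x86-64 (or 32-bit code built with -msse). The "x" constraint asks for any XMM register, and "xm" lets the compiler pass the input either in an XMM register or directly from memory. In a kernel module this would need the same kernel_fpu_begin() / kernel_fpu_end() protection as the x87 version.

```c
/* Hypothetical SSE counterpart of sqroot(); AT&T operand order is source, destination. */
static inline float sqroot_sse(float arg)
{
    float result;

    __asm__("sqrtss %1, %0"
            : "=x"(result)    /* output: any XMM register */
            : "xm"(arg));     /* input: XMM register or memory operand */
    return result;
}
```

Because the compiler picks the registers, this inlines without extra moves when the value is already in an XMM register.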
Or avoid inline asm entirely, using a GNU C builtin that can inline.
You need to use -fno-math-errno for the builtin to actually inline as just fsqrt or sqrtss, without a fallback call to sqrtf for inputs that would result in NaN.
    static inline
    float sqroot_builtin(float arg) {
        return __builtin_sqrtf(arg);
    }
For x86-64, we get sqrtss %xmm0, %xmm0 / ret, while for i386 we get fld / fsqrt / ret. (See the Godbolt link above.) Constant-propagation works through __builtin_sqrt, as do other optimizations.
EDIT: Incorporating @iwillnotexist-idontexist's point (re double loading).
Also, if it were me, I'd add static inline to the declaration and put it in a header file. This lets the compiler manage registers more intelligently and avoid stack frame overhead.
(I'd also be tempted to change float to double throughout. Otherwise, you're discarding the additional precision that the actual floating point instructions work with. If you end up frequently storing the values as float, though, there will be an additional cvtpd2ps instruction. OTOH, if you're passing arguments to printf, for example, this actually avoids a cvtps2pd.)
But the Linux kernel's printk doesn't have conversions for double anyway.
If compiled with -mfpmath=387 (the default for 32-bit code), values will stay in 80-bit x87 registers after inlining. But yes, with 64-bit code using the 64-bit default of -mfpmath=sse, this would result in rounding off to float when loading back into XMM registers.
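As a rough illustration of the float-to-double suggestion, the same one-instruction wrapper can take a double. The name sqroot_double is hypothetical and the snippet is x86 only. The "t" constraint works for any x87-compatible type, so only the C type changes; with -mfpmath=sse the result is then rounded to double rather than float when it leaves the x87 stack.

```c
/* Hypothetical double-precision variant of sqroot(). */
static inline double sqroot_double(double arg)
{
    __asm__("fsqrt" : "+t"(arg));   /* arg is read and written in st(0) */
    return arg;
}
```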
kernel_fpu_begin() saves the full FPU state, so avoiding SSE registers and only using x87 won't make it any cheaper, nor the eventual FPU restore when returning to user-space.