Can floating point modulo by power of 2 be optimized?

Question

Background:

Taking the modulo by a power of 2 of an integer is much faster than by a non-power of 2 due to the binary representation of numbers. All that is needed is to mask away the higher order bits which takes a single and instruction.

Example in C:

unsigned x;
unsigned xmod16 = x&15; // Fast

Vs.

unsigned x;
unsigned xmod17 = x%17; // Slow

Question:

Floating point numbers are represented differently than integers. But they are still based on a binary significand and exponent. Is there any way to exploit the fact that floating point is binary to optimize modulo by power of 2 operations on floating point?

float x;
float xmod17 = fmodf(x, 17); // Slow
float xmod16 = fmodf(x, 16); // Also Slow

Vs.

union BitCast {
    float f;
    unsigned i;
} x;
float xmod17 = fmodf(x.f, 17); // Slow
// Magic
float xmod16; // Fast

There are many different floating point representations out there but for the purpose of this question I am referring specifically to the relatively standard IEEE 754 format.

If there is a clever yet portable trick that would be awesome but less portable solutions that apply specifically to IEEE 754 are welcome too.

Probably - but more complicated. Note that you don't have to worry about these tricks on modern desktop or laptop computers any more, because compilers are good and CPUs are fast. They can be relevant on microcontrollers. — user253751, Mar 31 '23 at 16:47
Whether it can be optimized probably depends on the exponent. — Barmar, Mar 31 '23 at 16:48
Can you take advantage from the fractional part will remain unchanged? — Weather Vane, Mar 31 '23 at 16:52
There is a related question here: https://stackoverflow.com/questions/73351257/efficient-floating-point-modulus-one-in-c — nielsen, Mar 31 '23 at 16:57
[This answer](https://stackoverflow.com/a/49140112/4142924) discusses taking the modulo when it is a power of the floating point radix. — Weather Vane, Mar 31 '23 at 17:00
Especially, the extra instructions needed to do the floating point bit twiddling are probably slower than just doing the floating-point modulo on any processor that has a floating-point unit. — user253751, Mar 31 '23 at 17:11
@user253751: The full general-case `fmod` is quite slow. The fast way for a power of 2 is a multiply by `1.0/m`, truncate that, and an FMA (or better `fmsub` to fold in the negation), as shown in [Are there any numbers that enable fast modulo calculation on floats?](https://stackoverflow.com/q/49139283). — Peter Cordes, Mar 31 '23 at 17:29

Can floating point modulo by power of 2 be optimized?

Background:

Question:

0 Answers0