
I was looking at how NumPy implements the random module and saw the following function to generate a float32 from a random uint32:

static NPY_INLINE float next_float(bitgen_t *bitgen_state) {
  return (next_uint32(bitgen_state) >> 8) * (1.0f / 16777216.0f);
}

I don't get why they multiply by (1.0f / 16777216.0f) here, instead of simply dividing by 16777216.0f.

Edit: As we can see from compiling the two ways of writing this function, there seems to be no difference in the generated code. So, this does not seem to be a case of "float multiplication is faster than float division".
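
For reference, here is a stand-alone sketch of the two variants I compared (next_uint32 is replaced by a hypothetical stub here so the snippet compiles on its own):

#include <stdint.h>

uint32_t next_uint32(void); /* stand-in for NumPy's bit generator */

/* multiply by the precomputed reciprocal, as NumPy writes it */
float next_float_mul(void) {
  return (next_uint32() >> 8) * (1.0f / 16777216.0f);
}

/* plain division by 2^24 */
float next_float_div(void) {
  return (next_uint32() >> 8) / 16777216.0f;
}

With optimization enabled (e.g. gcc -O2), both variants appear to compile to the same single multiply instruction.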

XYZT
  • Related: https://stackoverflow.com/questions/17883240/is-multiplication-faster-than-float-division/17883319?r=SearchResults&s=6|0.0000#17883319 – Peter O. Jan 05 '22 at 22:55
  • Given that `(1.0f / 16777216.0f)` is a compile-time constant, the multiplication has the potential to be faster than an equivalent division. – khelwood Jan 05 '22 at 22:56
  • `16777216.0f` is `2^24`, which is the precision limit of float (see the sketch after these comments). – Olvin Roght Jan 05 '22 at 22:57
  • Probably they did a benchmark on the speed of division versus multiplication, and multiplication won. `1.0f / 16777216.0f` is calculated at compile time. – the busybee Jan 05 '22 at 23:01
  • Here's some current evidence that it's not very important to write it that way now: https://godbolt.org/z/9rd47a9Kr (but avoiding divisions is standard practice) – harold Jan 05 '22 at 23:02
  • Does this answer your question? [Is multiplication faster than float division?](https://stackoverflow.com/questions/17883240/is-multiplication-faster-than-float-division) – the busybee Jan 05 '22 at 23:03
  • @thebusybee it shouldn't matter much, because generating pseudo-random numbers is already slow. I doubt you'd be able to see any difference. – Mark Ransom Jan 05 '22 at 23:05
  • @MarkRansom: Depends on the PRNG algorithm. You can implement Lehmer’s generator with just two operations (integer multiply and bitshift) for each state transition. – dan04 Jan 05 '22 at 23:36
  • If you want to know the true reason, you will have to ask the creator of NumPy or [file an issue on its issues page](https://github.com/numpy/numpy/issues). Any other answer would be speculation. – Peter O. Jan 06 '22 at 00:19
  • @dan04 but what is the quality of randomness with Lehmer's? Given that it's a variation of LCG I'm not hopeful. Usually a pseudo-random algorithm can give you speed or quality, but not both. – Mark Ransom Jan 07 '22 at 02:12
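
Building on Olvin Roght's comment above, here is a small stand-alone sketch (my own code, not NumPy's) of what the 2^24 constant buys: a float has a 24-bit significand, so every value of (u >> 8) maps to a distinct, exactly representable float in [0, 1):

#include <stdint.h>
#include <stdio.h>

/* hypothetical helper: scale a random uint32 to [0, 1) the way next_float does */
static float uint32_to_float(uint32_t u) {
  /* keep only the top 24 bits; 16777216 = 2^24 */
  return (u >> 8) * (1.0f / 16777216.0f);
}

int main(void) {
  printf("%.9f\n", uint32_to_float(0));           /* 0.000000000 */
  printf("%.9f\n", uint32_to_float(0xFFFFFFFFu)); /* 0.999999940, i.e. 1 - 2^-24 */
  return 0;
}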

1 Answer


Because multiplication is faster than division on most CPUs.

(1.0f / 16777216.0f) is folded into a constant at compile time, so at runtime the computer only needs to perform the multiplication.

In C and C++, the compiler can be told to perform this optimization automatically with the -ffast-math flag, without the need to write x * (1/y) in the code. However, that may not be safe, since the result is not always the same as simply dividing (due to rounding errors). By explicitly writing x * (1/y) you are doing by hand what the compiler would do with this flag.

Side note: As pointed out by harold, if the reciprocal of the divisor can be represented exactly in float, the compiler may perform this optimization automatically, even without -ffast-math.
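
To check both claims empirically, here is a small sketch (my own test, not NumPy code). For a power of two like 16777216.0f the reciprocal is exact, so x * (1/y) and x / y should agree bit for bit; for a divisor like 10.0f, whose reciprocal rounds, the two forms can disagree:

#include <stdint.h>
#include <stdio.h>

int main(void) {
  uint32_t pow2_diffs = 0, ten_diffs = 0;
  /* exhaustively test every 24-bit input, the same range next_float produces */
  for (uint32_t i = 0; i < (1u << 24); i++) {
    float x = (float)i;
    if (x * (1.0f / 16777216.0f) != x / 16777216.0f) pow2_diffs++;
    if (x * (1.0f / 10.0f) != x / 10.0f) ten_diffs++;
  }
  printf("mismatches for y = 2^24: %u\n", pow2_diffs); /* expected: 0 */
  printf("mismatches for y = 10:   %u\n", ten_diffs);  /* expected: > 0 */
  return 0;
}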

Adriel Jr
  • I am not that familiar with the intricacies of C, but why doesn't `x / 16777216.0f` lead to a compiler optimization to convert the division to a multiplication in this way? – XYZT Jan 05 '22 at 23:03
  • @XYZT: Because floating-point math doesn't guarantee that `(1.0 / x) * x == 1.0`, so to be safe, the C(++) compiler doesn't make optimizations that would assume that property. – dan04 Jan 05 '22 at 23:07
  • @dan04 but it does guarantee that when `x` is 16777216, and compilers evidently [do use that](https://godbolt.org/z/9rd47a9Kr), without `-ffast-math` – harold Jan 05 '22 at 23:10
  • @XYZT there are 2 operations (* and /) instead of just 1 (/), so twice as many rounding errors. These are almost always present when operating with floating-point numbers. Think of floats as numbers that often cannot be represented exactly in binary notation: the more operations we perform, the more error we accumulate. For most uses it is not a problem, but the developer should know when it is safe to do this. – Adriel Jr Jan 05 '22 at 23:17
  • @harold thanks, I just did not express myself very well. I rewrote my answer to make it more clear. – Adriel Jr Jan 05 '22 at 23:23
  • @AdrielJr ok but maybe I didn't express myself very well either, because the flag isn't actually needed. The compiler (well some of them at least, it's not like I tried *all* of them) can do it either way, with the fast math flag but also without it. The flag would be necessary if we were dividing by 10 or something like that where there is no accurate reciprocal, but 16777216 has an accurate reciprocal. – harold Jan 05 '22 at 23:29
  • @harold I tried [the C compiler instead](https://godbolt.org/z/nrEaf53rW) as it seems NumPy uses C instead of C++, and it generates significantly more assembly instructions (albeit identical regardless of multiplication or division). Would you happen to know why the C compilation is so much longer? – XYZT Jan 05 '22 at 23:31
  • @XYZT it's not a C thing, you just didn't put the optimization flag (e.g. `-O2`), so you're looking at "intentionally slow" code. Also remove `static`, otherwise everything disappears when optimized. Like this: https://godbolt.org/z/zeP343KrY – harold Jan 05 '22 at 23:32
  • @harold thanks. Well, this seems to imply that `(1 / x)` and `/ x` in this specific case make no difference at all! So now I am wondering if this is just some legacy coding style from a belief that multiplication is better even though there is no difference in practice. – XYZT Jan 05 '22 at 23:38
  • @XYZT the multiply will always be faster. Harold has determined that the optimizer will often replace the divide with a multiply, but would you rely on that if you didn't have to? – Mark Ransom Jan 06 '22 at 00:54