
I have a __m256 value that holds random bits.

I would like to "interpret" it to obtain another __m256 that holds float values uniformly distributed in the [0.0f, 1.0f] range.

Planning to do it using:

__m256 randomBits = /* generated random bits, uniformly distributed */;
__m256 invFloatRange =  _mm256_set1_ps( numeric_limits<float>::min() ); //min is a smallest increment of float precision

__m256 float01 =  _mm256_mul_ps(randomBits, invFloatRange);
//float01 is now ready to be used

Question 1:

However, will this cause a problem in the very rare cases where randomBits has all bits set to 1 and is therefore NaN?

What can I do to protect myself from this?

I want the float01 to always be a usable number

Question 2:

Will the [0 to 1] range remain uniform after I obtain it using the above approach? I know float has varying precision at different magnitudes

Peter Cordes
Kari
  • Treat `randomBits` as uint32 then divide by uint32 max (making sure to convert to float first)? Random bits in a floating point number won't give a uniform distribution even without the problems of nan and infinity – Alan Birtles Dec 31 '20 at 08:40
  • @AlanBirtles could you please show how it would be done using `_mm256` instructions? uint32 would have a different range (than float) from what I can see. Maybe we should use int32 and mask away the minus sign? This should also eliminate any possibility of NaN occurring – Kari Dec 31 '20 at 10:30
  • There is no direct conversion from `uint32` to `float`, but you can convert `int32` to `float` using `_mm256_cvtepi32_ps`, then multiply by `pow(2,-32)` and add `0.5` (using FMA, if available). This won't be perfect, especially the smallest non-zero result will be `pow(2,-23)`. – chtz Dec 31 '20 at 15:01
  • Actually, it might be slightly better, to scale by `pow(2,-31)` (this gets numbers in `[-1, +1)`) and then mask away the sign bit. You will only lose 1 bit of the generated number, instead of 8. – chtz Dec 31 '20 at 15:05
  • For uint32 to float conversion see [here](https://stackoverflow.com/q/34066228/2439725). – wim Dec 31 '20 at 15:07
  • @Kari Have you seen this? https://stackoverflow.com/q/54869672/126995 – Soonts Jan 01 '21 at 02:56

2 Answers


Reinterpreting the bits of each int32_t as a float, one can do the following (a is the __m256i holding the random bits):

 auto const one = _mm256_set1_epi32(0x3f800000); // bit pattern of 1.0f (0x7f800000 would be +infinity)
 a = _mm256_and_si256(a, _mm256_set1_epi32(0x007fffff));
 a = _mm256_or_si256(a, one);
 return _mm256_sub_ps(_mm256_castsi256_ps(a), _mm256_castsi256_ps(one));

The and/or sequence keeps the 23 LSBs of the input as the mantissa and sets the exponent to that of 1.0f, which produces a uniform distribution of values in 1.0f <= a < 2.0f. Subtracting 1.0f then removes the bias, leaving values in [0.0f, 1.0f).
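To see what one lane of the and/or/subtract sequence does, here is a minimal scalar sketch. The helper name `bits_to_float01` is hypothetical (it is not part of the answer), and the sketch assumes `float` is IEEE 754 binary32, where 1.0f has the bit pattern 0x3f800000.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar equivalent of one AVX lane; assumes IEEE 754 binary32 floats.
inline float bits_to_float01(std::uint32_t bits) {
    const std::uint32_t one_bits = 0x3f800000u;               // bit pattern of 1.0f
    const std::uint32_t u = (bits & 0x007fffffu) | one_bits;  // a float in [1.0f, 2.0f)
    float f;
    std::memcpy(&f, &u, sizeof f);  // reinterpret the bits without violating aliasing rules
    return f - 1.0f;                // remove the bias: result is in [0.0f, 1.0f)
}
```

Because the exponent bits are forced to a fixed value, no input pattern (including all ones) can produce NaN or infinity. The output only takes multiples of 2^-23, since just 23 of the 32 random bits are used.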

Aki Suihkonen
  • This will never generate a float smaller than 1 epsilon. If you somehow used 31 or 32 bits of randomness, (e.g. with uint or int->float conversion and then multiplying by 2^-31), you'd get rounding to nearest multiple of e.g. 32 for large floats, but each small random integer can still map to a different small float, so you have more than 2^24 possible results, but still *I think* uniformly distributed. – Peter Cordes Apr 25 '21 at 00:26
  • Hmm, not sure that idea is flawless, either. http://mumble.net/~campbell/2014/04/28/uniform-random-float – Peter Cordes Apr 25 '21 at 00:31

As @Soonts has pointed out, floats can be created uniformly in [0, 1] range:

https://stackoverflow.com/a/54873925/9007125

I ended up using the answer below:

https://stackoverflow.com/a/54893167/9007125

//converts __m256i values into __m256 values that contain floats in the [0,1) range (1.0f itself is never produced).
//https://stackoverflow.com/a/54893167/9007125
inline void int_rand_int_toFloat01( const __m256i* m256i_vals,  
                                          __m256* m256f_vals){ //<-- stores here.
    const static __m256 c =  _mm256_set1_ps(0x1.0p-24f); // or (1.0f / (uint32_t(1) << 24));

    __m256i* rnd =   ((__m256i*)m256i_vals);
    __m256* output =  ((__m256*)m256f_vals);

    // the logical shift keeps the top 24 bits, so every value fits exactly in a float;
    // remember that '_mm256_cvtepi32_ps' will convert 32-bit ints into 32-bit floats
    __m256 converted =  _mm256_cvtepi32_ps(_mm256_srli_epi32(*rnd, 8));
             *output =  _mm256_mul_ps( converted, c);
}
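For checking the arithmetic by hand, a scalar sketch of what each of the eight lanes computes may help. The name `top24_to_float01` is hypothetical and only used for illustration; the sketch assumes a 32-bit IEEE 754 `float`, which can represent every integer below 2^24 exactly.

```cpp
#include <cstdint>

// Hypothetical scalar equivalent of one lane of the shift / convert / multiply sequence.
inline float top24_to_float01(std::uint32_t bits) {
    const float scale = 1.0f / 16777216.0f;  // 2^-24
    // Keep the top 24 bits; the value is in [0, 2^24) and converts to float exactly.
    const std::uint32_t top = bits >> 8;
    return static_cast<float>(top) * scale;  // multiple of 2^-24 in [0.0f, 1.0f)
}
```

The largest input maps to (2^24 - 1) / 2^24, so the result stays strictly below 1.0f, and an input of 0x80000000 maps to exactly 0.5f.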
Kari
  • You don't need to pointer-cast your function args; they already have the same type you're assigning to. If you did want to reinterpret one vector type as another, use `_mm256_castps_si256` or whatever. (Pointer-casting is safe for this, but only for intrinsic types like `__m256i`, not `int`, because intrinsic types are like `char*` and can legally alias anything regardless of strict-aliasing rules; in GCC they're defined with `__attribute__((may_alias))`.) – Peter Cordes Jan 05 '21 at 01:04