map range of IEEE 32bit float [1:2) to some arbitrary [a:b)

Question

Back story : uniform PRNG with arbitrary endpoints

I've got a fast uniform pseudo random number generator that creates uniform float32 numbers in range [1:2) i.e. u : 1 <= u <= 2-eps. Unfortunately mapping the endpoints [1:2) to that of an arbitrary range [a:b) is non-trivial in floating point math. I'd like to exactly match the endpoints with a simple affine calculation.

Formally stated

I want to make an IEEE-754 32 bit floating point affine function f(x,a,b) for 1<=x<2 and arbitrary a,b that exactly maps 1 -> a and nextlower(2) -> nextlower(b)

where nextlower(q) is the next lower FP representable number (e.g. in C++ std::nextafter(float(q),float(q-1)))

What I've tried

The simple mapping f(x,a,b) = (x-1)*(b-a) + a always achieves the f(1) condition but sometimes fails the f(2) condition due to floating point rounding.

I've tried replacing the 1 with a free design parameter to cancel FP errors in the spirit of Kahan summation. i.e. with f(x,c0,c1,c2) = (x-c0)*c1 + c2 one mathematical solution is c0=1,c1=(b-a),c2=a (the simple mapping above), but the extra parameter lets me play around with constants c0,c1,c2 to match the endpoints. I'm not sure I understand the principles behind Kahan summation well enough to apply them to determine the parameters or even be confident a solution exists. It feels like I'm bumping around in the dark where others might've found the light already.

Aside: I'm fine assuming the following

a < b
both a and b are far from zero, i.e. OK to ignore subnormals
a and b are far enough apart (measuered in representable FP values) to mitigate non-uniform quantization and avoid degenerate cases

Update

I'm using a modified form of Chux's answer to avoid the division. While I'm not 100% certain my refactoring kept all the magic, it does still work in all my test cases.

float lerp12(float x,float a,float b)
{
    const float scale = 1.0000001f;
    // scale = 1/(nextlower(2) - 1);
    const float ascale = a*scale;
    const float bscale = nextlower(b)*scale;
    return (nextlower(2) - x)*ascale + (x - 1.0f)*bscale;
}

Note that only the last line (5 FLOPS) depends on x, so the others can be reused if (a,b) remain the same.

[1...2) has N floating point members. [a...b) has M members. Mapping N to M either is slightly non-uniform or some N members are ignored and a re-call of `PSRN()` is needed. What is the coding goal concerning this? — chux - Reinstate Monica, Jul 27 '21 at 16:11
Yes there will generally be a different number of representable values in the two ranges. This will lead to some non-uniformity. But I believe the effects are acceptably small. I'm mostly concerned with matching the endpoints. — Mark Borgerding, Jul 27 '21 at 17:05
When a and b are close together, nextlower(2) -> nextlower(b) is not going to fall naturally out of any floating-point computation in round-to-nearest mode (as an extreme example, consider a = nextlower (b)). Have you tried directed rounding modes? — njuffa, Jul 27 '21 at 17:27
@njuffa I'm not too concerned about mapping to ranges with so few quanta. re different rounding modes: I'm not opposed to that as part of the solution. — Mark Borgerding, Jul 27 '21 at 17:58
Mark, "map [1:2) to some arbitrary [a:b)" and "1 -> a and nextlower(2) -> nextlower(b)" are contradictory. Which one do you want? Assuming these are the same is the crux of the "sometimes fails the f(2) condition" problem. It is not really a FP rounding issue. — chux - Reinstate Monica, Jul 27 '21 at 20:02
Related: I give a possible algorithm at: https://peteroupc.github.io/randomfunc.html#For_Floating_Point_Number_Formats — Peter O., Jul 28 '21 at 00:58
Does this answer your question? [Random floating point double in Inclusive Range](https://stackoverflow.com/questions/9724404/random-floating-point-double-in-inclusive-range) — Peter O., Jul 28 '21 at 00:59

chux - Reinstate Monica · Accepted Answer · 2021-07-28T19:52:29.170

OP's goal

I want to make an IEEE-754 32 bit floating point affine function f(x,a,b) for 1<=x<2 and arbitrary a,b that exactly maps 1 -> a and nextlower(2) -> nextlower(b)

This differs slightly from "map range of IEEE 32bit float [1:2) to some arbitrary [a:b)".

General case

Map x0 to y0, x1 to y1 and various x in-between to y :

m = (y1 - y0)/(x1 - x0);
y = m*(x - x0) + y0;

OP's case

// x0 = 1.0f;
// x1 = nextafterf(2.0f, 1.0f);
// y0 = a;
// y1 = nextafterf(b, a);

#include <math.h>  // for nextafterf()

float x = random_number_1_to_almost_2();
float m = (nextafterf(b, a) - a)/(nextafterf(2.0f, 1.0f) - 1.0f);
float y = m*(x - 1.0f) + a;

nextafterf(2.0f, 1.0f) - 1.0f, x - 1.0f and nextafterf(b, a) are exact, incurring no calculation error.
nextafterf(2.0f, 1.0f) - 1.0f is a value a little less than 1.0f.

Recommendation

Other re-formations are possible with better symmetry and numerical stability at the end-points.

float x = random_number_1_to_almost_2();
float afactor = nextafterf(2.0f, 1.0f) - x;   // exact
float bfactor = x - 1.0f;                     // exact
float xwidth = nextafterf(2.0f, 1.0f) - 1.0f; // exact
// Do not re-order next line of code, perform 2 divisions
float y = (afactor/xwidth)*a + (bfactor/xwidth)*nextafterf(b, a);

Notice afactor/xwidth and bfactor/xwidth are both exactly 0.0 or 1.0 at the end-points, thus meeting "maps 1 -> a and nextlower(2) -> nextlower(b)". Extended precision not needed.

OP's (x-c0)*c1 + c2 has trouble as it divides (x-c0)*c1 by (2.0 - 1.0) or 1.0 (implied), when it should divide by nextafterf(2.0f, 1.0f) - 1.0f.

Thanks. I've added an "Update" section to the original question to show some minor modifications I made to your approach. Hopefully my speedups didn't break anything. — Mark Borgerding, Aug 02 '21 at 21:33
I do not certain the re-ordering of scale in your update will work for _all_ `a,b` - _maybe_ it will. — chux - Reinstate Monica, Aug 02 '21 at 22:39

njuffa · Answer 2 · 2021-07-28T09:28:15.450

Simple lerping based on fused multiply-add can reliably hit the endpoints for interpolation factors 0 and 1. For x in [1, 2) the interpolation factor x - 1 does not reach unity, which can be fixed by slight stretching by multiplying x-1 with (2.0f / nextlower(2.0f)). Obviously the endpoint needs to also be adjusted to the endpoint nextlower(b). For the C code below I have used the definition of nextlower() provided in the question, which may not be what asker desires, since for floating-point q sufficiently large in magnitude, q == (q - 1).

Asker stated in comments that it is understood that this kind of mapping is not going to result in an exactly uniform distribution of the pseudo-random numbers in the interval [a, b), only approximately so, and that pathological mappings may occur when a and b are extremely close together. I have not mathematically proved that the implementation of map() below guarantees the desired behavior, but it seems to do so for a large number of random test cases.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

float nextlowerf (float q)
{
    return nextafterf (q, q - 1);
}

float map (float a, float b, float x)
{
    float t = (x - 1.0f) * (2.0f / nextlowerf (2.0f));
    return fmaf (t, nextlowerf (b), fmaf (-t, a, a));
}

float uint32_as_float (uint32_t a)
{
    float r;
    memcpy (&r, &a, sizeof(r));
    return r;
}

// George Marsaglia's KISS PRNG, period 2**123. Newsgroup sci.math, 21 Jan 1999
// Bug fix: Greg Rose, "KISS: A Bit Too Simple" http://eprint.iacr.org/2011/007
static uint32_t kiss_z=362436069, kiss_w=521288629;
static uint32_t kiss_jsr=123456789, kiss_jcong=380116160;
#define znew (kiss_z=36969*(kiss_z&65535)+(kiss_z>>16))
#define wnew (kiss_w=18000*(kiss_w&65535)+(kiss_w>>16))
#define MWC  ((znew<<16)+wnew )
#define SHR3 (kiss_jsr^=(kiss_jsr<<13),kiss_jsr^=(kiss_jsr>>17), \
              kiss_jsr^=(kiss_jsr<<5))
#define CONG (kiss_jcong=69069*kiss_jcong+1234567)
#define KISS ((MWC^CONG)+SHR3)

int main (void)
{
    float a, b, x, r;
    float FP32_MIN_NORM = 0x1.000000p-126f;
    float FP32_MAX_NORM = 0x1.fffffep+127f;
    
    do {
        do {
            a = uint32_as_float (KISS);
        } while ((fabsf (a) < FP32_MIN_NORM) || (fabsf (a) > FP32_MAX_NORM) || isnan (a));
        do {
            b = uint32_as_float (KISS);
        } while ((fabsf (b) < FP32_MIN_NORM) || (fabsf (b) > FP32_MAX_NORM) || isnan (b) || (b < a));

        x = 1.0f;
        r = map (a, b, x);
        if (r != a) {
            printf ("lower bound failed: a=%12.6a  b=%12.6a  map=%12.6a\n", a, b, r);
            return EXIT_FAILURE;
        }

        x = nextlowerf (2.0f);
        r = map (a, b, x);
        if (r != nextlowerf (b)) {
            printf ("upper bound failed: a=%12.6a  b=%12.6a  map=%12.6a\n", a, b, r);
            return EXIT_FAILURE;
        }
    } while (1);
    return EXIT_SUCCESS;
}

map range of IEEE 32bit float [1:2) to some arbitrary [a:b)

Back story : uniform PRNG with arbitrary endpoints

Formally stated

What I've tried

Update

2 Answers2