Back story : uniform PRNG with arbitrary endpoints
I've got a fast uniform pseudo random number generator that creates uniform float32 numbers in range [1:2) i.e. u : 1 <= u <= 2-eps
. Unfortunately mapping the endpoints [1:2) to that of an arbitrary range [a:b) is non-trivial in floating point math. I'd like to exactly match the endpoints with a simple affine calculation.
Formally stated
I want to make an IEEE-754 32 bit floating point affine function f(x,a,b)
for 1<=x<2
and arbitrary a,b that exactly maps
1 -> a
and nextlower(2) -> nextlower(b)
where nextlower(q)
is the next lower FP representable number (e.g. in C++ std::nextafter(float(q),float(q-1))
)
What I've tried
The simple mapping f(x,a,b) = (x-1)*(b-a) + a
always achieves the f(1) condition but sometimes fails the f(2) condition due to floating point rounding.
I've tried replacing the 1
with a free design parameter to cancel FP errors in the spirit of Kahan summation.
i.e. with
f(x,c0,c1,c2) = (x-c0)*c1 + c2
one mathematical solution is c0=1,c1=(b-a),c2=a
(the simple mapping above),
but the extra parameter lets me play around with constants c0,c1,c2
to match the endpoints. I'm not sure I understand the principles behind Kahan summation well enough to apply them to determine the parameters or even be confident a solution exists. It feels like I'm bumping around in the dark where others might've found the light already.
Aside: I'm fine assuming the following
- a < b
- both a and b are far from zero, i.e. OK to ignore subnormals
- a and b are far enough apart (measuered in representable FP values) to mitigate non-uniform quantization and avoid degenerate cases
Update
I'm using a modified form of Chux's answer to avoid the division. While I'm not 100% certain my refactoring kept all the magic, it does still work in all my test cases.
float lerp12(float x,float a,float b)
{
const float scale = 1.0000001f;
// scale = 1/(nextlower(2) - 1);
const float ascale = a*scale;
const float bscale = nextlower(b)*scale;
return (nextlower(2) - x)*ascale + (x - 1.0f)*bscale;
}
Note that only the last line (5 FLOPS) depends on x, so the others can be reused if (a,b) remain the same.