When you do:
return 1. + 7.8*zeta;
The literals 1.
and 7.8
are double
, so when if zeta
is a float
, it will first be converted to double
, then the whole computation will be done in double-precision, and the result will be cast back to float
, this is equivalent to:
return (float)(1. + 7.8 * (double)zeta);
Otherwise said, this is equivalent to calling phih_stable(double)
and storing the result in a float
, so your template would be useless for float
.
If you want the computation to be made in single precision, you need the casts1:
return TF(1.) + TF(7.8) * zeta;
What about using 1.f
and 7.8f
? The problem is that (double)7.8f != 7.8
due to floating point precision. The difference is around 1e-7
, the actual stored value for 7.8f
(assuming 32-bits float
) is:
7.80000019073486328125
While the actual stored value for 7.8
(assuming 64-bits double
) is:
7.79999999999999982236431605997
So you have to ask yourself if you accept this loss of precision.
You can compare the two following implementations:
template <class T>
constexpr T phih_stable_cast(T t) {
return T(1l) + T(7.8l) * t;
}
template <class T>
constexpr T phih_stable_float(T t) {
return 1.f + 7.8f * t;
}
And the following assertions:
static_assert(phih_stable_cast(3.4f) == 1. + 7.8f * 3.4f, "");
static_assert(phih_stable_cast(3.4) == 1. + 7.8 * 3.4, "");
static_assert(phih_stable_cast(3.4l) == 1. + 7.8l * 3.4l, "");
static_assert(phih_stable_float(3.4f) == 1.f + 7.8f * 3.4f, "");
static_assert(phih_stable_float(3.4) == 1. + 7.8 * 3.4, "");
static_assert(phih_stable_float(3.4l) == 1.l + 7.8l * 3.4l, "");
The two last assertions fail due to the loss of precision when doing the computation.
1 You should even downcast from long double
to not lose precision when using your function with long double
: return TF(1.l) + TF(7.8l) * zeta;
.