std::numeric_limits<uint32_t>::min()
is 0. Removing the subtraction won't improve the generated assembly, since the value is known at compile time and the compiler folds it away, but it does simplify the function.
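To make that concrete, here is a minimal sketch. The function under discussion isn't reproduced here, so its min/max scaling shape is an assumption, and `normalise_full` and `normalise_simple` are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <limits>

// min() is 0 for every unsigned integer type, so this always holds.
static_assert(std::numeric_limits<std::uint32_t>::min() == 0,
              "unsigned minimum is zero");

// Assumed general form: subtracts the minimum in both numerator and denominator.
inline double normalise_full(std::uint32_t a) {
    constexpr double lo = std::numeric_limits<std::uint32_t>::min();
    constexpr double hi = std::numeric_limits<std::uint32_t>::max();
    return (a - lo) / (hi - lo);
}

// Simplified form: the subtractions of zero are dropped.
inline double normalise_simple(std::uint32_t a) {
    return a / static_cast<double>(std::numeric_limits<std::uint32_t>::max());
}
```

Both versions should compile to the same code with optimisation enabled; the second is just easier to read.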
Another potential improvement is to calculate the reciprocal of the divisor and multiply by it instead of dividing. You might expect the optimiser to do that conversion automatically, but with floating point it often can't: under the strict rules of IEEE-754, multiplying by a rounded reciprocal can give a result that differs from the division in the last bit.
Example:
return a * (1.0 / std::numeric_limits<uint32_t>::max());
Note that both operands of the division used to calculate the reciprocal are known at compile time, so that division is pre-calculated.
As you can see here, GCC does not do this optimisation automatically. It does if you use -ffast-math, at the cost of IEEE-754 conformance.
I checked Agner Fog's instruction tables, picking the Zen 3 architecture at random: there, double-precision division has about 3 times the latency of multiplication.