24

Is there a way to obtain the greatest value representable by the floating-point type float that is smaller than 1?

I've seen the following definition:

static const double DoubleOneMinusEpsilon = 0x1.fffffffffffffp-1;
static const float FloatOneMinusEpsilon = 0x1.fffffep-1;

But is this really how we should define these values?

According to the Standard, std::numeric_limits<T>::epsilon is the machine epsilon, that is, the difference between 1.0 and the next value representable by the floating-point type T. But that doesn't necessarily mean that T(1) - std::numeric_limits<T>::epsilon is the right way to define these values.

Adrian Mole
0xbadf00d
  • 7
    [`std::nextafter`](https://en.cppreference.com/w/cpp/numeric/math/nextafter)? – user17732522 Mar 07 '22 at 15:42
  • 1
    For IEEE-754, I believe you can subtract one from the integer representation of the float bits, which essentially appears to be what the hex literals are doing. https://gcc.godbolt.org/z/dechh5xxa – vandench Mar 07 '22 at 15:42
  • 2
    `1.0 - DBL_EPSILON / 2 ` – Marek R Mar 07 '22 at 16:04
  • 5
    Related to [whats-the-closest-double-to-1-0-that-isnt-1-0](https://stackoverflow.com/questions/38801885/whats-the-closest-double-to-1-0-that-isnt-1-0). – Jarod42 Mar 07 '22 at 16:23
  • @MarekR Why *half* of `DBL_EPSILON`? – U. Windl Mar 08 '22 at 11:37
  • 2
    @U.Windl That's assuming IEEE 754 binary format. Numbers between 0.5 (inclusive) and 1.0 (exclusive) are essentially represented by 0.5 (base 10) times 1.xxxxxxx (base 2), where each x is a 0 or a 1. Numbers between 1.0 (inclusive) and 2.0 (exclusive) are essentially represented by 1.0 times 1.xxxxxxx. The value of the unit of last place (ULP) is half that for numbers between 0.5 and 1.0 versus for numbers between 1.0 and 2.0. – David Hammen Mar 08 '22 at 14:31
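The 1 - ε/2 claim from these comments can be checked directly (a minimal sketch, assuming IEEE-754 binary64 for double):

#include <cassert>
#include <cfloat>
#include <cmath>

int main()
{
    // Below 1.0 the spacing of representable values shrinks by a factor of
    // the radix (2), so the predecessor of 1.0 is 1.0 - DBL_EPSILON / 2.
    assert(1.0 - DBL_EPSILON / 2 == std::nextafter(1.0, 0.0));
}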

3 Answers

26

You can use the std::nextafter function, which, despite its name, can retrieve the next representable value that is arithmetically before a given starting point, by using an appropriate to argument (often -Infinity, 0, or +Infinity).

This works portably by definition of nextafter, regardless of what floating-point format your C++ implementation uses. (Binary vs. decimal, or width of mantissa aka significand, or anything else.)

Example: retrieving the closest value less than 1 for the double type (on Windows, using the clang-cl compiler in Visual Studio 2019). The answer differs from the result of the 1 - ε calculation, which, as discussed in the comments, is incorrect for IEEE-754 numbers because, below any power of 2, representable numbers are twice as close together as they are above it:

#include <iostream>
#include <iomanip>
#include <cmath>
#include <limits>

int main()
{
    // The largest double that is strictly less than 1.0.
    double naft = std::nextafter(1.0, 0.0);
    std::cout << std::fixed << std::setprecision(20);
    std::cout << naft << '\n';
    // 1 - epsilon: a different (smaller) value, since epsilon is the gap
    // between 1.0 and the next value *above* 1.0.
    double neps = 1.0 - std::numeric_limits<double>::epsilon();
    std::cout << neps << '\n';
    return 0;
}

Output:

0.99999999999999988898
0.99999999999999977796

With different output formatting, these could print as 0x1.fffffffffffffp-1 and 0x1.ffffffffffffep-1 (the 1 - ε result).
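For instance, a minimal variation of the program above using std::hexfloat (the exact hexfloat formatting may differ slightly between standard libraries):

#include <iostream>
#include <cmath>
#include <limits>

int main()
{
    std::cout << std::hexfloat
              << std::nextafter(1.0, 0.0) << '\n'                        // 0x1.fffffffffffffp-1
              << (1.0 - std::numeric_limits<double>::epsilon()) << '\n'; // 0x1.ffffffffffffep-1
}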


Note that, when using the analogous technique to determine the closest value that is greater than 1, the nextafter(1.0, 10000.) call gives the same value as the 1 + ε calculation (1.00000000000000022204), as expected from the definition of ε.


Performance

C++23 requires std::nextafter to be constexpr, but currently only some compilers support that. GCC does constant-propagation through it, but clang can't (Godbolt). If you want this to be as fast (with optimization enabled) as a literal constant like 0x1.fffffffffffffp-1 on systems where double is IEEE-754 binary64, on some compilers you'll have to wait for that part of C++23 support. (It's likely that once compilers are able to do this, they'll optimize it even without -std=c++23, as GCC already does.)

const double DoubleBelowOne = std::nextafter(1.0, 0.); at global scope will at worst run the function once at startup. That defeats constant propagation where the value is used, but it otherwise performs about the same as an FP literal constant when combined with other runtime variables.
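Side by side, the two options look like this (the hex literal, as noted, is only valid where double is IEEE-754 binary64):

#include <cmath>

// Portable: at worst computed once at startup, at best folded to a
// constant (GCC today, or any compiler with C++23 constexpr support).
const double DoubleBelowOne = std::nextafter(1.0, 0.);

// Non-portable shortcut: hard-codes the IEEE-754 binary64 bit pattern.
const double DoubleBelowOneLiteral = 0x1.fffffffffffffp-1;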

Peter Cordes
Adrian Mole
  • 4
    The result can differ from `1 - ε`, because epsilon is defined as the distance between `1.0` and the next _greater_ value. (It's easy to misunderstand or misuse `epsilon()`) – Drew Dormann Mar 07 '22 at 15:50
  • @DrewDormann Indeed, which is (partly) why I added the "footnote". – Adrian Mole Mar 07 '22 at 15:51
  • in this case is there any difference between `nextafter` and `nexttoward` ? I read the cppref but I don't get it. When I tried I got the same results – 463035818_is_not_an_ai Mar 07 '22 at 16:00
  • 1
    @463035818_is_not_a_number Yeah - the page is a bit vague on the difference between the two. The "C" version of the page looks a *bit* clearer, specifying what is converted to `long double` and when. – Adrian Mole Mar 07 '22 at 16:03
  • 1
    `std::numeric_limits::epsilon();` gives the difference between 1.0 and the next larger value. So to get the greatest value smaller than 1.0, half of epsilon should be subtracted, since the exponent for this value is smaller by one compared to `1.0`. So it should be: `double neps = 1.0 - std::numeric_limits::epsilon() * 0.5;` – Marek R Mar 07 '22 at 16:07
  • 2
    @MarekR Yes, for IEEE-754 format, that would work. But the `nextafter` method will, *by definition*, give the correct answer for any system/architecture/type. Or it *should*. How it is implemented will no doubt depend on the system in use for the target architecture. – Adrian Mole Mar 07 '22 at 16:26
  • be aware that "fractional" floats must end with 5: N.xxx...5000... --so the above, displayed in decimal, is incorrect – Andrew Mar 07 '22 at 22:31
  • @Andrew Not at all sure what you mean. Are you saying that the output of the `cout` calls is wrong? – Adrian Mole Mar 07 '22 at 23:03
  • 1
    not sure about that; but I am sure 0.99999999999999977796 and 0.99999999999999988898 are not exact IEEE floats :-) – Andrew Mar 07 '22 at 23:22
  • Please note that, as already mentioned, the correct values are expressed by [`1 - ε/2`](https://godbolt.org/z/bbeEc814s). – Bob__ Mar 08 '22 at 01:04
  • why is this marked as correct? it is not. – Andrew Mar 08 '22 at 11:42
  • @Andrew: What correctness problem do you see with `nextafterf`? How can it possibly return the wrong bit-pattern, unless it's buggy? The only problem I see is efficiency; GCC is able to do constant-propagation through it but clang isn't. https://godbolt.org/z/Thqb5KcM1 (even with `-ffast-math`, so I guess it just doesn't have it as a built-in). – Peter Cordes Mar 08 '22 at 11:50
  • 4
    @Andrew: If you're just complaining about the `0.99999999999999977796` text output of the test program, I assume that's enough significant digits to convert back to the right FP bit-pattern; of course it's not exactly representable; that's not what FP -> string functions aim for. They aim for bit-exact round-trip when read by a function like `strtod` that converts a string to double with correct rounding to nearest. (Which is [highly non-trivial](//www.exploringbinary.com/properties-of-the-correction-loop-in-david-gays-strtod/)). https://godbolt.org/z/9v1Gdf6zc confirms 0x1.fffffffffffffp-1 – Peter Cordes Mar 08 '22 at 11:58
  • 1
    @PeterCordes In terms of efficiency, note that, according to the linked cppreference, `std::nextafter` (and its cousins) is marked as `constexpr` since C++23. – Adrian Mole Mar 08 '22 at 11:59
  • So eventually this will be efficient in practice, once compilers get smart enough to support that C++23 requirement. But for now, even in C++ mode with `-std=gnu++23`, clang doesn't do const-prop: https://godbolt.org/z/rW41dvWEe . Applying `constexpr` to one of those test functions makes clang error that it never produces a constant, even with `-stdlib=libc++`. https://godbolt.org/z/YYfanaW3q – Peter Cordes Mar 08 '22 at 12:07
  • IEEE-754 is a fickle mistress. Thanks Adrian =) – Captain Giraffe Mar 08 '22 at 20:31
9

This can be calculated without calling a function by using the characteristics of floating-point representation specified in the C standard. Since the epsilon provides the distance between representable numbers just above 1, and radix provides the base used to represent numbers, the distance between representable numbers just below one is epsilon divided by that base:

#include <iostream>
#include <limits>


int main(void)
{
    typedef float Float;

    // The gap just below 1 is epsilon (the gap just above 1) divided by the radix.
    std::cout << std::hexfloat <<
        1 - std::numeric_limits<Float>::epsilon() / std::numeric_limits<Float>::radix
        << '\n';
}
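On an IEEE-754 system this prints 0x1.fffffep-1, matching the FloatOneMinusEpsilon constant from the question.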
Eric Postpischil
-2

0.999999940395355224609375 is the largest 32-bit float that is less than 1. The code below demonstrates this:

Mac_3.2.57$cat float2uintTest4.c 
#include <stdio.h>
int main(void){
    union{
        float f;
        unsigned int i;
    } u;
    //u.f=0.9999;
    //printf("as hex: %x\n", u.i); // 0x3f7fffff
    u.i=0x3f800000; // 1.0
    printf("as float: %200.200f\n", u.f);
    u.i=0x3f7fffff; // 1.0-e
          //00111111 01111111 11111111 11111111
          //seeeeeee emmmmmmm mmmmmmmm mmmmmmmm
    printf("as float: %200.200f\n", u.f);

    return(0);
}
Mac_3.2.57$cc float2uintTest4.c 
Mac_3.2.57$./a.out 
as float: 1.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
as float: 0.99999994039535522460937500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
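The same bit-pattern decrement can be written without a union in C++20 via std::bit_cast (a sketch assuming IEEE-754 binary32 float):

#include <bit>
#include <cstdint>
#include <iostream>

int main()
{
    // For a positive, finite IEEE-754 float, subtracting 1 from its bit
    // pattern yields the next representable value below it.
    std::uint32_t bits = std::bit_cast<std::uint32_t>(1.0f);
    float below_one = std::bit_cast<float>(bits - 1);
    std::cout << std::hexfloat << below_one << '\n'; // prints 0x1.fffffep-1
}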
Andrew
  • 1
    Assuming IEEE-754 representation, then you are printing to far more precision than can be represented (~ 7 or 8 decimal places) . Adjusting all the relevant parts of my answer to the `float` type will give outputs of `0.99999994039535522461` and `0.99999988079071044922`, respectively. – Adrian Mole Mar 07 '22 at 23:12
  • 1
    nothing personal; both wrong (see my earlier comment) :-) – Andrew Mar 07 '22 at 23:21
  • OK, I see now what you're trying to say. However, within the constraints of the specified output precision, the values printed in my comment above (and in my answer) are 'correct' (according to the specifications in the relevant Standards). Also, when you say "must end with 5", I think (to be *precise*) you meant "must end with 5 or 0". – Adrian Mole Mar 07 '22 at 23:51
  • 2
    The C++ standard allows implementations to use other formats for `float`, including formats that use a binary base but a different encoding than this answer assumes and formats that use a non-binary base. – Eric Postpischil Mar 08 '22 at 00:57
  • 3
    @AdrianMole: [“~ 7 or 8 decimal places” is not a correct model of how floating-point represents numbers.](https://stackoverflow.com/questions/61609276/how-to-calculate-float-type-precision-and-does-it-make-sense) Each floating-point value that is not an infinity or a NaN represents one finite number exactly, to infinitely many decimal places. There are IEEE-754 “single precision” numbers (the format often used for `float`) whose decimal numerals have 105 significant digits, and they are represented exactly. Treating them as approximations of numbers leads to programming errors. – Eric Postpischil Mar 08 '22 at 01:01
  • @EricPostpischil They are approximations of numbers, though. Floating point values aren't numbers in any mathematical sense; they do not obey any of the rules expected of such. 1.0/3.0 isn't an interesting value, and floating point "division" isn't an interesting function, except in that they are approximations to real numbers. Also, an exact number doesn't have significant digits; significant digits are inherently a property of imprecise numbers. – prosfilaes Mar 08 '22 at 01:18
  • 1
    @prosfilaes "Floating point values aren't numbers in any mathematical sense" that's a bold statement (I'd argue that they are a subset of **Q**). They *have* mathematical properties and a numerical theory can be derived from them. – Bob__ Mar 08 '22 at 01:29
  • @Bob__ Every thing can be studied mathematically, and thus has mathematical properties. But from a mathematical perspective, these things look little like the numbers mathematicians study. Addition is an associative operation that's commutative, and multiplication is an associative operation that may or may not be commutative, depending on the set and operation; a set with non-associative operations isn't number-like. They can be identified with a subset of Q, but so can any countable set; and addition and multiplication on FP are definitely not the same as the operations on Q confined to FP. – prosfilaes Mar 08 '22 at 01:43
  • 5
    @prosfilaes: No, they are not approximations of numbers. The IEEE-754 standard is clear on this: Each floating-point datum that is not an infinity or a NaN represents one real number exactly. The C and C++ standards say this too. With floating-point arithmetic, it is not the numbers that are approximations but the operations. The operations are defined to return the number that results from rounding the result of the corresponding real-number arithmetic operation to the nearest number representable in the floating-point format (with a choice of rounding rules). – Eric Postpischil Mar 08 '22 at 03:05
  • 3
    @prosfilaes: This model, in which the numbers are exact and the operations approximate real-number operations, is crucial to working with floating-point arithmetic: It is necessary to analyze floating-point arithmetic, to design floating-point arithmetic, to write proofs about floating-point arithmetic, and to debug floating-point arithmetic. – Eric Postpischil Mar 08 '22 at 03:06