
I've been going back through my C++ book, and I came across a statement that says zero can be represented exactly as a floating-point number. I was wondering how this is possible, unless the value of 0.0 is stored as something other than a floating-point value. I wrote the following code to test this:

#include <iomanip>
#include <iostream>

int main()
{
    float value1 {0.0};  // zero has an exact binary representation
    float value2 {0.1};  // 0.1 does not; the nearest representable float is stored

    std::cout << std::setprecision(10) << std::fixed;

    std::cout << value1 << '\n' 
              << value2 << std::endl;
}

Running this code gave the following output:

0.0000000000
0.1000000015

To 10 digits of precision, 0.0 is still 0, and 0.1 has some inaccuracies (which is to be expected). Is a value of 0.0 different from other floating point numbers in the way it is represented, and is this a feature of the compiler or the computer's architecture?

crdrisko
    `0` is just one of *many* decimal numbers that can be represented accurately. – cigien Apr 16 '20 at 21:06
  • Every floating point value is stored differently than any other floating point value. There are values that can be represented exactly by floating point. Not every value is impossible to represent. – JohnFilleau Apr 16 '20 at 21:06
  • For example, all integers can be represented exactly without introducing fractional errors, and so can negative powers of 2 (such as 1/16, 1/2048). – Ben Voigt Apr 16 '20 at 21:07
  • I'm afraid I'm still not quite understanding: what is it about these values (0, 1/16, 1/2048, ...) that allows them to be represented exactly? It was my (perhaps naive) understanding that all floating point values contained some instability. – crdrisko Apr 16 '20 at 21:11
  • Does this help?: https://stackoverflow.com/questions/21895756/why-are-floating-point-numbers-inaccurate and https://stackoverflow.com/questions/1089018/why-cant-decimal-numbers-be-represented-exactly-in-binary – walnut Apr 16 '20 at 21:12
  • @crdrisko Floating point values are not unstable. Rather, representing `1/10` in floating point is much like representing `1/3` in decimal. You can write `0.333333`, but that is not quite `1/3`. Once you run out of room for digits, you need to round. – JaMiT Apr 16 '20 at 21:13
  • Floating point numbers are not nondeterministic or anything like that. It is just that the number of real numbers between any two distinct numbers is infinite, but there are only finitely many states that a finite-size variable can hold, so when implementing floating point numbers one needs to choose which subset of the real numbers can be represented. In practice floating point numbers are stored in a binary positional system, which is why powers of 2 are exactly representable. – walnut Apr 16 '20 at 21:14
  • Some values can be exactly represented by floating point numbers because they fall exactly on the binary fractions the format can express, but most (because there are so many) can only be approximated. There is also the practical matter of computer architecture that could introduce some noise even on exactly representable values on some machines. (This is not C++-specific.) See [these](https://isocpp.org/wiki/faq/newbie#floating-point-arith) [two](https://isocpp.org/wiki/faq/newbie#floating-point-arith2) FAQs, and note in particular the part that says "Wow." – metal Apr 16 '20 at 21:17
  • All bits of a float are set to zero for +0.0f. There is also -0.0f, which has only the sign bit set; all the other bits are still zero. – DevO Apr 16 '20 at 21:20
  • @BenVoigt: It is not true that all integers can be represented exactly, at least as a floating-point type that has the same size as the integer type. You can see this by using a counting argument: the number of possible bit representations is the same in the two cases, but the floating-point type represents non-integral values as well. – TonyK Apr 16 '20 at 21:22
  • Thank you everyone for the responses and for those additional links; it looks like I have some more research to do on floating point arithmetic! Sounds like the answer involves looking at the problem from binary instead of decimal representation. If anyone wants to take a stab at an answer, I'd be happy to accept it. – crdrisko Apr 16 '20 at 21:29
  • @TonyK: Yes, but the rounding error on integers is also an integer; you don't end up with a weird decimal representation a dozen digits past the number you entered. Hence my specification of "no fractional error". – Ben Voigt Apr 16 '20 at 22:14
  • @BenVoigt: "all integers can be represented exactly" does not admit of misinterpretation. – TonyK Apr 16 '20 at 23:56
  • @TonyK Taking words out of context very often leads to misinterpretation. – Ben Voigt Apr 17 '20 at 15:22

2 Answers


How can 2 be represented as an exact number? 4? 15? 0.5? The answer is just that some numbers can be represented exactly in the floating-point format (which is based on base-2/binary) and others can't.

This is no different from decimal: you can't represent 1/3 exactly in decimal, but that doesn't mean you can't represent 0.

Zero is special only in that (unlike for some arbitrary fractional number) it's trivial to prove it has an exact representation. But that's about it.

So:

what is it about these values (0, 1/16, 1/2048, ...) that allows them to be represented exactly?

Simple mathematics. In any given base, in the sort of representation we're talking about, some numbers can be written out with a finite number of digits in that base; others can't. That's it.

You can play online with H. Schmidt's IEEE-754 Floating Point Converter (https://www.h-schmidt.net/FloatConverter/IEEE754.html) for different numbers, to see a bunch of different representations and what errors come about as a result of encoding into those representations. For starters, try 0.5, 0.2 and 0.1.
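If you'd rather poke at the bit patterns from code instead, here's a minimal sketch along the same lines, assuming the usual 32-bit IEEE-754 `float` (which is what virtually every modern platform uses):

#include <bitset>
#include <cstdint>
#include <cstring>
#include <iostream>

// Print a float's stored value next to its raw IEEE-754 bit pattern
// (1 sign bit, 8 exponent bits, 23 fraction bits, left to right).
void show(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined type-pun
    std::cout << f << " -> " << std::bitset<32>{bits} << '\n';
}

int main()
{
    show(0.0f);  // all 32 bits zero: exactly zero
    show(0.5f);  // 0.1 in binary: terminates, so it's exact
    show(0.2f);  // 0.001100110011... in binary: repeats, so it's rounded
    show(0.1f);  // 0.000110011001... in binary: repeats, so it's rounded
}

Notice that 0.1f and 0.2f come out with identical fraction bits and only a different exponent: 0.2 is exactly twice 0.1, so they share the same (rounded) significand.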

It was my (perhaps naive) understanding that all floating point values contained some instability.

No, absolutely not.

You want to treat every floating point value in your program as potentially having some small error on it, because you generally don't know what sequence of calculations led to it. You can't trust it, in general. I expect someone half-taught this to you in the past, and that's what led to your misunderstanding.

But, if you do know the error (or lack thereof) involved at each step in the creation of the value (e.g. "all I've done is initialised it to zero"), then that's fine! No need to worry about it then.
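To make that concrete, here's a minimal sketch of the kind of thing Ben Voigt describes in the comments below: a result that is mathematically zero but isn't, because a rounding step happened along the way. (The output assumes ordinary IEEE-754 doubles, without -ffast-math-style optimisations.)

#include <iostream>

int main()
{
    double a = 1e16;  // exactly representable, but so large that adjacent
                      // doubles here are 2.0 apart
    double b = 0.5;   // much smaller than the spacing around a

    double zero1 = 0.0;            // exact: no calculation ever touched it
    double zero2 = a + b - a - b;  // mathematically zero, but a + b rounds
                                   // back to a, losing b entirely

    std::cout << zero1 << '\n'   // prints 0
              << zero2 << '\n';  // prints -0.5
}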

Asteroids With Wings
  • My thinking prior to this was very closely aligned with your last statements about the sequence of calculations leading up to a specific value. Because this had been my experience with floating point numbers, I assumed they were all inherently inaccurate, but the base-2 representation makes a lot of sense. – crdrisko Apr 16 '20 at 21:41
  • @crdrisko Yep. Do note that even zero can be susceptible to this (I've seen examples, but can't think of one that'll survive optimisation off the top of my head). Your example "works" because it's just a basic initialisation with no steps of calculation. So it's about creeping rounding error from some values, not about the end result _per se_, and definitely not an intrinsic property of all floats. – Asteroids With Wings Apr 16 '20 at 21:42
  • @AsteroidsWithWings: `a + b - a - b` will give an inaccurate zero if `b` is much smaller than `a`. – Ben Voigt Apr 16 '20 at 22:15
  • @BenVoigt Yes, but quickly blatting out a live example is tough because compilers are really smart: they optimise the crap out of it, then use their built-in calculators instead of generating FP calculation instructions. I wasn't willing to spend the 5+ minutes on that, but I'll happily accept an edit to that effect if you are :) – Asteroids With Wings Apr 16 '20 at 22:23
  • @AsteroidsWithWings: Compilers are smart enough to know not to optimize floating-point in ways that change the result (provided you've used `/fp:strict` or `/fp:precise`... if you pass `/fp:fast` or `-ffast-math` then you've given permission to change results). The compile-time calculation can be defeated by using arguments. – Ben Voigt Apr 16 '20 at 22:24
  • @Ben _"Compilers are smart enough to know not to optimize floating-point in ways that change the result "_ That is absolutely not my experience. – Asteroids With Wings Apr 17 '20 at 10:01

Here is one way to look at the situation: with 64 bits to store a number, there are 2^64 bit patterns. Some of these are "not-a-number" representations, but most of the 2^64 patterns represent numbers. The number that is represented is represented exactly, with no error. This might seem strange after learning about floating point math; a caveat lurks ahead.

However, as huge as 2^64 is, there are infinitely many more real numbers. When a calculation produces a non-integer result, the odds are pretty good that the answer will not be a number represented by one of the 2^64 patterns. There are exceptions. For example, 1/2 is represented by one of the patterns. If you store 0.5 in a floating point variable, it will actually store 0.5. Let's try that for other single-digit denominators; a short program below reproduces the results. (Note: I am writing fractions for their expressive power; I do not intend integer arithmetic.)

  • 1/1 – stored exactly
  • 1/2 – stored exactly
  • 1/3 – not stored exactly
  • 1/4 – stored exactly
  • 1/5 – not stored exactly
  • 1/6 – not stored exactly
  • 1/7 – not stored exactly
  • 1/8 – stored exactly
  • 1/9 – not stored exactly

So with these simple examples, over half are not stored exactly. When you get into more complicated calculations, any one piece of the calculation can throw you off the islands of exact representation. Do you see why the general rule of thumb is that floating point values are not exact? It is incredibly easy to fall into that realm. It is possible to avoid it, but don't count on it.
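Here's a minimal sketch that reproduces the list above; the exact trailing digits depend on your platform's `double`, but on ordinary IEEE-754 machines the exact/inexact split will match:

#include <iomanip>
#include <iostream>

int main()
{
    // Ask for more significant digits than a double actually holds,
    // so any rounding in the stored value becomes visible.
    std::cout << std::setprecision(20);
    for (int n = 1; n <= 9; ++n)
        std::cout << "1/" << n << " = " << 1.0 / n << '\n';
}

The exactly stored fractions print with short, terminating expansions (1, 0.5, 0.25, 0.125); the others trail off into rounding noise, e.g. 1/3 comes out as 0.33333333333333331483.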

Some numbers can be represented exactly by a floating point value. Most cannot.

JaMiT
  • You mentioned that storing 0.5 actually stores 0.5. Is this because 1/2 can be stored exactly in binary so the machine stores 1/2? Perhaps I read that wrong. – crdrisko Apr 16 '20 at 21:48
  • @crdrisko: Powers of two naturally fit into base-2 very well, even with the way IEEE floating-point has decided to encode fractional things (negative powers of 2). 1/2 is 0.5 in decimal but 0.1 in binary. It fits. There's no "recurring" part to worry about. – Asteroids With Wings Apr 16 '20 at 21:49
  • _"If you store `0.5` in a floating point variable, it will actually store `0.5`"_ This is a little confusing/misleading; as if it stores decimal or, worse, a string. What it'll actually store is 00111111000000000000000000000000. Play with https://www.h-schmidt.net/FloatConverter/IEEE754.html. – Asteroids With Wings Apr 16 '20 at 21:50
  • @crdrisko I believe your understanding is correct. Any number that can be written as the sum of powers of 2 (with some limitations) can be stored exactly. So `1/2` is exact as it can be written as `2^(-1)`. Other examples of exact representations are `2^(-2)` and `2^(-1) + 2^(-2)` (a.k.a. `3/4`). Another fun example consists of the integers: sums of nonnegative powers of 2 :) ! – JaMiT Apr 16 '20 at 21:53
  • @AsteroidsWithWings: that's a very helpful tool; I'm going to look into it more – crdrisko Apr 16 '20 at 21:55
  • @crdrisko Yeah I just found it; looks pretty awesome. Great visualisation of how floats are constructed, so you should be able to see more clearly how the negative powers (and the constraints of representing those in base-2) come into play here. – Asteroids With Wings Apr 16 '20 at 21:56
  • @JaMiT: your list of representations is extremely helpful. It definitely drives home the point that we should be considering the base-2 representation, not the base-10 one as I believe I have been doing up to this point – crdrisko Apr 16 '20 at 21:57