Why is there inaccuracy in the following intermediate type conversion?

Question

The following example is from page 14 of the book Discovering Modern C++ by Peter Gottschling. The author states:

To illustrate this conversion behavior, let us look at the following example:

long l = 1234567890123;
long l2 = l + 1.0f - 1.0; // imprecise
long l3 = l + (1.0f - 1.0); // precise

This leads on the author's platform to:

l2 = 1234567954431;
l3 = 1234567890123;

My question is: what exactly causes this imprecision? Is it due to the left-associativity of addition and subtraction, so that l2 is calculated as (l + 1.0f) - 1.0? If so, surely the value range 3.4E +/- 38 (7 digits) of float (per Microsoft's Data Type Ranges article) covers the value 1234567890123, so to my knowledge narrowing shouldn't be an issue.

You almost got it, I think. Just note that `float` cannot represent all values between its min and max values. It's just 32 bits, so it can represent at most 2^32 different values. Thus, many values (like 1234567890123 most likely) are not exactly representable. — Lukas-T, Jun 05 '21 at 12:05
@churill Ah, I see as `2^32 = 4294967296 < 1234567890123`. Could you enlighten me on what the `3.4E +/- 38 (7 digits)` means in the article *Data Types Ranges* of Microsoft? I'm confused by the usage of scientific notation, as `3.4E38 = 3.4 * 10^(38) > 2^32`. — Epsilon Away, Jun 05 '21 at 12:09
Note that the "precise" line is still not 100% precise - it has higher precision because it uses `double` instead of `float` but will still give wrong results for a larger value of `l`. — interjay, Jun 05 '21 at 12:13
Epsilon Away, Warning: C allows (and I am sure C++ inherits this) an implementation to perform intermediate FP calculations in wider types. See `FLT_EVAL_METHOD`. Thus `l2, l3` could have the same answer. — chux - Reinstate Monica, Jun 05 '21 at 12:43
Does this answer your question? [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) — Richard Critten, Jun 05 '21 at 13:39
@RichardCritten I think that HolyBlackCat's answer answers this question the best. Namely the issue is in the density of representable numbers with `float`. Nevertheless, thanks for the link! — Epsilon Away, Jun 05 '21 at 13:41

score 5 · Accepted Answer · answered Jun 05 '21 at 12:10

A float is typically 32 bits. How do you think it achieves greater range (max value ~3.4e38) compared to the same-sized int, for which the max value is ~2.1e9?

The only possible answer is that it can't store some of the integers on the way to the max value. And the gaps between representable numbers increase as the absolute value increases.
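A minimal sketch of where the gaps begin, assuming `float` is an IEEE 754 single-precision type with a 24-bit significand (typical, but not guaranteed by the standard). The helper name `survives_float` is made up for this illustration; it uses `long long` rather than `long` because `long` is only 32 bits on some platforms.

```cpp
#include <cassert>
#include <limits>

// Assumption of this sketch: float has a 24-bit significand (IEEE 754 binary32).
static_assert(std::numeric_limits<float>::digits == 24,
              "sketch assumes IEEE 754 single-precision float");

// Hypothetical helper: true if n converts to float and back unchanged.
bool survives_float(long long n)
{
    // Keep the float-to-integer conversion below well inside long long's range.
    assert(n > -(1LL << 62) && n < (1LL << 62));

    const float f = static_cast<float>(n);   // rounds to the nearest float
    return static_cast<long long>(f) == n;   // did the rounding change n?
}
```

Under those assumptions, `survives_float(16777216)` (that is, 2^24) is true and `survives_float(16777217)` is false: 2^24 + 1 is the first positive integer a `float` has to skip, and the gaps only widen from there.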

Consider this code:

#include <cmath>
#include <iostream>
#include <limits>

void foo(float x, int n)
{
    while (n-- > 0)
    {
        std::cout << x << "\n "[n > 0];
        x = std::nextafter(x, std::numeric_limits<float>::infinity());
    }
}

int main()
{
    std::cout.precision(1000);

    foo(0.001, 3);
    foo(1, 3);
    foo(100000000, 3);
}

It iterates over the float values as slowly as possible, i.e. incrementing the value by the smallest possible amount at each step, and prints:

0.001000000047497451305389404296875 0.00100000016391277313232421875 0.001000000280328094959259033203125
1 1.00000011920928955078125 1.0000002384185791015625
100000000 100000008 100000016

As you can see, near 100000000 it can only represent every 8th integer.
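The same idea can be applied to the value from the question. The sketch below is not from the book; it assumes IEEE 754 `float` and `double`, round-to-nearest, and no extra intermediate precision (see the `FLT_EVAL_METHOD` comment under the question). The function names are made up, and `long long` stands in for `long`, which is only 32 bits on some platforms. Each implicit conversion in the two expressions is written out as an explicit cast.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between the float nearest to n and the next float above it.
long long float_gap_near(long long n)
{
    const float f = static_cast<float>(n);
    const float next = std::nextafter(f, std::numeric_limits<float>::infinity());
    assert(std::isfinite(next));              // n must be far below float's max
    return static_cast<long long>(next - f);
}

// l + 1.0f - 1.0, i.e. (l + 1.0f) - 1.0 because of left-associativity.
long long eval_l2(long long l)
{
    const float sum = static_cast<float>(l) + 1.0f;      // integer -> float: rounding happens here
    const double diff = static_cast<double>(sum) - 1.0;  // float -> double is exact
    return static_cast<long long>(diff);
}

// l + (1.0f - 1.0): the parentheses make the whole computation use double.
long long eval_l3(long long l)
{
    const double zero = static_cast<double>(1.0f) - 1.0; // exactly 0.0
    const double sum = static_cast<double>(l) + zero;    // integer -> double, exact for this l
    return static_cast<long long>(sum);
}
```

Under these assumptions, `float_gap_near(1234567890123)` is 131072 (2^17), so `l` is rounded to the nearest multiple of 131072, which is 1234567954432. Adding `1.0f` does not change that float, and subtracting `1.0` in `double` then gives 1234567954431, the value reported in the book. `eval_l3` returns `l` unchanged.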

score 0 · Answer 2 · answered Jun 05 '21 at 12:15

In the first line,

l2 = l + 1.0f

leads to an intermediate float being generated, and 1234567890123 won't fit in a float without loss of precision. The 1 is then subtracted. Your explanation is correct.

As for the 2nd line: don't write this. It seems to me to be undefined behavior. You have a float from what is in the brackets being added to an int, which creates a float that is then cast back to an int at the end of the evaluation. Unless I'm reading the standard wrong, it should also give the imprecise value, though compiler optimizations are probably preventing this.

Unless you want to confuse yourself senseless, cast explicitly when adding if things like this matter.

If by "2nd line" you mean the calculation of `l3`, that uses `double` instead of `float` which is why it has a higher precision. I don't see why you think it's undefined behavior. — interjay, Jun 05 '21 at 12:22
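To illustrate the point made in the comments that the `double` path is more precise but still not exact, here is a small sketch. It assumes `double` is an IEEE 754 binary64 type with a 53-bit significand and round-to-nearest; the function name `via_double` is made up, and `long long` is used because `long` may be only 32 bits.

```cpp
#include <cassert>
#include <limits>

// Assumption of this sketch: double has a 53-bit significand (IEEE 754 binary64).
static_assert(std::numeric_limits<double>::digits == 53,
              "sketch assumes IEEE 754 double-precision double");

// Same shape as the l3 line: (1.0f - 1.0) is a double, so l is converted
// to double before the addition, and the result is converted back.
long long via_double(long long l)
{
    // Keep the double-to-integer conversion well inside long long's range.
    assert(l > -(1LL << 62) && l < (1LL << 62));
    return static_cast<long long>(l + (1.0f - 1.0));
}
```

With these assumptions, `via_double(1234567890123)` returns its argument unchanged, but `via_double(9007199254740993)` (that is, 2^53 + 1) returns 9007199254740992: above 2^53 a `double` can no longer represent every integer either.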
