I know that the number 159.95 cannot be precisely represented using float variables in C.

For example, considering the following piece of code:

#include <stdio.h>
int main()
{
    float x = 159.95;
    printf("%f\n",x);
    return 0;
}

It outputs 159.949997.

I would like to know if there is some way to know in advance which real value (in decimal system) would be represented in an imprecise way like the 159.95 number.

Best regards.

Steve Summit
Zaratruta
  • Basically all numbers that need more than `24` bits, i.e. those that are not a multiple of `1/(2^24)` or `.000000059604644775390625` (might be `23` or `25`, I'm not sure; it is different for `float`, `double`, `long double`) ... and it changes with the range -- it's different for large and small numbers (`45234523674` or `0.000000000000000006756542`). Read the [IEEE 754 Wikipedia article](https://en.wikipedia.org/wiki/IEEE_754) – pmg Feb 14 '22 at 21:21
  • Are you familiar with IEEE 754 standard? – Eugene Sh. Feb 14 '22 at 21:23
  • Also, in the absence of a **strong reason** otherwise, always use `double` for floating-point. – pmg Feb 14 '22 at 21:26
  • @EugeneSh. Yes. I know it. But I'm just starting to learn how to program and digital stuff. It is not obvious for me, after reading the standard, how to determine when a real number (in decimal system) would be imprecisely represented. Please, let me know what I have missed from the standard. – Zaratruta Feb 14 '22 at 21:27
  • @pmg They don't need to be "large". 1.000000059604644775390625 is already large enough to not be representable as a float. – nanofarad Feb 14 '22 at 21:27
  • @Zaratruta Naively, you could create a list of *all* floats possible (32 bits - a bit over 4 billion entries) and look up in there. For `double`s it won't be practical though. – Eugene Sh. Feb 14 '22 at 21:29
  • A simple answer would be to assume they are all imprecise, and work accordingly, especially with comparison tests. And in the example 159.95 posted, if you need 9 significant digits then `float` is inadequate. – Weather Vane Feb 14 '22 at 21:33
  • @WeatherVane. Yes. When I develop programs, I assume that. However, I would like to understand the phenomenon in a deeper way. – Zaratruta Feb 14 '22 at 21:37
  • Put it this way: of the *infinity* of real values there are, only about 2^32 can be exactly represented as a `float`. – Weather Vane Feb 14 '22 at 21:39
  • use printf("%.2f\n", x); for float and printf("%.2lf\n", x); for double. You will get output 159.95 – Ihdina Feb 14 '22 at 22:41
  • There are several more answers to this question — I suppose I should vote to close as a duplicate — at [How to determine w/o conversions that a given floating constant can be represented?](https://stackoverflow.com/questions/68835588) – Steve Summit Feb 14 '22 at 22:42
  • Given that there are an infinite number of _real values_ that cannot be represented using a _finite_ 32 bit representation, is there really any benefit in picking out the insignificant number of values that can be exactly represented? This is true of any representation - most _real_ values are also not precisely represented in decimal either - no matter how many digits you use. Real values are continuous; any number system representation with a finite number of digits is discrete. – Clifford Feb 14 '22 at 22:45
  • @SteveSummit. In my point of view, it is not the same question. My question here is how we can determine in advance which numbers can be precisely represented. – Zaratruta Feb 14 '22 at 23:07
  • @Clifford. The benefit is understanding. And everything that can follow from this. – Zaratruta Feb 14 '22 at 23:08
  • @Zaratruta Floating point numbers are fascinating, and rewarding to understand. Two other questions I've found particularly instructive are [bitwise splitting the mantissa of a IEEE 754 double?](https://stackoverflow.com/questions/69207580) and [how can smallest floating point number be 2^(-126) , not 2^(-128)](https://stackoverflow.com/questions/43153699). – Steve Summit Feb 14 '22 at 23:13
  • Yes, that is my point. If you understand the maths, you realise the futility of this exercise. _real_ numbers and _decimal_ representation of numbers are not the same thing. You seem to be equating them. The coincidence of an exact decimal with a particular binary representation is not particularly useful. What is useful to understand is the number of _decimal significant figures_ that are preserved without change in a round-trip conversion from decimal to binary and back. For `float` it is 6. – Clifford Feb 15 '22 at 07:56
  • See also https://stackoverflow.com/questions/3310276/decimal-precision-of-floats – Clifford Feb 15 '22 at 08:03
  • Yet more good discussion at [Why can't decimal numbers be represented exactly in binary?](https://stackoverflow.com/questions/1089018) – Steve Summit Feb 25 '22 at 19:31

5 Answers

Succinctly, for the format most commonly used for float, a number is exactly representable if and only if it can be written as an integer F times a power of two, F · 2^E, such that:

  • the magnitude of F is less than 2^24, and
  • −149 ≤ E < 105.

More generally, C 2018 5.2.4.2.2 specifies the characteristics of floating-point types. A floating-point number is represented as s · b^e · sum(f_k · b^(−k), 1 ≤ k ≤ p), where:

  • s is a sign, +1 or −1,
  • b is a fixed base chosen by the C implementation, often 2,
  • e is an exponent, which is an integer between a minimum e_min and a maximum e_max, chosen by the C implementation,
  • p is the precision, the number of base-b digits in the significand, and
  • f_k are digits in base b, nonnegative integers less than b.

The significand is the fraction portion of the representation, sum(f_k · b^(−k), 1 ≤ k ≤ p). It is written as a sum so that we can express the variable number of digits it may have. (p is a parameter set by the C implementation, not by the programmer using the C implementation.) When we write out a significand in base b, it can be a numeral, such as .001110101001100101010110₂ for a 24-bit significand in base 2. Note that, in this form (and the sum), the significand has all its digits after the radix point.

To make it slightly easier to tell whether a number is in this format, we can adjust the scale so the significand is an integer instead of having digits after the radix point: s · b^(e−p) · sum(f_k · b^(p−k), 1 ≤ k ≤ p). This changes the above significand from .001110101001100101010110₂ to 001110101001100101010110₂. Since it has p digits, it is always a non-negative integer less than b^p.

Now we can figure out whether a finite number is representable in this format:

  • Get b, p, e_min, and e_max for the target C implementation. If it uses IEEE-754 binary32 for float, then b is 2, p is 24, e_min is −125, and e_max is 128. When <float.h> is included, these are defined as FLT_RADIX, FLT_MANT_DIG, FLT_MIN_EXP, and FLT_MAX_EXP.
  • Ignore the sign. Write the absolute value of the number as a rational number n/d in simplest form. If it is an integer, let d be 1.
  • If d is not a power of b, the number is not representable in the format.
  • If n is a multiple of b and greater than or equal to b^p, divide n by b and divide d by b until n is not a multiple of b or is less than b^p. (d may become a fraction less than 1; that is fine.)
  • If n is greater than or equal to b^p, the number is not representable in the format.
  • Let e be such that 1/d = b^(e−p). If e_min ≤ e ≤ e_max, the number is representable in the format. Otherwise, it is not.

Some floating-point formats might not support subnormal numbers, in which f_1 is zero. This is indicated by FLT_HAS_SUBNORM being defined to be zero and would require modifications to the above.
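
For concreteness, here is one way this procedure might be sketched in C, assuming IEEE-754 binary32 (b = 2, p = 24, e_min = −125, e_max = 128) and taking the input as a fraction n/d. The helper name representable_binary32 is my own choice, and to keep everything in integers the value is tracked as n · 2^(−k) rather than carrying a possibly fractional d:

#include <stdio.h>
#include <stdint.h>

/* Sketch: is n/d exactly representable in IEEE-754 binary32? */
static int representable_binary32(uint64_t n, uint64_t d)
{
    const int p = 24, e_min = -125, e_max = 128;
    if (n == 0)
        return 1;                            /* zero is representable */

    /* Reduce n/d to simplest form. */
    uint64_t g1 = n, g2 = d;
    while (g2 != 0) { uint64_t t = g1 % g2; g1 = g2; g2 = t; }
    n /= g1;
    d /= g1;

    /* If d is not a power of 2, the number is not representable. */
    if ((d & (d - 1)) != 0)
        return 0;

    int k = 0;                               /* value = n * 2^(-k) */
    while ((d >> k) > 1)
        k++;

    /* Scale an even n down while it needs too many bits. */
    while (n % 2 == 0 && n >= (UINT64_C(1) << p)) {
        n /= 2;
        k--;                                 /* k can go negative for large values */
    }
    if (n >= (UINT64_C(1) << p))
        return 0;                            /* more than p significant bits */

    int e = p - k;                           /* from 1/d = 2^(e-p) with d = 2^k */
    return e_min <= e && e <= e_max;
}

int main(void)
{
    printf("%d\n", representable_binary32(15995, 100));   /* 159.95  -> 0 */
    printf("%d\n", representable_binary32(159625, 1000)); /* 159.625 -> 1 */
    return 0;
}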

Eric Postpischil
  • Excellent write-up. For IEEE-754 `binary32` I verified that the stated bounds for `E` are tight. – njuffa Feb 14 '22 at 22:37
  • @njuffa: Thanks. There are three or more opportunities for off-by-one errors in those, so I was worried about them. (IEEE-754 exponent bounds are 1−127 = −126 and 254−127 = 127, C expresses them with the significand scaled down by 2, and then there is the scaling to make the significand an integer.) – Eric Postpischil Feb 14 '22 at 22:49
  • @njuffa: Of course, I had the exponent sense of d negated in d = b^(e−p), since it is the denominator. I corrected that to 1/d = b^(e−p). Plenty of opportunities for errors, sigh. – Eric Postpischil Feb 15 '22 at 10:24

I would like to know if there is some way to know in advance which real value (in decimal system) would be represented in an imprecise way like the 159.95 number.

In general, binary floating-point values can only represent numbers whose denominator (in lowest terms) is a power of 2.

To check whether a number can be represented as a floating-point value (of any binary floating-point type) at all, take the decimal digits after the decimal point, interpret them as a number, and check whether that number is divisible by 5^n, where n is the number of digits:

159.95 => 95, 2 digits => 95 % (5*5) = 20, not 0 => cannot be represented exactly as a binary floating-point value

Counterexample:

159.625 => 625, 3 digits => 625 % (5*5*5) = 0 => can be represented exactly as a binary floating-point value

You also have to consider the fact that floating-point values only have a limited number of significant bits:

In principle, 123456789 could be represented by a floating-point value exactly (it is an integer); however, float does not have enough bits!

To check if an integer value can be represented by float exactly, divide the number by 2 until the result is odd. If the result is < 2^24, the number can be represented by float exactly.

In the case of a rational number, first do the "divisible by 5^n" check described above. Then multiply the number by 2 until the result is an integer. Check if it is < 2^24.
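
As a rough sketch, the two checks described above might look like this in C (the helper names are made up for illustration, and 5^n overflows a 64-bit integer past 27 digits):

#include <stdio.h>
#include <stdint.h>

/* "Divisible by 5^n" rule: frac holds the digits after the decimal point. */
static int frac_is_binary_fraction(const char *frac)
{
    uint64_t value = 0, pow5 = 1;
    for (size_t i = 0; frac[i] != '\0'; i++) {
        value = value * 10 + (uint64_t)(frac[i] - '0');
        pow5 *= 5;                     /* 5^n for n digits */
    }
    return value % pow5 == 0;
}

/* Integer-width rule: divide by 2 until odd, then compare against 2^24. */
static int int_fits_float(uint64_t v)
{
    while (v != 0 && v % 2 == 0)
        v /= 2;
    return v < (UINT64_C(1) << 24);
}

int main(void)
{
    printf("%d\n", frac_is_binary_fraction("95"));   /* 159.95:    0 */
    printf("%d\n", frac_is_binary_fraction("625"));  /* 159.625:   1 */
    printf("%d\n", int_fits_float(123456789));       /* 123456789: 0 */
    return 0;
}

A "1" from frac_is_binary_fraction only means the fraction survives the 5^n test; the bit-width check still has to pass as well.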

Martin Rosenau

Usually, a float is an IEEE-754 binary32 float (this is not guaranteed by the spec and may be different on some compilers/systems). This data type specifies a 24-bit significand; this means that if you write the number in binary, it should require no more than 24 bits excluding trailing zeros.

159.95's binary representation is 10011111.11110011001100110011... with repeating 0011 forever, so it requires an infinite number of bits to represent precisely with a binary format.

Other examples:

1073741760 has a binary representation of 111111111111111111111111000000. It has 30 bits in that representation, but only 24 significant bits (since the remainder are trailing zero bits). It has an exact float representation.

1073741761 has a binary representation of 111111111111111111111111000001. It has 30 significant bits and cannot be represented exactly as a float.

0.000000059604644775390625 has a binary representation of 0.000000000000000000000001. It has one significant bit and can be represented exactly.

0.750000059604644775390625 has a binary representation of 0.110000000000000000000001, which is 24 significant bits. It can be represented exactly as a float.

1.000000059604644775390625 has a binary representation of 1.000000000000000000000001, which is 25 significant bits. It cannot be represented exactly as a float.

Another factor (which applies to very large and very small numbers) is that the exponent is limited to the −126 to +127 range. With some handwaving around denormal values and other special cases, this generally allows values ranging from roughly 2^−126 to slightly under 2^128.
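
If you want to count significant bits programmatically, a small helper (my own, for illustration) that strips trailing zero bits and measures the remaining width reproduces the integer examples above:

#include <stdio.h>
#include <stdint.h>

/* Width of v in bits after discarding trailing zeros; a nonzero integer
   is exactly representable as a float iff this is <= 24. */
static int significant_bits(uint64_t v)
{
    while (v != 0 && (v & 1) == 0)
        v >>= 1;                     /* drop trailing zero bits */
    int bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

int main(void)
{
    printf("%d\n", significant_bits(1073741760)); /* 24 -> exact */
    printf("%d\n", significant_bits(1073741761)); /* 30 -> not exact */
    return 0;
}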

nanofarad

I would like to know if there is some way to know in advance which real value... would be represented in an imprecise way

The short and only partly facetious answer is... all of them!

There are roughly 2^32 = 4294967296 values of type float. And there are an uncountably infinite number of real numbers. So, for a randomly-chosen real number, the chance that it can be exactly represented as a value of type float is 4294967296/∞, which is 0.

If you use type double, there are approximately 2^64 = 18446744073709551616 of those, so the chance that a randomly-chosen real number can be exactly represented as a double is 18446744073709551616/∞, which is again... 0.

I realize I'm not answering quite the question you asked, but in general, it's usually a bad idea to use binary floating-point types as if they were an exact representation of decimal fractions. Attempts to assume that they're ever an exact representation usually lead to trouble. In general, it's best to assume that floating-point types are an imperfect (approximate) realization of real numbers, period (that is, without assuming decimal). If you never assume they're exact (which for true real numbers, they virtually never are), you'll never get into trouble in cases where you thought they'd be exact, but they weren't.

[Footnote 1: As Eric P. reminds in a comment, there's no such thing as a "randomly-chosen real number", which is why this is a partially facetious answer.]

[Footnote 2: I now see your comment where you say that you do assume they are all imprecise, but that you would "like to understand the phenomenon in a deeper way", in which case my answer does you no good, but hopefully some of the others do. I can especially commend Martin Rosenau's answer, which goes straight to the heart of the matter: a rational number is representable exactly in base 2 if and only if its reduced denominator is a pure power of 2, or stated another way, has only 2's in its prime factorization. That's why, if you take any number you can actually store in a float or double, and print it back out using %f and enough digits, with a properly-written printf, you'll notice that the numbers always end in things like ...625 or ...375. Binary fractions are like the English rulers still used in the U.S.: everything is halves and quarters and eighths and sixteenths and thirty-seconds and sixty-fourths.]
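
For instance, asking printf for plenty of digits shows the exact value that actually got stored (output assumes IEEE-754 binary32):

#include <stdio.h>

int main(void)
{
    float x = 159.95f;
    /* With binary32 this typically prints 159.949996948242187500000000:
       the exact stored value, ending in the telltale ...875. */
    printf("%.24f\n", x);
    return 0;
}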

Steve Summit
  • The same can be said for integers. – stark Feb 14 '22 at 21:59
  • @stark Unless we are picking from a specific range. – Eugene Sh. Feb 14 '22 at 22:05
  • Re “for a randomly-chosen real number”: No uniform distribution can be defined on the real numbers (nor even on the integers), so some other distribution must be specified. In infinitely many such distributions, the probability a chosen number is representable is non-zero. – Eric Postpischil Feb 14 '22 at 22:12
  • @EricPostpischil I meant to concede that this was a partially-facetious answer. Thanks for the reminder (and the reasoning). – Steve Summit Feb 14 '22 at 22:21
  • There are uncountably infinitely many different real numbers, but only a countably infinite subset of them can be represented through an algorithm. Most of them, we'll never ever be able to represent in a computer in any format whatsoever. We just can't compute them! Of course, if we're able to give a decimal representation of a real, it falls in the countable set. – aka.nice Feb 16 '22 at 11:03

I would like to know if there is some way to know in advance which real value (in decimal system) would be represented in an imprecise way like the 159.95 number.

In another answer I semiseriously answered "all of them", but let's look at it another way. Specifically, let's look at which numbers can be exactly represented.

The key fact to remember is that floating point formats use binary. (The major, popular formats, anyway.) So the numbers that can be represented exactly are the ones with exact binary representations.

Here is a table of a few of the single-precision float values that can be represented exactly, specifically the seven contiguous values near 1.0. I'm going to show them as hexadecimal fractions, binary fractions, and decimal fractions. (That is, along each horizontal row, all three values are exactly the same, just represented in different bases. But note that the fractional hexadecimal and binary representations I'm using here are not directly acceptable in C.)

hexadecimal   binary                        decimal                     delta
0x0.fffffd    0b0.111111111111111111111101  0.999999821186065673828125  5.96e-08
0x0.fffffe    0b0.111111111111111111111110  0.999999880790710449218750  5.96e-08
0x0.ffffff    0b0.111111111111111111111111  0.999999940395355224609375  5.96e-08
0x1.000000    0b1.00000000000000000000000   1.000000000000000000000000
0x1.000002    0b1.00000000000000000000001   1.000000119209289550781250  1.19e-07
0x1.000004    0b1.00000000000000000000010   1.000000238418579101562500  1.19e-07
0x1.000006    0b1.00000000000000000000011   1.000000357627868652343750  1.19e-07

There are several things to notice about this table:

  • The decimal numbers look pretty weird.
  • The hexadecimal and binary numbers look pretty normal, and show pretty clearly that single-precision floating point has 24 bits of precision.
  • If you look at the decimal column, the precision seems to be about equivalent to 7 decimal digits.
  • It's clearly not exactly 7 digits, though.
  • The difference between consecutive values less than 1.0 is about 0.00000005, and greater than 1.0 is twice that, about 0.00000010. (More on this later.)

Here is a similar table for type double. (I'm showing fewer columns because there's not enough room horizontally for everything.)

hexadecimal         decimal                                                  delta
0x0.ffffffffffffe8  0.99999999999999966693309261245303787291049957275390625  1.11e-16
0x0.fffffffffffff0  0.99999999999999977795539507496869191527366638183593750  1.11e-16
0x0.fffffffffffff8  0.99999999999999988897769753748434595763683319091796875  1.11e-16
0x1.0000000000000   1.0000000000000000000000000000000000000000000000000000
0x1.0000000000001   1.0000000000000002220446049250313080847263336181640625   2.22e-16
0x1.0000000000002   1.0000000000000004440892098500626161694526672363281250   2.22e-16
0x1.0000000000003   1.0000000000000006661338147750939242541790008544921875   2.22e-16

You can see right away that type double has much better precision: 53 bits, or about 15 decimal digits' worth instead of 7, and with a much finer spacing between "adjacent" numbers.

What does it mean for these numbers to be "contiguous" or "adjacent"? Aren't real numbers continuous? Yes, true real numbers are continuous, but we're not looking at true real numbers: we're looking at finite-precision floating point, and we are, literally, seeing the finite limit of the precision here. In type float, there simply is no value — no representable value, that is — between 1.00000000 and 1.00000012. In type double, there is no value between 1.00000000000000000 and 1.00000000000000022.
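
If you want to see this for yourself, C99's nextafterf() steps between consecutive representable float values, which makes it easy to regenerate a table like the one above. (Note that %a prints the hexadecimal in a normalized form, so 0x0.fffffd shows up as 0x1.fffffap-1; link with -lm.)

#include <stdio.h>
#include <math.h>

int main(void)
{
    float x = 1.0f;
    /* Step down to the third representable value below 1.0... */
    for (int i = 0; i < 3; i++)
        x = nextafterf(x, 0.0f);
    /* ...then walk upward through seven consecutive floats.
       There is nothing representable in between. */
    for (int i = 0; i < 7; i++) {
        printf("%-16a %.24f\n", x, x);
        x = nextafterf(x, 2.0f);
    }
    return 0;
}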

So let's go back to your question, asking whether there's "some way to know which decimal values are represented in a precise or imprecise way."

If you look at the nine one-digit decimal fractions between 1 and 2:

1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9

the answer is, only one of them is exactly representable in binary: 1.5.

If you break the interval down into 100 fractions, like this:

1.01
1.02
1.03
1.04
1.05
…
1.95
1.96
1.97
1.98
1.99

it turns out there are three fractions you can represent exactly: .25, .50, and .75, corresponding to ¼, ½, and ¾.

If we looked at three-digit decimal fractions, there are at most seven of them we can represent: .125, .250, .375, .500, .625, .750, and .875. These correspond to eighths, that is, ordinary fractions with 8 in the denominator.

I said "at most seven" because it's not true (none of these estimates are true) for all ranges of numbers. Remember, precision is finite, and digits to the left of the decimal part — that is, in the integral part of your numbers — count against your precision budget, too. So it turns out that if you were to look at the range, say, 4000000–4000001, and tried to subdivide it, you would find that you could represent 4000000.25 and 4000000.50 as type float, but not 4000000.125 or 4000000.375. You can't really see it if you look at the decimal representation, but what's happening inside is that type float has exactly 24 binary bits of available precision, and the integer part 4000000 uses up 22 of those bits, so you've only got two bits left over for the fractional part, and with two bits you can do halves and quarters, but not eighths.

You're probably noticing a pattern by now: the fractions we've looked at so far that can be represented exactly in binary involve halves, quarters, and eighths, and if we looked further, this pattern would continue: sixteenths, thirty-seconds, sixty-fourths, etc. And this should come as no real surprise: just as in decimal the "exact" fractions involve tenths, hundredths, thousandths, etc., when we move to binary (base 2) the fractions all involve powers of two. ½ in binary is 0b0.1. ¼ and ¾ are 0b0.01 and 0b0.11. ⅜ and ⅝ are 0b0.011 and 0b0.101.

What about a fraction like 1/3? You can't represent it exactly in binary, but since you can't represent it in decimal, either, this doesn't tend to bother us too much. In decimal it's the infinitely repeating fraction 0.333333…, and in binary it's the infinitely-repeating fraction 0b0.0101010101….

But then we come to the humble fraction 1/10, or one tenth. This obviously can be represented as a decimal fraction — 0.1 — but it turns out that it cannot be represented exactly in binary. In binary it's the infinitely-repeating fraction 0b0.0001100110011…. And this is why, as we saw above, you can't represent most of the other "single digit" decimal fractions 0.2, 0.3, 0.4, …, either (with the notable exception of 0.5), and you can't represent most of the double-digit decimal fractions 0.01, 0.02, 0.03, …, or most of the triple-digit decimal fractions, etc.

So returning once more to your question of which decimal fractions can be represented exactly, we can say:

  • For single-digit fractions 0.1, 0.2, 0.3, …, we can exactly represent .5, and to be charitable we can say that we can also represent .0, so that's two out of ten, or 20%.
  • For double-digit fractions 0.01, 0.02, 0.03, …, we can exactly represent .00, 0.25, 0.50, and 0.75, so that's four out of a hundred, or 4%.
  • For three-digit fractions 0.001, 0.002, 0.003, …, we can exactly represent the eight fractions involving eighths, so that's 8/1000 = 0.8%.

So while there are some decimal fractions we can represent exactly, there aren't very many, and the percentage seems to be going down as we add more digits. :-(
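
You can confirm those counts with a brute-force loop. This ignores the precision budget entirely and tests only the "divisible by 5^k" property, so it counts fractions that are exact binary fractions in principle:

#include <stdio.h>

int main(void)
{
    /* Among the k-digit decimal fractions m/10^k (including .00...0),
       count those that are exact binary fractions: m divisible by 5^k. */
    for (int k = 1; k <= 6; k++) {
        long pow10 = 1, pow5 = 1;
        for (int i = 0; i < k; i++) {
            pow10 *= 10;
            pow5 *= 5;
        }
        long count = 0;
        for (long m = 0; m < pow10; m++)
            if (m % pow5 == 0)
                count++;
        printf("%d digit(s): %ld of %ld (%g%%)\n",
               k, count, pow10, 100.0 * count / pow10);
    }
    return 0;
}

The count comes out to 2^k each time: 2 of 10, 4 of 100, 8 of 1000, and so on, matching the percentages above.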

The fact — and depending on your point of view it's either an unfortunate fact or a sad fact or a perfectly normal fact — is that most decimal fractions can not be represented exactly in binary and so can not be represented exactly using computer floating point. The numbers that can be represented exactly using computer floating point, although they can all be exactly converted into numerically equivalent decimal fractions, end up converting to rather weird-looking numbers for the most part, with lots of digits, as we saw above. (In fact, for type float, which internally has 24 bits of significance, the exact decimal conversions end up having up to 24 decimal digits. And the fractions always end in 5.)

One last point concerns the spacing between these "contiguous", exactly-representable binary fractions. In the examples I've shown, why is there tighter spacing for numbers less than 1.0 than for numbers greater than 1.0?

The answer lies in an earlier statement that "precision is finite, and digits to the left of the decimal part count against your precision budget, too". Switching to decimal fractions for a moment, if I told you you had exactly 7 significant decimal digits to work with, you could represent

1234.567
1234.568
1234.569

and

12345.67
12345.68
12345.69

but you could not represent

12345.678

because that would require 8 significant digits.

Stated another way, for numbers between 1000 and 10000 you can have three more digits after the decimal point, but for numbers from 10000 to 100000 you can only have two. Mathematicians call these intervals like 1000-10000 and 10000-100000 decades, and within each decade, all the numbers have the same number of fractional digits for a given precision, and the same exponents: 1.000000×10^3 – 9.999999×10^3, 1.000000×10^4 – 9.999999×10^4, etc. (This usage is rather different than ordinary usage, in which the word "decade" refers to a period of 10 years.)

But for binary floating point, once again, the intervals of interest involve powers of 2, not 10. (In binary, some computer scientists call these intervals binades, by analogy with "decades".) The interesting intervals are from 1 to 2, 2–4, 4–8, 8–16, etc. For numbers between 1 and 2, you've got 1 bit to the left of the decimal point (really the "binary point"), so in single precision you've got 23 bits left over to use for the fractional part to the right. But for numbers between 2 and 4, you've got 2 bits to the left, so you've only got 22 bits to use for the fraction.

This works in the other direction, too: for numbers between ½ and 1, you don't need any bits to the left of the binary point, so you can use all 24 for the fraction to the right. (Below ½ it gets even more interesting). So that's why we saw twice the precision (numbers half the size in the "delta" column) for numbers just below 1.0 than for numbers just above. We'd see similar shifts in available precision when crossing all the other powers of two: 2.0, 4.0, 8.0, …, and also ½, ¼, ⅛, etc.


This has been a rather long answer, longer than I had intended. Thanks for reading. Hopefully now you have a better appreciation for which numbers can be exactly represented in binary floating point, and why most of them can't.

Steve Summit