4

I am aware that binary floating-point numbers cannot represent every decimal value exactly, so computers return slightly rounded results. I was wondering if there is any "logic" for knowing which floats will be rounded and which will not.

For example, when I run 0.1 + 0.2 in my console it returns 0.30000000000000004. Yet when I run 0.1 + 0.3 it correctly returns 0.4.

Is there any logic that determines which particular floats will not be rounded 'correctly'?

deejay123
  • 1
    You may think it returned 0.4, but that is impossible. It may have returned 0.40000000000000002220446049250313080847263336181640625, and then rounded it to 0.4 on output. The possible values of a finite binary float number are a subset of the terminating binary fractions, numbers that can be expressed as A/2^B (^ for exponentiation) for a pair of integers A and B. 0.4 is not one of them. – Patricia Shanahan Sep 01 '19 at 22:28
  • Figuring it out may involve examining `(0.1).toString(2)` and `(0.2).toString(2)`, etc, and doing some operations on them – CertainPerformance Sep 01 '19 at 22:35
  • @PatriciaShanahan Except that `0.1 + 0.3 === 0.4` evaluates to `true`. Somehow, `0.1 + 0.3`'s result, once represented and rounded, does not have any trailing digits, unlike `0.1 + 0.2` – CertainPerformance Sep 01 '19 at 22:37
  • 1
    This is about JavaScript's binary number representation, which follows IEEE 754 – Mister Jojo Sep 01 '19 at 22:44
  • Possible duplicate of [How to deal with floating point number precision in JavaScript?](https://stackoverflow.com/questions/1458633/how-to-deal-with-floating-point-number-precision-in-javascript) – Heretic Monkey Sep 01 '19 at 23:46
  • 1
    @HereticMonkey I do not think this is a duplicate. The other question asks (and has answers on) *how* to deal with it, but this is asking how to know *when* the inaccuracy will occur. Knowing one can help solve the other, but they're not the same thing – CertainPerformance Sep 02 '19 at 00:34
  • @CertainPerformance: The fact that `.1 + .3 == .4` evaluates to true does not prove `.1 + .3` does not have any trailing digits after the 4. It proves it evaluates to the same value as `.4`. And the value of `.4` is not .4. The source text `.4` is converted to floating-point, which produces a value near but different from .4. – Eric Postpischil Sep 02 '19 at 02:25
  • @EricPostpischil Yep, that's why I said *once represented and rounded* – CertainPerformance Sep 02 '19 at 02:27
  • @CertainPerformance: No, when “represented” and rounded, the final result of `.1 + .3` **does** have non-zero digits after “.4”. – Eric Postpischil Sep 02 '19 at 02:29

3 Answers

2

Floating point rounding is basically down to mathematics. It is part of number theory.

I'll first explain it a bit in decimal and then show how it works in binary:

A number like 0.12 is basically "zero + 1 times 1/10 + 2 times 1/10^2", or 12/100. This is a so-called "rational" number: a number that can be written as a ratio of two integers (12/100 = 0.12, 1/4 = 0.25 and 1/2 = 0.5 are all rational numbers). An irrational number, such as pi, e or the square root of 2, cannot be written as a fraction of integers, so it has no terminating representation in decimal or in any other integer base.

Now can any rational number be written as a terminating fraction?

We know this isn't the case in decimal: 1/3 cannot be, nor can 1/7. But some can, and it turns out there is logic behind this: a rational number in lowest terms can be written as a terminating fraction exactly when every prime factor of its denominator is also a prime factor of the base in which the number is written. The prime factors of 10 are 2 and 5, so any rational number whose denominator has only 2 and 5 as prime factors terminates in base 10. In other words, any number of the form x/(2^p * 5^q), or any finite sum of such numbers:

3/8 = 3/(2^3) = 0.375
1/80 = 1/(2^4 * 5^1) = 0.0125

but not:

1/65 = 1/(5^1 * 13^1) = 0.0153846153846...

Now back to floating point on a computer: the floating point unit works in binary, which is a base 2 system. The only prime factor of that base is 2.

So any number that can be written as x/(2^a), with x and a integers, can be written in a floating point unit without losing accuracy, and any number that is not of that form cannot.

There is however one caveat: the format also has a limited size, which restricts the set of representable numbers further. IEEE 754-2008 specifies that double-precision numbers have a 53-bit significand ("mantissa"), of which 52 bits are stored explicitly. Since binary has only a single prime factor anyway, this restricts the formula above to integers x smaller than 2^53 in magnitude, and the exponent a is limited in range as well.
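
The prime-factor rule above can be sketched in a few lines of JavaScript. This is only an illustration: `gcd` and `terminatesInBase` are hypothetical helper names, the inputs are assumed to be positive integers small enough to be exact Numbers, and the check ignores the finite significand and exponent limits mentioned in the caveat.

```javascript
// Greatest common divisor of two non-negative integers (Euclid's algorithm).
function gcd(a, b) {
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

// Returns true if numerator/denominator has a terminating expansion in `base`.
// Assumption: all three arguments are positive integers.
function terminatesInBase(numerator, denominator, base) {
  // Reduce the fraction to lowest terms first.
  let d = denominator / gcd(numerator, denominator);
  // Strip every prime factor that the denominator shares with the base.
  let g = gcd(d, base);
  while (g > 1) {
    d /= g;
    g = gcd(d, base);
  }
  // If nothing is left, all prime factors of the denominator divide the base.
  return d === 1;
}

console.log(terminatesInBase(3, 8, 10));  // true:  3/8  = 0.375
console.log(terminatesInBase(1, 80, 10)); // true:  1/80 = 0.0125
console.log(terminatesInBase(1, 65, 10)); // false: 65 contains the factor 13
console.log(terminatesInBase(1, 10, 2));  // false: 10 contains the factor 5
console.log(terminatesInBase(3, 8, 2));   // true:  8 = 2^3
```

The last two lines are the binary case: 1/10 cannot terminate in base 2, while 3/8 can.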

paul23
  • Did you mean `x/(2^p * 5^q)`? – Patricia Shanahan Sep 01 '19 at 23:27
  • Re “Now can any rational number be written as a fraction?” Yes. That is the definition of a rational number. You mean to ask whether a rational number can be written as a terminating numeral in a particular base. – Eric Postpischil Sep 02 '19 at 02:44
  • Re “There is however one caveat”: There are several caveats. *x* should be an integer, and it should be less than 2^53 in magnitude. There are also limits on the range of *a*. Unfortunately, you have gone about this presentation in a roundabout way. It could give a better basis for explanation to be direct: A finite number can be represented in the IEEE 754 64-bit binary format if and only if it equals M•2^e for some integers M and e such that -2^53 < M < 2^53 and -1074 ≤ e ≤ 971. That goes to the heart of the floating-point representation: It is a significand scaled by an exponent. – Eric Postpischil Sep 02 '19 at 02:49
  • Incidentally, “significand” is the preferred term for the fraction portion of a floating-point representation. “Mantissa” is an old term for the fraction portion of a logarithm. Significands are linear. Mantissas are logarithmic. The significand in JavaScripts number format is 53 bits. 52 are explicitly encoded in the significand field, but 1 is encoded via the exponent. – Eric Postpischil Sep 02 '19 at 02:50
  • The floating-point unit of the computer is not relevant because this question asks about JavaScript, and the behavior of JavaScript is specified by the ECMA-262 standard to use binary floating-point, regardless of the computer it is implemented on. – Eric Postpischil Sep 02 '19 at 02:51
  • @EricPostpischil indeed the limits are specified by the range of the number, and I could've delved also in that part of inaccuracy, but from the OP it seems that he didn't like to hear about that? (How if you have 2 digits to display data you can't display `103` without rounding). - As for `x` should be an integer, well that's implied in being a rational number, rational numbers are always a ratio between two integers. – paul23 Sep 02 '19 at 07:23
  • @EricPostpischil I'd also have to go a "roundabout way", since that one-liner only works for systems where floating point numbers are represented in a numeric system whose base is a prime. To actually understand it, and say use it in base 6 or base 30, you can't use that simple formula, so just showing such a formula won't help understanding. – paul23 Sep 02 '19 at 07:26
  • No, that “one liner” works for any floating-point base. For any floating-point base b, the representable values are M•b^e for some integers M and e with restrictions depending on the format. That is because that formula comes directly from the definition of a floating-point format. (Some definitions treat the fraction portion as a fixed number of digits in base b including a radix point, but adjusting the scale so it is an integer M is mathematically equivalent and easier for number theoretic analysis.) It certainly would help understanding to know the definition! – Eric Postpischil Sep 02 '19 at 11:34
  • @EricPostpischil fair enough, but I fail to see how one can tell from that formula whether a number can be represented exactly (in a finite number of digits). You'd have to check whether there exist integers `M` and `e` such that M•b^e equals the number you wish to represent. – paul23 Sep 02 '19 at 12:08
2

paul23's answer deals with the general principles. This answer analyzes the specific cases in the question.

For each string representing a decimal number, round-to-nearest will result in a specific 64-bit binary IEEE754 number. Here are the mappings for the numbers in the question:

0.1 → 0.1000000000000000055511151231257827021181583404541015625
0.2 → 0.200000000000000011102230246251565404236316680908203125
0.3 → 0.299999999999999988897769753748434595763683319091796875
0.30000000000000004 → 0.3000000000000000444089209850062616169452667236328125
0.4 → 0.40000000000000002220446049250313080847263336181640625
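
The exact values in this list can be reproduced with a small sketch that decodes the 64-bit pattern and does the arithmetic in BigInt. `decompose` and `exactDecimal` are hypothetical helper names, not built-ins, and the code assumes a finite input and an engine with BigInt and `DataView.prototype.getBigUint64`.

```javascript
// Split a finite Number into integers M and e with x = M * 2^e.
function decompose(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian by default
  const bits = view.getBigUint64(0);
  const sign = bits >> 63n ? -1n : 1n;
  const expField = Number((bits >> 52n) & 0x7ffn);
  const frac = bits & 0xfffffffffffffn;
  // Subnormals (expField 0) have no implicit leading 1 bit.
  const M = expField === 0 ? frac : frac | (1n << 52n);
  const e = expField === 0 ? -1074 : expField - 1075;
  return { M: sign * M, e };
}

// Exact decimal expansion of a finite Number, as a string.
function exactDecimal(x) {
  const { M, e } = decompose(x);
  if (e >= 0) return (M * 2n ** BigInt(e)).toString();
  // M / 2^k equals (M * 5^k) / 10^k, so the digits of M * 5^k are the answer
  // with the decimal point placed k positions from the right.
  const k = -e;
  const negative = M < 0n;
  const digits = ((negative ? -M : M) * 5n ** BigInt(k))
    .toString()
    .padStart(k + 1, "0");
  const intPart = digits.slice(0, digits.length - k);
  const fracPart = digits.slice(digits.length - k).replace(/0+$/, "");
  return (negative ? "-" : "") + intPart + (fracPart ? "." + fracPart : "");
}

for (const x of [0.1, 0.2, 0.3, 0.30000000000000004, 0.4]) {
  console.log(String(x), "→", exactDecimal(x));
}
```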

On conversion to floating point, both 0.1 and 0.2 were rounded up, so their sum is greater than 0.3. On the other hand, 0.3 was rounded down, so the exact sum of the two inputs lies above the closest floating point number to 0.3. In fact it lies exactly halfway between that number and the next representable one: the rounding error in either direction is 2.77555756156289135105907917022705078125E-17, and the round-to-even rule results in rounding up.

When 0.1 and 0.3 were added, the rounding errors on the inputs were in opposite directions. The exact sum was 0.3999999999999999944488848768742172978818416595458984375, which is exactly half way between representable numbers 0.399999999999999966693309261245303787291049957275390625 and 0.40000000000000002220446049250313080847263336181640625. The rounding error is 2.77555756156289135105907917022705078125E-17 either way.

The hex representation of the bit pattern for the larger is 3fd999999999999a, which is even, so that is the way the rounding goes. As it happens, that is also the closest float to 0.4.
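
The bit patterns involved in both ties can be inspected with a short sketch. `bitsHex` and `prevDouble` are hypothetical helpers written for this illustration; `prevDouble` assumes a positive, finite, non-zero input.

```javascript
// Hex string of the 64-bit IEEE 754 pattern of a Number.
function bitsHex(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian by default
  return view.getBigUint64(0).toString(16).padStart(16, "0");
}

// Next representable Number below x (assumes x is positive, finite, non-zero).
function prevDouble(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  view.setBigUint64(0, view.getBigUint64(0) - 1n);
  return view.getFloat64(0);
}

// 0.1 + 0.3: the two candidates end in ...9 (odd) and ...a (even).
console.log(bitsHex(prevDouble(0.4))); // 3fd9999999999999
console.log(bitsHex(0.4));             // 3fd999999999999a
console.log(bitsHex(0.1 + 0.3));       // 3fd999999999999a

// 0.1 + 0.2: the two candidates end in ...3 (odd) and ...4 (even).
console.log(bitsHex(0.3));             // 3fd3333333333333
console.log(bitsHex(0.1 + 0.2));       // 3fd3333333333334
```

In both sums the tie goes to the pattern whose last bit is 0. For 0.1 + 0.3 that happens to be the float closest to 0.4; for 0.1 + 0.2 it is the neighbor of the float closest to 0.3.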

Unless you confine yourself to arithmetic on numbers that can all be exactly represented in 64-bit binary floating point it is very hard to predict which calculations will get the float closest to the intended decimal calculation and which will not. If this matters, you are either printing your output with too many decimal places or you need a different data type.

Patricia Shanahan
2

Which Numbers Will or Will Not Be Rounded

A finite number can be represented in the common IEEE-754 double-precision format if and only if it equals M•2^e for some integers M and e such that -2^53 < M < 2^53 and -1074 ≤ e ≤ 971.

Every other finite number converted from decimal or resulting from another operation will be rounded.

(This is the format JavaScript uses because it conforms to ECMA-262, which says that the IEEE-754 64-bit binary floating-point format is used. The significand, M in the above, is often expressed as a value between 1 and 2 with a certain number of bits after a radix point, but I scaled it to an integer for easier analysis, and the exponent bounds are adjusted to match.)
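
The bounds on M and e can be spot-checked against constants that JavaScript exposes. This is a sketch, not a proof; `pow2` is a hypothetical helper that builds powers of two by repeated doubling or halving, which is exact in binary floating-point, so the check does not depend on the accuracy of the `**` operator.

```javascript
// 2^e built by repeated doubling/halving (each step is exact or a tie).
function pow2(e) {
  let x = 1;
  for (let i = 0; i < Math.abs(e); i++) {
    x = e > 0 ? x * 2 : x / 2;
  }
  return x;
}

// Largest finite value: M = 2^53 - 1 with e = 971.
console.log((pow2(53) - 1) * pow2(971) === Number.MAX_VALUE); // true

// Smallest positive value: M = 1 with e = -1074.
console.log(pow2(-1074) === Number.MIN_VALUE); // true

// One step below the smallest exponent rounds to zero.
console.log(pow2(-1075) === 0); // true

// M = 2^53 + 1 needs 54 bits, so it is rounded to 2^53.
console.log(pow2(53) + 1 === pow2(53)); // true
```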

All Numbers in the Question Are Rounded

This means all of the numbers in your example will be rounded:

  • There is no way to scale 0.1 by a power of 2 to make an integer for M. As we multiply 0.1 by 2 repeatedly, we get 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, and we can see the fraction part forever repeats .2, .4, .8, .6,… So it never reaches .0. Since 0.1 cannot be represented as M•2^e, it must be rounded.
  • Similarly, 0.2, 0.3, and 0.4 also cannot be scaled by any power of 2 to make an integer for M.
  • When these numbers 0.1, 0.2, 0.3, and 0.4 are converted to JavaScript’s Number format, the results are:
    • 0.1000000000000000055511151231257827021181583404541015625.
    • 0.200000000000000011102230246251565404236316680908203125.
    • 0.299999999999999988897769753748434595763683319091796875.
    • 0.40000000000000002220446049250313080847263336181640625.
  • Considering the mathematics a bit more formally, 0.1 is 1/10. It can never equal M•2^e because then we would have M•2^e = 1/10, so 2•5•M•2^e = 1. Since M is an integer, 2•5•M is an integer, so 2^e must cancel out the 5. But even for negative e, no power of 2 can cancel a prime factor other than 2.

In contrast, the numbers 0.25 and 0.375 are representable. When we multiply 0.25 by 2, we get 0.5 and then 1, so 0.25 = 1•2^−2, which matches the format above. And 0.375 produces 0.75, 1.5, and then 3, so 0.375 = 3•2^−3, which also matches the format.
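
The repeated-doubling procedure used above can be written as a short sketch. `toIntegerTimesPowerOfTwo` is a hypothetical name. Note that it operates on the Number it is given, so passing the literal `0.1` examines the already-rounded value, not the mathematical 1/10.

```javascript
// Scale a finite Number by 2 until it becomes an integer M, tracking e so
// that x = M * 2^e. Multiplying by 2 is exact here, and the loop ends
// because every finite Number is an integer multiple of 2^-1074.
function toIntegerTimesPowerOfTwo(x) {
  if (!Number.isFinite(x)) throw new RangeError("expected a finite number");
  let M = x;
  let e = 0;
  while (!Number.isInteger(M)) {
    M *= 2;
    e -= 1;
  }
  return { M, e };
}

console.log(toIntegerTimesPowerOfTwo(0.25));  // { M: 1, e: -2 }
console.log(toIntegerTimesPowerOfTwo(0.375)); // { M: 3, e: -3 }
console.log(toIntegerTimesPowerOfTwo(0.1));   // { M: 3602879701896397, e: -55 }
```

The result for `0.1` shows the rounding directly: the stored value is 3602879701896397•2^−55, which is close to, but not equal to, 1/10.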

Why It Appears Some Numbers Are Not Rounded

Two confounding issues create the illusion that some operations are exact:

  1. JavaScript’s default display of a value uses just enough decimal digits to uniquely distinguish the Number value. This comes from step 5 in clause 7.1.12.1 of the ECMAScript 2017 Language Specification.
    • Thus, for 0.1000000000000000055511151231257827021181583404541015625, for example, JavaScript displays it as “0.1” because that is enough—converting “0.1” to floating-point results in that same value, so there is no need for more digits.
    • This hides the rounding because for any decimal numeral up to 15 significant digits, converting it to Number and then displaying it produces the same number. For example, we have 0.12345 → 0.123450000000000004174438572590588591992855072021484375 → “0.12345”. The default formatting rule causes any numeral up to 15 digits to be the one produced by displaying the Number value that results from that numeral.
  2. Sometimes when evaluating a + b == c for decimal numerals a, b, and c, the rounding of a + b happens to coincide with the rounding that occurs for c. Sometimes it does not.
    • In 0.1 + 0.3 == 0.4, 0.1000000000000000055511151231257827021181583404541015625 and 0.299999999999999988897769753748434595763683319091796875 are added, and the rounded result is 0.40000000000000002220446049250313080847263336181640625. That is the same as the result of 0.4, so the evaluation reports true even though there were rounding errors.
    • In 0.1 + 0.2 == 0.3, 0.1000000000000000055511151231257827021181583404541015625 and 0.200000000000000011102230246251565404236316680908203125 are added, and the rounded result is 0.3000000000000000444089209850062616169452667236328125. That differs from the result for .3, which is 0.299999999999999988897769753748434595763683319091796875. So the evaluation reports false.
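
Both effects can be observed in a console. The sketch below uses only standard methods; `toPrecision(17)` is chosen here because 17 significant digits are enough to tell any two distinct Number values apart, whereas the default conversion prints the shortest numeral that round-trips.

```javascript
const sum13 = 0.1 + 0.3;
const sum12 = 0.1 + 0.2;

// Effect 1: the default display hides the rounding.
console.log(String(sum13));         // "0.4"
console.log(sum13.toPrecision(17)); // "0.40000000000000002"
console.log((0.3).toPrecision(17)); // "0.29999999999999999"

// The default display still round-trips to the same Number.
console.log(Number(String(sum12)) === sum12); // true

// Effect 2: the comparison only asks whether both sides round to the same
// Number, not whether either side is exact.
console.log(sum13 === 0.4); // true
console.log(sum12 === 0.3); // false
console.log(String(sum12)); // "0.30000000000000004"
```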

The latter result shows us why displaying the result of 0.1 + 0.2 produces “0.30000000000000004”. It is close to 0.3, but 0.299999999999999988897769753748434595763683319091796875 is closer, so, to uniquely distinguish 0.3000000000000000444089209850062616169452667236328125 from that closer value, JavaScript has to use more digits: it produces zeros until it gets to the first non-zero digit, resulting in “0.30000000000000004”.

We could ask when will a + b == c evaluate to true? The mathematics absolutely determines this; a, b, and c are each converted to the nearest representable value, the addition is performed and its result is rounded to the nearest representable value, and then the expression is true if the left and right results are equal. But there is no simple pattern for this. It depends on the patterns the decimal numerals form in binary. You can find various patterns here and there. But, by and large, they are effectively random.
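
To get a feel for how irregular this is, one can simply enumerate small cases. The sketch below tries every sum of two one-digit decimals from 0.1 to 0.9 and records whether `a + b === c` holds for the decimal numeral c one would expect. The variable names are illustrative only.

```javascript
const matches = [];
const mismatches = [];

for (let i = 1; i <= 9; i++) {
  for (let j = i; j <= 9; j++) {
    // Parse the numerals exactly as source text would be converted.
    const a = Number(`0.${i}`);
    const b = Number(`0.${j}`);
    const total = i + j; // the exact sum, in tenths
    const cText = `${Math.floor(total / 10)}.${total % 10}`;
    const c = Number(cText);
    const label = `0.${i} + 0.${j} == ${cText}`;
    (a + b === c ? matches : mismatches).push(label);
  }
}

console.log("true: ", matches.join(", "));
console.log("false:", mismatches.join(", "));
```

Running it lists `0.1 + 0.3 == 0.4` among the true cases and `0.1 + 0.2 == 0.3` among the false ones, with no obvious rule separating the two groups.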

Eric Postpischil