
I have the following Rust code:

use std::f64::consts as f64;

fn main() {
    println!("Checking f64 PI...");
    // f64::PI definition: https://github.com/rust-lang/rust/blob/e1fc9ff4a794fb069d670dded1a66f05c86f3555/library/core/src/num/f64.rs#L240
    println!("Definition: pub const PI: f64 = 3.14159265358979323846264338327950288_f64;");
    println!("Print it:                       {:.35}", f64::PI);
    println!("Different after 16 significant digits ----------|                         ");
    println!("##############################################################################");
    println!("Question 1: Why do the digits differ after 16 significant digits when printed?");
    println!("##############################################################################");

    println!("PERFORM ASSERTIONS..."); 
    assert_eq!(f64::PI, 3.14159265358979323846264338327950288_f64); // 36 significant digits definition
    assert_eq!(f64::PI, 3.141592653589793_f64); // 16 significant digits (less than the 36 in the definition)
    // compares up to here -------------|
    assert_eq!(f64::PI, 3.14159265358979300000000000000000000_f64); // 36 significant digits (16 used in equality comparison)
    assert_ne!(f64::PI, 3.14159265358979_f64); // 15 significant digits (not equal)

    println!("PERFORM EQUALITY CHECK..."); 
    if 3.14159265358979323846264338327950288_f64 == 3.14159265358979300000000000000000000_f64 {
        println!("BAD: floats considered equal even when they differ past 16 significant digits");
        println!("######################################################################");
        println!("Question 2: Why does equality checking use only 16 significant digits?");
        println!("They are defined using 36 significant digits so why can't we perform");
        println!("an equality check with this accuracy?");
        println!("######################################################################");
    } else {
        println!("GOOD: floats considered different when they differ past 16 significant digits");
        println!("NOTE: This block won't execute :(");
    }
}

I understand that floating-point arithmetic can be tricky, but I want to know whether that trickiness also affects printing and equality checks on f64s. Here is the output from the above code:

Checking f64 PI...
Definition: pub const PI: f64 = 3.14159265358979323846264338327950288_f64;
Print it:                       3.14159265358979311599796346854418516
Different after 16 significant digits ----------|                         
##############################################################################
Question 1: Why do the digits differ after 16 significant digits when printed?
##############################################################################
PERFORM ASSERTIONS...
PERFORM EQUALITY CHECK...
BAD: floats considered equal even when they differ past 16 significant digits
######################################################################
Question 2: Why does equality checking use only 16 significant digits?
They are defined using 36 significant digits so why can't we perform
an equality check with this accuracy?
######################################################################

2 Answers


An f64, as the name suggests, is stored in 64 bits. In this fixed amount of storage we can only encode a fixed number of digits (specifically, 52 of those bits are dedicated to the significand, plus one implicit leading bit, for 53 bits of precision). If you use more digits in your floating-point literal, the number stored in your f64 variable will be rounded to the nearest number representable in the available bits. For f64 this means we can always exactly represent 15 decimal digits, sometimes 16. This explains why numbers sometimes appear equal even though you used different floating-point literals in your source code: after rounding to the nearest representable number, they are the same.
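
You can see this rounding directly by comparing raw bit patterns (a minimal sketch; to_bits exposes the underlying IEEE 754 representation):

fn main() {
    // Both literals round to the same nearest representable f64,
    // so the stored 64-bit patterns are identical.
    let a = 3.14159265358979323846264338327950288_f64; // 36 significant digits
    let b = 3.141592653589793_f64;                     // 16 significant digits
    assert_eq!(a.to_bits(), b.to_bits());
    println!("{:#X}", a.to_bits()); // prints 0x400921FB54442D18
}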

The reason for the different digits being printed is the same. The number is rounded to the nearest representable number when stored, and converted back to decimal when printed. The additional digits originate from the binary-to-decimal conversion, but the digits after 15 or 16 significant digits are mostly meaningless – they don't carry any additional information about the number being represented.
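
For example, asking Rust to print more digits than the format holds just prints more of the decimal expansion of the nearest representable binary number (a minimal sketch):

use std::f64::consts::PI;

fn main() {
    println!("{:.15}", PI); // 3.141592653589793 (every digit meaningful)
    println!("{:.35}", PI); // 3.14159265358979311599796346854418516
}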

Note that none of this is specific to Rust. Most modern programming languages use the IEEE 754 standard to represent floating-point numbers, so they behave the same way. If you want arbitrary-precision arithmetic, you generally need to use a library, e.g. the rug crate.
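
A hedged sketch of what that looks like with rug (assuming rug is added as a Cargo dependency; Float::with_val and Constant::Pi are from its documented API):

// Cargo.toml: rug = "1"
use rug::{float::Constant, Float};

fn main() {
    // 128 bits of significand precision, well beyond f64's 53 bits.
    let pi = Float::with_val(128, Constant::Pi);
    println!("{:.35}", pi); // agrees with the mathematical constant to 35 places
}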

Sven Marnach
  • The term used in the IEEE 754 standard is “significand,” rather than “mantissa.” “Mantissa” is an old term for the fractional part of a logarithm, and a mantissa is logarithmic (adding to it multiplies the number represented), whereas a significand is linear (multiplying it multiplies the number represented). – Eric Postpischil Jan 14 '21 at 13:55
  • Re “For f64 this means we can always exactly represent 15 decimal digits, sometimes 16”: This is not an accurate or good way to think about how floating-point represents numbers. Binary floating-point does not represent decimal digits at all. Rather, it has sufficient precision that the result of converting any 15-digit decimal floating-point numeral to binary64 (using round-to-nearest-ties-to-even) and back to 15-digit decimal floating-point (same rounding) yields the original number. I know that is a mouthful, but, when people think about floating-point incorrectly, they make mistakes. – Eric Postpischil Jan 14 '21 at 14:06
  • @EricPostpischil It never made sense to me that it was called "mantissa", but everyone else seemed to be calling it thus, so I did as well. Thanks for clarifying this! – Sven Marnach Jan 14 '21 at 14:06
  • To see why it is inaccurate to say that f64 can sometimes represent 16 digits, consider 1.3333333333333332593184650249895639717578887939453125. It has 53 digits and is represented exactly in f64. Additional discussion is [here](https://stackoverflow.com/a/61614323/298225). – Eric Postpischil Jan 14 '21 at 14:07
  • After having spent years in academia writing numerical software, I do think that remembering that double-precision floating-point numbers can hold 15 to 16 decimal digits is a useful way to think about floating-point numbers. The fact that some specific other numbers can be stored to a higher precision is rather irrelevant in practice. – Sven Marnach Jan 14 '21 at 14:11
  • Sure, remembering it can be used to record at 15 decimal digits is a useful shorthand for estimating the precision. (Remembering “to 16” is not, as it is misleading, and I doubt you can give a formal definition in which it is true that 16 can be held sometimes for which it is not also true that 17 or 99 can be held sometimes by the same definition.) It is, however, not a good way to teach students because it misleads them about how floating-point works and does not give them the tools they need to design good floating-point algorithms, let alone prove them. – Eric Postpischil Jan 14 '21 at 14:17
  • @EricPostpischil I'm pretty sure I _can_ give a formal, reasonable definition that it's sometimes 16 decimal digits, without ever being 17 or 99, but I currently don't have the time to do so. :) – Sven Marnach Jan 14 '21 at 15:00
  • It is not hard to see why you will not. The density of floating-point numbers in the decimal numerals decreases as the number of decimal digits increases, but there is no sharp drop, and it is never zero. At any given exponent in range, there are 2^53 representable values, about 9e15. There are 10^16 16-digit numbers, so converting the binary numbers to decimal yields about 90% of the 16-digit numbers (this is coarse, due to the different places where powers of two and powers of ten transition). They yield about 9% of the 17-digit numbers, .9% of the 18, and so on, always having a few. – Eric Postpischil Jan 14 '21 at 15:16
  • @EricPostpischil It's possible to give an interval of significands in base 10 with the property that all numbers with significands in that interval can be represented with a precision of 16 digits (which is roughly possible unless the significand starts with a 9, but it's a bit more complicated than that). It's not possible to give an interval of significands with the property that all numbers in that interval can be represented to a precision of 17 digits. There are _isolated numbers_ that can be represented with a higher precision, but no full intervals. – Sven Marnach Jan 15 '21 at 10:44
  • Also note that there are only 9e15 16-digit numbers that don't start with a zero. In exponential notation, your numbers never start with a zero. – Sven Marnach Jan 15 '21 at 10:47
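
To see the round-trip property from this comment thread in action, here is a minimal sketch (the 15-digit literal is chosen arbitrarily):

fn main() {
    // Any 15-significant-digit decimal survives decimal -> f64 -> decimal.
    let original = "3.14159265358979"; // 15 significant digits
    let parsed: f64 = original.parse().unwrap();
    let back = format!("{:.14}", parsed); // 14 decimal places = 15 digits here
    assert_eq!(original, back);
}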

You are starting from the assumption that the literals 3.14159265358979323846264338327950288_f64, 3.141592653589793_f64 and 3.14159265358979_f64 assign those exact values to your variables. This assumption is incorrect.

Even though the authors of Rust's source code used the first 36 digits of the mathematical constant to define f64::PI, the actual 64-bit value stored in the IEEE 754 floating-point format is different. The closest IEEE 754 64-bit value (which you can check with an online converter) has the bit pattern 0x400921FB54442D18, and converting that bit pattern back to decimal gives approximately 3.1415926535897931159979634685.

In other words:

What we wanted to store: 3.14159265358979323846264338327950288 
What is actually stored: 3.14159265358979311599796346854...
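
You can confirm the stored bit pattern from Rust itself (a minimal sketch using to_bits and from_bits):

fn main() {
    let pi = std::f64::consts::PI;
    assert_eq!(pi.to_bits(), 0x400921FB54442D18_u64); // bit pattern quoted above
    // Converting the bits back gives the value that is actually stored.
    println!("{:.28}", f64::from_bits(0x400921FB54442D18));
    // prints 3.1415926535897931159979634685
}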

Perhaps a simpler way to visualize this is to imagine a fictional data type that stores positive real numbers from 0 to 1, internally represented as a string (an array of chars) with a maximum length of 12 characters. So, you take this bizarre 96-bit type and create 5 variables:

strdouble A = 0.333333;       // internally stored as x = { .raw = "0.333333000" }
strdouble B = 0.333333333;    // internally stored as x = { .raw = "0.333333333" }
strdouble C = 0.333333333333; // internally stored as x = { .raw = "0.333333333" }
strdouble D = 0.333333333444; // internally stored as x = { .raw = "0.333333333" }
strdouble E = 0.333333333555; // internally stored as x = { .raw = "0.333333334" }

You can see that B, C and D will be equal, although the literal values passed to the compiler are quite different. You can also see how arithmetic like (1/3 + 1/3 + 1/3) would return 0.999999999 instead of 1, because there simply isn't a way to represent any precision beyond the last raw digit.
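
The same kind of surprise shows up with real f64 arithmetic, though the binary details differ from this decimal toy: the rounding errors in 1.0/3.0 happen to cancel when summed (the final tie rounds to exactly 1.0), while other innocuous-looking sums do not round-trip. A minimal sketch:

fn main() {
    let third = 1.0_f64 / 3.0;
    // Here the accumulated rounding error happens to round back to exactly 1.0.
    println!("{}", third + third + third == 1.0); // true
    // But in general, sums of rounded values drift away from the exact result:
    println!("{}", 0.1 + 0.2 == 0.3); // false
    println!("{:.17}", 0.1 + 0.2);    // 0.30000000000000004
}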

vgru