-1

When converting 630948893921274879 of type float64 to an int, I would expect the result to be 630948893921274879. However, the actual result is 630948893921274880, which is 1 greater.

What is the reason for that?

package main

import (
    "fmt"
)

func main() {
    var p float64 = 630948893921274879
    fmt.Println(int(p))   // 630948893921274880
    fmt.Printf("%f\n", p) // 630948893921274880.000000
}

https://play.golang.org/p/gbXKCkZ6_rF

Vaelin
  • See also [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) among others. A float64 only has 53 bits of precision – JimB Jul 06 '21 at 17:55
  • Remember that in code there are no true numbers, only representations of numbers. The representations used in text and the representations used in the machine are not always compatible, and neither is capable of describing every number. – Hymns For Disco Jul 06 '21 at 17:59

3 Answers

6

630948893921274879 is larger than 2^53, the point beyond which float64 can no longer encode every integer without rounding. "Floats" (i.e. floating point numbers) work much like scientific notation, only in base two: they store a fixed number of significant digits and multiply them by some power of two. 630948893921274879 requires more significant digits than float64 can hold (53 bits), so it gets rounded to the nearest value float64 can represent.

If you need to work with integers this large, you need to work in integers the whole time. You cannot pass them through floating point values without risking rounding.
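As a minimal sketch of that advice, the program below keeps the value in an int64 and checks whether it would survive a trip through float64. The helper name roundTrips is made up for this illustration, and it assumes inputs well inside the int64 range.

```go
package main

import "fmt"

// roundTrips reports whether n survives a conversion to float64 and back.
// Hypothetical helper for illustration; assumes n is well inside the int64 range.
func roundTrips(n int64) bool {
	return int64(float64(n)) == n
}

func main() {
	var n int64 = 630948893921274879

	fmt.Println(n)     // 630948893921274879, exact as an int64
	fmt.Println(n + 1) // 630948893921274880, integer arithmetic stays exact

	fmt.Println(roundTrips(n))       // false: float64 rounds the value
	fmt.Println(roundTrips(1 << 53)) // true: 2^53 is exactly representable
}
```

A check like this can guard a conversion when a float64 is unavoidable, for example at an API boundary.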

Rob Napier
5

float64 doesn't get to go "bigger" than int64 "for free". It trades precision for range.

After a certain magnitude (2^53), integers are only representable to the closest 2. As you go even bigger, you are eventually limited to every 4th integer, then every 8th, and so on.
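The growing gaps can be observed directly. This is a small sketch using math.Nextafter from the standard library; the helper name spacing is an assumption made for this example.

```go
package main

import (
	"fmt"
	"math"
)

// spacing returns the distance from x to the next larger float64.
// Hypothetical helper for illustration; assumes x is finite and positive.
func spacing(x float64) float64 {
	return math.Nextafter(x, math.Inf(1)) - x
}

func main() {
	for _, e := range []uint{52, 53, 54, 55, 59} {
		x := float64(uint64(1) << e) // powers of two are exactly representable
		fmt.Printf("2^%d: spacing %.0f\n", e, spacing(x))
	}
	// Output:
	// 2^52: spacing 1
	// 2^53: spacing 2
	// 2^54: spacing 4
	// 2^55: spacing 8
	// 2^59: spacing 128
}
```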

Alexander
4

Your float cannot be stored as 630948893921274879, but only as the closest representable value, 630948893921274880. The base-2 logarithm of 630948893921274879 is between 59 and 60. As such, the spacing between representable numbers is 2^(59-52) = 2^7 = 128. That means any number between 2^59 and 2^60 will be rounded to the nearest multiple of 128.
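The arithmetic above can be reproduced with integers only and compared against what float64 does. This is a sketch under stated assumptions: floatSpacing and nearestMultiple are names invented here, both assume positive inputs, and nearestMultiple rounds ties upward, whereas float64 rounds ties to even (no tie occurs for this value).

```go
package main

import (
	"fmt"
	"math/bits"
)

// floatSpacing returns the gap between adjacent float64 values near n,
// computed from floor(log2(n)). Hypothetical helper; assumes n > 0.
func floatSpacing(n int64) int64 {
	e := bits.Len64(uint64(n)) - 1 // floor(log2(n)), 59 for the value below
	if e <= 52 {
		return 1 // every integer up to 2^53 is exactly representable
	}
	return int64(1) << (e - 52)
}

// nearestMultiple rounds n to the nearest multiple of step (ties round up).
// Hypothetical helper; assumes n >= 0 and step > 0.
func nearestMultiple(n, step int64) int64 {
	return (n + step/2) / step * step
}

func main() {
	var n int64 = 630948893921274879

	step := floatSpacing(n)
	fmt.Println(step)                     // 128
	fmt.Println(nearestMultiple(n, step)) // 630948893921274880
	fmt.Println(int64(float64(n)))        // 630948893921274880
}
```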

Explanation: A float (of any size, not just 64 bits) is not stored as one number but as three fields that are combined. For a 64-bit float: the first field is the sign, 1 bit long. The second is the exponent, 11 bits long, stored with a bias of 1023. The third is the significand (also known as the mantissa or fraction), 52 bits long. The value is then (-1)^sign * 2^(exponent - 1023) * 1.significand, where 1.significand is a number between 1 and (almost) 2. This means that every time you reach a power of 2, the exponent increases by one and the significand resets. Each time the exponent increases, the precision is halved, because the smallest possible change in the significand is now twice as large. Your number happens to lie in the range where the smallest change in the significand is 128, so it is rounded to the nearest multiple of 128. As such, it is not the conversion from float64 to int that causes the issue, but float64 itself.
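To make the three fields concrete, here is a minimal sketch that pulls them out of the bit pattern with math.Float64bits. The function name decompose is an assumption for this example, and it only handles normal (non-zero, finite, non-subnormal) values.

```go
package main

import (
	"fmt"
	"math"
)

// decompose splits a float64 into its sign bit, unbiased exponent and
// 52-bit fraction. Hypothetical helper; valid for normal values only.
func decompose(f float64) (sign uint64, exp int, frac uint64) {
	b := math.Float64bits(f)
	sign = b >> 63
	exp = int((b>>52)&0x7FF) - 1023 // remove the exponent bias
	frac = b & (1<<52 - 1)
	return sign, exp, frac
}

func main() {
	var p float64 = 630948893921274879

	sign, exp, frac := decompose(p)
	fmt.Println(sign, exp) // 0 59

	// 1.significand as a 53-bit integer, including the implicit leading 1.
	significand := uint64(1)<<52 | frac

	// The lowest significand bit is worth 2^(exp-52), the spacing at this magnitude.
	fmt.Println(uint64(1) << uint(exp-52)) // 128

	// Rebuilding the value from the fields gives the rounded number.
	fmt.Println(significand << uint(exp-52)) // 630948893921274880
}
```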