An IEEE-754 64-bit double-precision float can't represent the integer 9007199254740995, so it stores 9007199254740996 instead. Why not 9007199254740994?

What rules define this process? Does the integer get rounded? Do the rounding rules defined for numbers with fractional parts also apply to integers?

I don't think that JS simply discards the bits that don't fit into the mantissa. The number 9007199254740995 is represented in binary as:

1 0000000000000000000000000000000000000000000000000001 1
  |                       52 bits                    |  
  ----------------------------------------------------

The first implicit bit is not stored. So if JS simply discarded the bits that can't be stored, we would have 51 zeros followed by 1, which would result in the number 9007199254740994. But instead we have 50 zeros followed by 10:

1 0000000000000000000000000000000000000000000000000010 0
  |                       52 bits                    |  
  ----------------------------------------------------

which is the number 9007199254740996. So some other transformations had to take place.
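The stored bits can also be inspected directly. This is a minimal sketch, assuming an engine that supports `BigInt` and `DataView.prototype.getBigUint64`; `doubleBits` is a hypothetical helper name introduced only for illustration.

```javascript
// Split the 64 raw bits of a double into sign / exponent / fraction strings.
// doubleBits is a hypothetical helper, not a built-in.
function doubleBits(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian by default
  const bits = view.getBigUint64(0).toString(2).padStart(64, "0");
  return {
    sign: bits.slice(0, 1),
    exponent: bits.slice(1, 12),
    fraction: bits.slice(12), // the 52 explicitly stored bits
  };
}

// The literal is converted to a double before doubleBits ever sees it.
console.log(doubleBits(9007199254740995).fraction); // 50 zeros followed by 10
console.log(doubleBits(9007199254740994).fraction); // 51 zeros followed by 1
console.log(9007199254740995 === 9007199254740996); // true
```

The fraction printed for 9007199254740995 matches the second bit pattern above, not the truncated one.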

Max Koretskyi
  • because it has not enough space to store the last bit. Numbers have 53bit of precision. It would take 54bit to properly represent `9007199254740993`. Without this last bit it's `9007199254740992` – Thomas Oct 11 '16 at 06:23
  • @Thomas, yes, please see my note at the end of the question – Max Koretskyi Oct 11 '16 at 06:26
  • There is no rounding. JS simply runs out of space to write the bits that would be needed to properly describe this number, and you get the result without these bits. Maybe I don't get your question, because if you understand why this number can't be stored as it is, then what are you asking? – Thomas Oct 11 '16 at 06:28
  • @Thomas, please see my update – Max Koretskyi Oct 11 '16 at 06:42
  • Haven't you got the bit patterns for 9007199254740992 and 9007199254740994 mixed up in the question? – Thomas Padron-McCarthy Oct 11 '16 at 06:55
  • `(9007199254740993).toString(2) // 100000000000000000000000000000000000000000000000000000` why do you assume the mentioned behaviour? or is that just your expected/wished behaviour? – Thomas Oct 11 '16 at 06:55
  • @ThomasPadron-McCarthy, no, I don't see where. Can you clarify where do you see inconsistency? – Max Koretskyi Oct 11 '16 at 06:58
  • @Thomas, try converting `100000000000000000000000000000000000000000000000000000` to an integer. In terms of the under-the-hood floating point operations, `(9007199254740993).toString(2)` is really `(9007199254740992).toString(2)`: `(9007199254740992).toString(2) === (9007199254740993).toString(2) // true` – Max Koretskyi Oct 11 '16 at 07:01
  • And we're back to the point of *what is your question?* – Thomas Oct 11 '16 at 07:02
  • *"what rules define this process?"* – You could always go and actually read [the spec](http://ieeexplore.ieee.org/document/4610935/), the answer will be in there somewhere… – deceze Oct 11 '16 at 07:09
  • @ThomasPadron-McCarthy, please see my update – Max Koretskyi Oct 11 '16 at 07:11
  • @deceze, right, that's what I'll have to do if I don't get an answer here. But this is a Q&A site, so I ask questions to save myself precious time. – Max Koretskyi Oct 11 '16 at 07:12
  • Now, after your latest edit, I am more confused about your question. "So if JS simply discarded the bits that can't be stored, we would have 52 zeros stored. But instead we have 51 zeros and 1". No, we don't. We _do_ have 52 zeroes, and the number 9007199254740992. – Thomas Padron-McCarthy Oct 11 '16 at 07:27
  • And that's something you yourself pointed out in your last comment to me. Show us some code/implementation/whatever where `9007199254740993 === 9007199254740994` is true. The point we've been discussing all along is that JS ain't able to properly represent the value `9007199254740993` and therefore truncates the least significant bits (the name literally describes what they are). It's like being asked to write the number 1001 (decimal) while limited to 3 digits. The best you get is 1e3, which is 1000, not 1001. And that's exactly what happens in binary representation too. – Thomas Oct 11 '16 at 07:47
  • @ThomasPadron-McCarthy, yeah, it seems that I got the numbers mixed up. So is that really all it is, just truncating? No special rules for how the numbers should be truncated? Can you post it as an answer? – Max Koretskyi Oct 11 '16 at 08:12
  • @Thomas, thanks, I got the numbers mixed up. It indeed seems to be just truncating. – Max Koretskyi Oct 11 '16 at 08:15
  • @deceze, it seems that I need to pay for the spec. Is it really not free? – Max Koretskyi Oct 11 '16 at 08:19
  • @Maximus: It's unlikely that it's just truncating. Try an input of `9007199254740995`: you should see it being rounded up to `9007199254740996`. – Mark Dickinson Oct 11 '16 at 08:31
  • @MarkDickinson, yeah, if just truncating it would be `9007199254740994`. I'll update the question with this example. – Max Koretskyi Oct 11 '16 at 08:35

1 Answer

Yes, the rounding rules are exactly the same for all floating point numbers, and have nothing to do with whether or not they are integers.

When the number can't be represented by the format, it is rounded to the nearest representable value; if it is exactly halfway between two such values, it is rounded to the nearest "even" number, which is the one with a 0 in the last place of the significand.

In the case of 9007199254740995, it is exactly half-way between two floating point numbers: 9007199254740994, with significand:

1 0000000000000000000000000000000000000000000000000001

and 9007199254740996, with significand:

1 0000000000000000000000000000000000000000000000000010

In this case, 9007199254740996 is the even one, so the result is rounded to that.
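The rule can be checked against the engine's own conversion. The following is a minimal sketch, not how any engine actually implements it: `roundToDouble` is a hypothetical helper that works on non-negative `BigInt` values, ignores overflow beyond the largest finite double, and assumes `Number(bigint)` rounds to nearest with ties to even.

```javascript
// Round a non-negative BigInt to the nearest value with at most 53
// significant bits, breaking ties toward an even significand.
// roundToDouble is a hypothetical helper written for illustration.
function roundToDouble(n) {
  const bitLength = n.toString(2).length;
  const extra = bitLength - 53;        // number of bits that do not fit
  if (extra <= 0) return n;            // already exactly representable
  const shift = BigInt(extra);
  const unit = 1n << shift;            // spacing between neighbouring doubles
  const half = unit >> 1n;
  const kept = n >> shift;             // the 53 bits that are kept
  const lower = kept << shift;         // truncated value
  const rest = n - lower;              // discarded part
  if (rest < half) return lower;       // closer to the lower neighbour
  if (rest > half) return lower + unit; // closer to the upper neighbour
  // Exactly halfway: choose the neighbour whose last kept bit is 0.
  return (kept & 1n) === 1n ? lower + unit : lower;
}

for (const n of [9007199254740993n, 9007199254740995n, 9007199254740997n]) {
  // Second column: the sketch; third column: the engine's conversion.
  console.log(n, roundToDouble(n), BigInt(Number(n)));
}
```

Note that both 9007199254740995 and 9007199254740997 end up at 9007199254740996, while 9007199254740993 goes down to 9007199254740992: in each tie the neighbour with a 0 in the last place of the significand wins.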

Simon Byrne
  • Thanks. Is the evenness checked against the numbers with 52 bits? Because both numbers, when unrounded, are even: `100000000000000000000000000000000000000000000000000010` and `100000000000000000000000000000000000000000000000000100` – Max Koretskyi Oct 11 '16 at 09:37
  • By the way, I wrote [an article](https://medium.com/@maximus.koretskyi/how-to-round-binary-fractions-625c8fa3a1af#.6pyj9duo8) about rounding binary fractions. If you have the time and willingness, I'd be more than happy if you reviewed it for inconsistencies. Thanks in advance – Max Koretskyi Oct 11 '16 at 09:41
  • @Maximus You round according to the value of significant bit 53: if 0, leave it be (effectively truncate); if 1, round up (that will give you a 0 at bit 53). BTW, you should always think in terms of 53 bits, not 52 plus a hidden bit -- being hidden has no bearing. – Rick Regan Oct 13 '16 at 13:17
  • @SimonByrne, can you please take a look at [this question](http://stackoverflow.com/questions/40082459/what-is-overflow-and-underflow-in-floating-point)? – Max Koretskyi Oct 17 '16 at 09:44