12

Can someone help me unpack what exactly is going on under the hood here?

>>> 1e16 + 1.
1e+16
>>> 1e16 + 1.1
1.0000000000000002e+16

I'm on 64-bit Python 2.7. For the first, I would assume that since a float only carries about 15 significant decimal digits of precision, it's just round-off error. The true floating-point answer might be something like

10000000000000000.999999....

And the decimal part just gets lopped off. But the second result makes me question this understanding. Can't 1 be represented exactly as a float? Any thoughts?

[Edit: Just to clarify. I'm not in any way suggesting that the answers are "wrong." Clearly, they're right, because, well they are. I'm just trying to understand why.]

jseabold
    0.1 infinitely repeats in binary floating point, kinda like 1/3 in decimal. – Fred Larson Feb 25 '13 at 03:19
  • I do know that, but I don't understand how that would lead to the round-off behavior exhibited. – jseabold Feb 25 '13 at 03:23
    Also note that the results in Python 2.6 and earlier would be different. 2.7 included a feature to show the shortest possible decimal string (http://docs.python.org/2/whatsnew/2.7.html#other-language-changes). With 2.6, the first result is `10000000000000000.0` – Ned Deily Feb 25 '13 at 04:14

4 Answers

10

It's just rounding as close as it can.

1e16 stored as a double has the bit pattern 0x4341c37937e08000.

1e16+2 is 0x4341c37937e08001.

At this magnitude, the smallest difference you can represent is 2. Adding exactly 1.0 lands halfway between two representable values, and the tie rounds down to 1e16 (IEEE arithmetic breaks ties by rounding to the neighbor with an even significand). Adding anything larger than 1.0 rounds up to the next representable value.
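You can verify both the bit patterns and the rounding behavior from Python (a sketch; the struct trick just views the raw IEEE-754 bits of a double, and the helper name bits is my own):

```python
import struct

def bits(x):
    # Raw IEEE-754 bit pattern of a double, shown as hex
    return hex(struct.unpack('<Q', struct.pack('<d', x))[0])

print(bits(1e16))                # 0x4341c37937e08000
print(bits(1e16 + 2.0))          # 0x4341c37937e08001: one ulp apart
print(1e16 + 1.0 == 1e16)        # True: the tie rounds to the even significand
print(1e16 + 1.1 == 1e16 + 2.0)  # True: past the tie, rounds up
```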

StilesCrisis
    Thanks. One question. Why hexadecimal? Ease of representation or is there some intrinsic reason that the underlying arithmetic is in hexadecimal? I'll need to dig deeper I think to understand the rounding to an even number (beyond this example). – jseabold Feb 25 '13 at 04:22
    @jseabold Hexadecimal is a power-of-2 base, which means that a number is a “repeating decimal” (not actually decimal) in hexadecimal if-and-only-if it is also repeating in binary, which avoids confusion for this topic. – Kevin Reid Feb 25 '13 at 04:31
  • Floating point varies its precision as the number's magnitude changes. For very small numbers, it can represent minuscule variations. For very large numbers, the rounding error can be enormous--but crucially, it's still tiny relative to the numbers being represented. The rounding error grows as the values themselves do. – StilesCrisis Feb 25 '13 at 07:33
  • And of course the "region" in which the precision is `2`, is from `9.0071992547409920e+15` to `1.8014398509481984e+16`, the "duade" between 2**53 and 2**54. – Jeppe Stig Nielsen Feb 25 '13 at 23:55
5

10^16 = 0x002386f26fc10000 is exactly representable as a double precision floating point number. The next representable number is 1e16+2. 1e16+1 is correctly rounded to 1e16, and 1e16+1.1 is correctly rounded to 1e16+2. Check the output of this C program (link with -lm when compiling with gcc):

#include <stdio.h>
#include <math.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
  uint64_t i = 10000000000000000ULL;
  double a = (double)i;
  double b = nextafter(a, 1.0e20);   // next representable double above a
  printf("I=0x%016" PRIx64 "\n", i); // 10^16 in hex
  printf("A=%a (%.4f)\n", a, a);     // exact double representation
  printf("B=%a (%.4f)\n", b, b);     // next double: 1e16 + 2
  return 0;
}

Output:

I=0x002386f26fc10000
A=0x1.1c37937e08p+53 (10000000000000000.0000)
B=0x1.1c37937e08001p+53 (10000000000000002.0000)
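For readers who would rather stay in Python, the same check can be done with math.nextafter (assuming Python 3.9+, where it was added; float.hex shows the exact binary value just like C's %a):

```python
import math

a = float(10**16)             # 10^16 converts to a double exactly
b = math.nextafter(a, 1e300)  # next representable double above a
print(hex(10**16))            # 0x2386f26fc10000
print(a.hex(), a)             # 0x1.1c37937e08000p+53
print(b.hex(), b)             # 0x1.1c37937e08001p+53, i.e. 1e16 + 2
print(b - a)                  # 2.0
```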
Eric Bainville
  • This doesn't compile for me? (I'm no C expert.) [**Edit** Spoke too soon. Need to use g++ not gcc.] Thanks. – jseabold Feb 25 '13 at 13:48
4

Let's decode some floats, and see what's actually going on! I'm going to use Common Lisp, which has a handy function for getting at the significand (a.k.a. mantissa) and exponent of a floating-point number without needing to twiddle any bits. All floats used are IEEE double-precision floats.

> (integer-decode-float 1.0d0)
4503599627370496
-52
1

That is, if we consider the value stored in the significand as an integer, it is the maximum power of 2 available (4503599627370496 = 2^52), scaled down (2^-52). (It isn't stored as 1 with an exponent of 0 because it's simpler for the significand to never have zeros on the left, and this allows us to skip representing the leftmost 1 bit and have more precision. Numbers not in this form are called denormal.)
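If you don't have a Lisp handy, Python's math.frexp gives the same decomposition (a sketch; frexp returns a fraction in [0.5, 1) rather than an integer significand, so we scale it by 2**53 to match the Lisp output):

```python
import math

m, e = math.frexp(1.0)            # 1.0 == m * 2**e, with m in [0.5, 1)
sig, exp = int(m * 2**53), e - 53 # integer significand and adjusted exponent
print(sig, exp)                   # 4503599627370496 -52, matching integer-decode-float
print(sig * 2.0**exp)             # 1.0
```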

Let's look at 1e16.

> (integer-decode-float 1d16)
5000000000000000
1
1

Here we have the representation (5000000000000000) * 2^1. Note that the significand, despite being a nice round decimal number, is not a power of 2; this is because 1e16 is not a power of 2. Every time you multiply by 10, you are multiplying by 2 and 5; multiplying by 2 is just incrementing the exponent, but multiplying by 5 is an "actual" multiplication, and here we've multiplied by 5 sixteen times.

5000000000000000 = 10001110000110111100100110111111000001000000000000000 (base 2)

Observe that this is a 53-bit binary number, as it should be since double floats have a 53-bit significand.

But the key to understanding the situation is that the exponent is 1. (The exponent being small is an indication that we are getting close to the limits of precision.) This means that the float value is 2^1 = 2 times this significand.

Now, what happens when we try to represent adding 1 to this number? Well, we need to represent 1 at the same scale. But the smallest change we can make in this number is exactly 2, because the least significant bit of the significand has value 2!

That is, if we increment the significand, making the smallest possible change, we get

5000000000000001 = 10001110000110111100100110111111000001000000000000001 (base 2)

and when we apply the exponent, we get 2 * 5000000000000001 = 10000000000000002, which is exactly the value you observed. You can only have either 10000000000000000 or 10000000000000002, and 10000000000000001.1 is closer to the latter.

(Note that the issue here isn't even that decimal numbers aren't exact in binary! There are no binary "repeating decimals" here, and there are plenty of 0 bits on the right end of the significand — it's just that your input neatly falls just beyond the lowest bit.)
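The whole argument can be replayed in Python (a sketch; the helper name decode is my own, built on math.frexp to mirror Common Lisp's integer-decode-float for doubles):

```python
import math

def decode(x):
    # Return (integer significand, exponent) so that x == sig * 2**exp
    m, e = math.frexp(x)           # m in [0.5, 1), x == m * 2**e
    return int(m * 2**53), e - 53

sig, exp = decode(1e16)
print(sig, exp)                    # 5000000000000000 1
print((sig + 1) * 2**exp)          # 10000000000000002: the smallest step up
print(1e16 + 1.1 == 1.0000000000000002e16)  # True
```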

Kevin Reid
3

With numpy, you can see the next larger and smaller representable IEEE floating point number:

>>> import numpy as np
>>> huge=1e100
>>> tiny=1e-100
>>> np.nextafter(1e16,huge)
10000000000000002.0
>>> np.nextafter(1e16,tiny)
9999999999999998.0

So:

>>> (np.nextafter(1e16,huge)-np.nextafter(1e16,tiny))/2.0
2.0

And:

>>> 1.1>2.0/2
True

Therefore 1e16 + 1.1 is correctly rounded up to the next larger IEEE-representable number, 10000000000000002.0.

As is:

>>> 1e16+1.0000000000000005
1.0000000000000002e+16

and 1e16 minus anything slightly larger than 1 is correctly rounded down to the next smaller IEEE-representable number, 2 below:

>>> 1e16-1.0000000000000005
9999999999999998.0

Keep in mind that 32-bit vs 64-bit Python is irrelevant; what matters is the size of the IEEE format used, and Python floats are 64-bit doubles either way. Also keep in mind that the spread between the next larger and next smaller representable values (the local epsilon, roughly) grows with the magnitude of the number.
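The standard library can report this spacing directly too (assuming Python 3.9+, where math.ulp was added; it returns the gap between a value and the next larger representable double):

```python
import math

print(math.ulp(1e16))    # 2.0: the gap at this magnitude
print(math.ulp(1.0))     # 2.220446049250313e-16, i.e. machine epsilon
print(math.ulp(1e-100))  # far smaller: the gap shrinks with magnitude
```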

You can see this in bits as well:

>>> import struct
>>> def f_to_bits(f): return struct.unpack('<Q', struct.pack('<d', f))[0]
... 
>>> def bits_to_f(bits): return struct.unpack('<d', struct.pack('<Q', bits))[0]
... 
>>> bits_to_f(f_to_bits(1e16)+1)
1.0000000000000002e+16
>>> bits_to_f(f_to_bits(1e16)-1)
9999999999999998.0
dawg
  • How did I not know about nextafter. That's much easier than mucking around with finfo objects. Thanks. – jseabold Feb 25 '13 at 13:47
  • It is too bad nextafter is not part of the Python math library, but [it is easily added.](http://stackoverflow.com/a/6163157/298607) – dawg Feb 25 '13 at 17:03