8

Here's a custom function that allows stepping through decimal increments:

def my_range(start, stop, step):
    i = start
    while i < stop:
        yield i
        i += step

It works like this:

out = list(my_range(0, 1, 0.1))
print(out)

[0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7, 0.7999999999999999, 0.8999999999999999, 0.9999999999999999]

Now, there's nothing surprising about this. It's understandable that this happens because of floating point inaccuracies: 0.1 has no exact binary representation in memory, so these precision errors are to be expected.
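
For example, asking for more digits than `repr` shows makes the stored double visible:

print(f"{0.1:.20f}")

0.10000000000000000555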

Take numpy on the other hand:

import numpy as np

out = np.arange(0, 1, 0.1)
print(repr(out))
array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9]) 

What's interesting is that there are no visible precision errors introduced here. I thought this might have to do with what `__repr__` shows, so to confirm, I tried this:

x = list(my_range(0, 1.1, 0.1))[-1]
print(x.is_integer())

False

x = list(np.arange(0, 1.1, 0.1))[-1]
print(x.is_integer())

True

So, my function returns an incorrect upper value (it should be 1.0 but it is actually 1.0999999999999999), but np.arange does it correctly.

I'm aware of "Is floating point math broken?", but the point of this question is:

How does numpy do this?

cs95
  • At a guess, if numpy uses floating point multiplication, this can be a little less error prone. `0.1 * 10 == 1.0`, but `0.1 + 0.1 + ... != 1.0`. – Izaak van Dongen Aug 27 '17 at 16:45
  • Also try with `numpy.set_printoptions(precision=18)`. –  Aug 27 '17 at 16:48
  • I know I probably shouldn't post such a comment but: Thanks for asking the question. Because of that question I investigated the very interesting topic of floating point error accumulation (again), I made a pull request to fix a code-comment regarding an `np.arange` helper function and ... I got my NumPy gold badge because of the upvotes here! So **thank you**!!! – MSeifert Aug 27 '17 at 18:32
  • @MSeifert Congrats to you and thanks for answering :) – cs95 Aug 27 '17 at 19:15

3 Answers

10

The difference in endpoints is because NumPy calculates the length up front instead of ad hoc, since it needs to preallocate the array. You can see this in the `_calc_length` helper. Instead of stopping when it hits the stop argument, it stops when it hits the predetermined length.
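
A rough sketch of the idea in plain Python (this is not NumPy's actual code, and `arange_length` is just an illustrative name):

import math

def arange_length(start, stop, step):
    # The number of elements is fixed before any values are generated,
    # roughly ceil((stop - start) / step); iteration then stops at this
    # count rather than when a value crosses `stop`.
    return max(0, math.ceil((stop - start) / step))

print(arange_length(0, 1, 0.1), arange_length(0, 1.1, 0.1))

10 11

Eleven elements means the last one is `start + 10*step`, which comes out as exactly 1.0 here, which is why `np.arange(0, 1.1, 0.1)` ends at 1.0 while the question's generator does not.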

Calculating the length up front doesn't save you from the problems of a non-integer step, and you'll frequently get the "wrong" endpoint anyway, for example, with `numpy.arange(0.0, 2.1, 0.3)`:

In [46]: numpy.arange(0.0, 2.1, 0.3)
Out[46]: array([ 0. ,  0.3,  0.6,  0.9,  1.2,  1.5,  1.8,  2.1])

It's much safer to use `numpy.linspace`, where instead of the step size, you say how many elements you want and whether you want to include the right endpoint.
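
For example, asking for eleven evenly spaced values from 0 to 1 hits the right endpoint exactly:

>>> numpy.linspace(0, 1, 11)
array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ])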


It might look like NumPy has suffered no rounding error when calculating the elements, but that's just due to different display logic. NumPy is truncating the displayed precision more aggressively than `float.__repr__` does. If you use `tolist` to get an ordinary list of ordinary Python scalars (and thus the ordinary `float` display logic), you can see that NumPy has also suffered rounding error:

In [47]: numpy.arange(0, 1, 0.1).tolist()
Out[47]: 
[0.0,
 0.1,
 0.2,
 0.30000000000000004,
 0.4,
 0.5,
 0.6000000000000001,
 0.7000000000000001,
 0.8,
 0.9]

It's suffered slightly different rounding error - for example, in .6 and .7 instead of .8 and .9 - because it also uses a different means of computing the elements, implemented in the `fill` function for the relevant dtype.

The `fill` function implementation has the advantage that it uses `start + i*step` instead of repeatedly adding the step, which avoids accumulating error on each addition. However, it has the disadvantage that (for no compelling reason I can see) it recomputes the step from the first two elements instead of taking the step as an argument, so it can lose a great deal of precision in the step up front.
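
A plain-Python sketch of that recomputation shows how much precision can be lost when the start is large:

start, step = 100.0, 0.1
recomputed_step = (start + step) - start   # the step as recovered from the first two elements
print(step, recomputed_step)

0.1 0.09999999999999432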

user2357112
  • I see. How does this up-front calculation make a difference? – cs95 Aug 27 '17 at 16:40
  • @cᴏʟᴅsᴘᴇᴇᴅ: Mostly, it's just different, so the results are sometimes different. That happens with floating point. I believe it does involve less rounding error, but you'll sometimes get the "wrong" endpoint anyway. – user2357112 Aug 27 '17 at 16:44
  • If you were to do `np.arange(0, 1.1, 0.1).tolist()`, you'd get an upper limit of 1.0 instead of `1.09999999999` which I thought was fascinating but from MSeifert's answer is explained by the different mechanism of computing the next item in the range. – cs95 Aug 27 '17 at 17:14
  • @cᴏʟᴅsᴘᴇᴇᴅ: No, the endpoint difference is due to stopping at a precomputed length instead of stopping when it hits the stop argument. *However*, if it did stop when it hit the stop argument, the different method of computing elements would *also* have caused it to stop at 1.0 instead of 1.0999999999999999. – user2357112 Aug 27 '17 at 17:19
  • Optimistically assuming a magic optimizer, recomputing the `step` allows the optimizer to arrange the computation so that *additional* bits ("guard bits", e.g.) of precision in the FPU can be retained for the subsequent multiply. If the `step` is passed through a stack (or register) those bits can be lost. – Eric Towers Aug 27 '17 at 21:07
  • @EricTowers: Unfortunately, any information in such additional bits would still simply represent rounding error. There's no way for `arange` to improve on the accuracy of the original `step` argument. – user2357112 Aug 27 '17 at 22:35
  • @user2357112 : That's [not correct](https://en.wikipedia.org/wiki/Guard_digit). Information in such additional bits delays rounding error and can make it vanish in the observable output of the computation. Analogically, it is like accepting `float`s, computing in `double`s internally, then returning `float`. For simple calculations like this one, every bit of the result is correct and the last bit is correctly rounded, which cannot be promised using lesser internal precision. – Eric Towers Aug 27 '17 at 22:40
  • @EricTowers: I'm aware of the principles of guard digits, but even with guard digits, the best representation of the step is still the original `step` argument `arange` received, with a bunch of zeros in the guard digits. Any nonzeros in the guard digits would be a result of rounding error - and that's not even considering the fact that this recomputation produces error extending beyond the guard digits. – user2357112 Aug 27 '17 at 22:59
  • @user2357112 : The binary representation of `0.1` is `0.00011001100110011001100110...` What's your basis for claiming the guard bits are zeroes? – Eric Towers Aug 27 '17 at 23:06
  • If I pass the 64-bit float `0.1` to `arange`, `arange` does not know whether I meant exactly one tenth. It has no basis to fill extra bits with anything but 0. – user2357112 Aug 27 '17 at 23:09
  • In extreme cases, the recomputation of the step can even cause nonzero steps to go to zero. – user2357112 Aug 27 '17 at 23:15
6

While `arange` does step through the range in a slightly different way, it still has the float representation issue:

In [1358]: np.arange(0,1,0.1)
Out[1358]: array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])

The print hides that; convert it to a list to see the gory details:

In [1359]: np.arange(0,1,0.1).tolist()
Out[1359]: 
[0.0,
 0.1,
 0.2,
 0.30000000000000004,
 0.4,
 0.5,
 0.6000000000000001,
 0.7000000000000001,
 0.8,
 0.9]

or with another iteration:

In [1360]: [i for i in np.arange(0,1,0.1)]  # e.g. list(np.arange(...))
Out[1360]: 
[0.0,
 0.10000000000000001,
 0.20000000000000001,
 0.30000000000000004,
 0.40000000000000002,
 0.5,
 0.60000000000000009,
 0.70000000000000007,
 0.80000000000000004,
 0.90000000000000002]

In this case each displayed item is an `np.float64`, whereas in the first each is a plain Python `float`.
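
A quick check confirms the element types:

In [1361]: print(type(np.arange(0,1,0.1)[0]), type(np.arange(0,1,0.1).tolist()[0]))
<class 'numpy.float64'> <class 'float'>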

hpaulj
  • The `[i for i in ...]` comparison isn't fair, because something like `0.10000000000000001` isn't `arange`'s fault; it's exactly equal to the float `0.1`, but NumPy is printing it to 17 digits. – user2357112 Aug 27 '17 at 16:58
5

Aside from the different representation of lists and arrays, NumPy's `arange` works by multiplying instead of repeatedly adding. It's more like:

def my_range2(start, stop, step):
    i = 0
    while start+(i*step) < stop:
        yield start+(i*step)
        i += 1

Then the output is exactly equal:

>>> np.arange(0, 1, 0.1).tolist() == list(my_range2(0, 1, 0.1))
True

With repeated addition you would "accumulate" floating point rounding errors. The multiplication is still affected by rounding but the error doesn't accumulate.
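
You can see the difference in plain Python:

>>> total = 0.0
>>> for _ in range(10):
...     total += 0.1      # repeated addition: the rounding error piles up
...
>>> total
0.9999999999999999
>>> 10 * 0.1              # a single multiplication is rounded only once
1.0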


As pointed out in the comments, that's not really what is happening. As far as I can see, it's more like:

import math

def my_range2(start, stop, step):
    length = math.ceil((stop-start)/step)
    # The next two lines are mostly so the function really behaves like NumPy does
    # Remove them to get better accuracy...
    next = start + step
    step = next - start
    for i in range(length):
        yield start+(i*step)

But I'm not sure whether that's exactly right either, because there's a lot more going on in NumPy.
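
As a rough sanity check, this sketch reproduces the endpoint behaviour from the question:

>>> list(my_range2(0, 1.1, 0.1))[-1]
1.0
>>> np.arange(0, 1.1, 0.1).tolist()[-1]
1.0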

MSeifert
  • This is performed by the relevant [`fill` function](https://github.com/numpy/numpy/blob/1d592c12ca7f9c7f471aa8d20b538c5cb4f2cdce/numpy/core/src/multiarray/arraytypes.c.src#L3556). While it does prevent accumulation of error, it has the flaw that it recomputes the step from the first two elements instead of receiving the original step as an argument, so it may lose a great deal of precision in the step up front. – user2357112 Aug 27 '17 at 17:12
  • Also, it doesn't stop when it hits the original stop argument; it stops when it hits a precomputed length. – user2357112 Aug 27 '17 at 17:14
  • @user2357112 Oh, I see. It's definitely more complicated than I thought. Seems weird to do it that way ... are you sure that this could lead to precision loss? I mean the [first element was computed by adding the step to the start](https://github.com/numpy/numpy/blob/dfc3eba72841e95fe2be44e1194dc5f77a1e2ec2/numpy/core/src/multiarray/ctors.c#L3069). – MSeifert Aug 27 '17 at 17:39
  • It can. For example, if the start is 100 and the step argument is 0.1, then computing `(100+0.1) - 100` causes NumPy to use an actual step of 0.09999999999999432. This causes `numpy.arange(0, 1000, 0.1)[-1]` to be significantly more accurate than `numpy.arange(100, 1000, 0.1)[-1]`. – user2357112 Aug 27 '17 at 17:53
  • Interesting. I wonder if I should fix the implementation to use that "nice" behavior. :D – MSeifert Aug 27 '17 at 17:55
  • At the very least, it seems like that should be raised on the bug tracker. Have you done a git blame to work out why it does that? – Eric Aug 28 '17 at 00:45
  • @Eric The last change was in https://github.com/numpy/numpy/pull/197 but there it hasn't really changed so it's reasonable to assume that it has been there for much longer. I'm always a bit wary about float precision problems so I assumed there was a valid reason for that behavior but if you want me to raise a ticket that's fine. I'll do so. – MSeifert Aug 28 '17 at 00:53
  • If nothing else, filing the issue might cause someone to explain the behavior, which can then be added as a comment to the implementation :) – Eric Aug 28 '17 at 00:57