
Using the following (near-minimal) example:

    import numpy as np
    for x in np.arange(0, 2, 0.1):
        print(x)

We get:

    0.0
    0.1
    0.2
    0.30000000000000004
    0.4
    0.5
    0.6000000000000001
    0.7000000000000001
    0.8
    0.9
    1.0
    1.1
    1.2000000000000002
    1.3
    1.4000000000000001
    1.5
    1.6
    1.7000000000000002
    1.8
    1.9000000000000001

as output.

I get that 'floating point precision issues' are to blame for the X.X000001 outputs, but what I don't understand is why it sometimes DOES work. Clearly 0.3 cannot be represented exactly by a base-2 float, and I fail to see any pattern in which numbers were displayed with just one decimal digit.

How does Python know that 0.1 is sufficient to display a number? What kind of magic tells it to truncate the remaining digits? And why does it only work some of the time?

Radost
  • Relevant: [Algorithm to convert an IEEE 754 double to a string?](https://stackoverflow.com/questions/7153979/algorithm-to-convert-an-ieee-754-double-to-a-string) – Amadan Apr 17 '19 at 12:16
  • Check `print(format(x, '.30f'))` and see that `0.1` has one more zero in its inexact float representation. What happens is that the default truncation limit seems to include 16 decimal digits, and the next nonzero digit is at the 17th in the cases where you see an "exact" value in the output. I couldn't quickly find this figure in the documentation; it might easily be an implementation detail. You shouldn't make anything important depend on the automatic formatting of floats anyway. If you need to rely on this, print/round yourself, or even better, check _approximate_ equality of floats. – Andras Deak -- Слава Україні Apr 17 '19 at 12:26
  • Found a hint in [a tutorial](https://docs.python.org/3/tutorial/floatingpoint.html): "_Historically, the Python prompt and built-in repr() function would choose the one with 17 significant digits, 0.10000000000000001. Starting with Python 3.1, Python (on most systems) is now able to choose the shortest of these and simply display 0.1._". This sounds a lot like an implementation detail. – Andras Deak -- Слава Україні Apr 17 '19 at 12:27
  • @AndrasDeak So it stops printing at first zero digit after decimal point? This can't possibly be right... – Radost Apr 17 '19 at 12:28
  • For anyone looking at this with the Python `float` type, take into account that `numpy` has its own `float` types that are being printed here: https://docs.scipy.org/doc/numpy/user/basics.types.html – Martijn Pieters Apr 17 '19 at 12:28
  • @AndrasDeak So if it finds two possible representations it picks the shorter one? That doesn't explain why it works sometimes and sometimes not... – Radost Apr 17 '19 at 12:30
  • @MartijnPieters I noticed that but doing the above tests with `x` vs `float(x)` gives the same results. Numpy scalars might be printed differently (i.e. just like native floats). – Andras Deak -- Слава Україні Apr 17 '19 at 12:33
  • I've reopened this because as it turns out there is no canonical question on *Numpy*'s floating point formatting that I could find. – Martijn Pieters Apr 17 '19 at 12:36
  • @Radost "So it stops printing at first zero digit after decimal point?" More like, don't print trailing zeros after truncating to 16 (or however many) digits. – chepner Apr 17 '19 at 12:53
  • @chepner it's more complicated than that, `0.3` and `0.4` share the same number of zeros in their decimal representation yet they are printed differently. The reason is the uniqueness criterion of the `dtoa` algorithm that Martijn mentions in his answer. – Andras Deak -- Слава Україні Apr 17 '19 at 12:55

1 Answer


You are printing `numpy.float64` objects, not the Python built-in `float` type, which uses David Gay's `dtoa` algorithm.

As of version 1.14, numpy uses the dragon4 algorithm to print floating point values, tuned to approach the same output as the David Gay algorithm used for the Python float type:

Numpy scalars use the dragon4 algorithm in "unique" mode (see below) for str/repr, in a way that tries to match python float output.
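A quick check (a sketch, assuming NumPy ≥ 1.14 and IEEE 754 doubles) confirms that the loop in the question yields `numpy.float64` scalars and that their string output matches what the builtin `float` would produce:

```python
import numpy as np

x = np.arange(0, 2, 0.1)[3]  # the fourth value in the question's loop

# Not the builtin float: this is a numpy scalar type
print(type(x))               # <class 'numpy.float64'>

# Since numpy 1.14, str() of a numpy scalar matches repr() of the
# equivalent Python float: both emit the shortest round-tripping string.
print(str(x))                # 0.30000000000000004
print(repr(float(x)))        # 0.30000000000000004
```

So the numpy type is not the cause of the extra digits here; both formatters agree on the output.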

The numpy.format_float_positional() function documents this in a bit more detail:

unique : boolean, optional

If True, use a digit-generation strategy which gives the shortest representation which uniquely identifies the floating-point number from other values of the same type, by judicious rounding. If precision was omitted, print out all necessary digits, otherwise digit generation is cut off after precision digits and the remaining value is rounded.

So 0.2 can be represented uniquely by printing just 0.2, but the next value in the series (0.30000000000000004) can't: you have to include the extra digits to uniquely identify the exact float value.
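You can see both modes with `numpy.format_float_positional()` directly (a small sketch; the exact digits assume IEEE 754 doubles):

```python
import numpy as np

# unique=True (the default): the shortest string that round-trips
# back to the same float
print(np.format_float_positional(np.float64(0.1)))                    # 0.1
print(np.format_float_positional(np.float64(0.1) + np.float64(0.2)))  # 0.30000000000000004

# unique=False with a fixed precision: print 17 digits after the point,
# exposing the inexact binary value stored for 0.1
print(np.format_float_positional(np.float64(0.1), unique=False, precision=17))
# 0.10000000000000001

# The shorter string '0.3' parses to a *different* float, so it is not
# a valid representation of 0.1 + 0.2:
print(np.float64('0.3') == np.float64(0.1) + np.float64(0.2))         # False
```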

The how of this is actually quite involved; you can read a full report on this in Bungie's Destiny gameplay engineer Ryan Juckett's Printing Floating-Point Numbers series.

But basically, the code producing the string needs to find, among all the decimal strings clustering around the floating point value, the shortest one that can't be interpreted as the next or the preceding representable floating point number:

[Image: floating point number line for 0.1, with the next and previous possible float values and possible representations]

This image comes from The Shortest Decimal String That Round-Trips: Examples by Rick Regan, which covers some other cases as well. Numbers in blue are possible float64 values, numbers in green are candidate decimal representations. Note the grey half-way markers: any representation that falls between the two half-way points around a float value is fair game, as all of those representations would parse back to the same float.

The goal of both the David Gay and Dragon4 algorithms is to find the shortest decimal string output that would produce the exact same float value again. From the Python 3.1 What's New section on the David Gay approach:

Python now uses David Gay’s algorithm for finding the shortest floating point representation that doesn’t change its value. This should help mitigate some of the confusion surrounding binary floating point numbers.

The significance is easily seen with a number like 1.1 which does not have an exact equivalent in binary floating point. Since there is no exact equivalent, an expression like float('1.1') evaluates to the nearest representable value which is 0x1.199999999999ap+0 in hex or 1.100000000000000088817841970012523233890533447265625 in decimal. That nearest value was and still is used in subsequent floating point calculations.

What is new is how the number gets displayed. Formerly, Python used a simple approach. The value of repr(1.1) was computed as format(1.1, '.17g') which evaluated to '1.1000000000000001'. The advantage of using 17 digits was that it relied on IEEE-754 guarantees to assure that eval(repr(1.1)) would round-trip exactly to its original value. The disadvantage is that many people found the output to be confusing (mistaking intrinsic limitations of binary floating point representation as being a problem with Python itself).

The new algorithm for repr(1.1) is smarter and returns '1.1'. Effectively, it searches all equivalent string representations (ones that get stored with the same underlying float value) and returns the shortest representation.

The new algorithm tends to emit cleaner representations when possible, but it does not change the underlying values. So, it is still the case that 1.1 + 2.2 != 3.3 even though the representations may suggest otherwise.
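The behaviour described in that quote is easy to check directly (a sketch using only the standard library):

```python
import math

# Old pre-3.1 behaviour: always 17 significant digits
print(format(1.1, '.17g'))       # 1.1000000000000001

# New behaviour: the shortest string that round-trips to the same float
print(repr(1.1))                 # 1.1
assert float(repr(1.1)) == 1.1   # the round-trip guarantee

# The stored values are unchanged, so exact comparison still fails...
print(1.1 + 2.2 == 3.3)          # False
print(repr(1.1 + 2.2))           # 3.3000000000000003

# ...which is why approximate comparison is the right tool:
print(math.isclose(1.1 + 2.2, 3.3))  # True
```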

Martijn Pieters
  • I'd emphasize more what the native python behaviour is, because I think numpy in the question is a red herring. OP could've gotten the same output and the same questions with `for x in range(21): print(x*0.1)`. – Andras Deak -- Слава Україні Apr 17 '19 at 12:53
  • @AndrasDeak: we can cover that too, but the differences between David Gay's and Dragon4 are so academic and my head so flu-ridden today that I'll probably have to leave that for now. And I think there is already another post here on SO that covers the Python `float` type angle. – Martijn Pieters Apr 17 '19 at 12:57
  • Oh, hope you get well soon! If you end up finding that native float formatting post it would be worth linking it here at least in comments; I couldn't find it. – Andras Deak -- Слава Україні Apr 17 '19 at 12:58
  • @AndrasDeak: there is [Unexpected floating-point representations in Python](//stackoverflow.com/q/11242062), there may be more. – Martijn Pieters Apr 17 '19 at 13:01