
This question is more for curiosity.

I'm creating the following array:

from numpy import zeros

A = zeros((2,2))
for i in range(2):
    A[i,i] = 0.6           # diagonal entries
    A[(i+1)%2,i] = 0.4     # off-diagonal entries
print A

>>>
[[ 0.6  0.4]
 [ 0.4  0.6]]

Then, printing it:

for i,c in enumerate(A):
    for j,d in enumerate(c):
        print j, d

This gives:

>>>
0 0.6
1 0.4
0 0.4
1 0.6

But if I remove the j from the inner for (so the loop becomes for d in enumerate(c): print d), I get:

(0, 0.59999999999999998)
(1, 0.40000000000000002)
(0, 0.40000000000000002)
(1, 0.59999999999999998)

Is it because of the way I'm creating the matrix, using 0.6? How are real values represented internally?

Mark Dickinson
Pedro Dusso
  • You could use `decimal.Decimal`. This will store them exactly as you expect. It's been in the standard library since Python 2.4. – Ian Stapleton Cordasco Apr 19 '13 at 13:46
  • Wait. "Matrix"? Is this standard Python, or NumPy or something else like that, or something else entirely? – michaelb958--GoFundMonica Apr 19 '13 at 13:49
  • @michaelb958: He's using a NumPy array, so the values being displayed aren't Python floats; they're NumPy objects of type `numpy.float64`. That, plus the difference between `str` and `repr`, plus the fact that computing the `str` of a tuple uses the `repr` of the items, explains what's going on here. – Mark Dickinson Apr 19 '13 at 20:20
  • @larsmans: I agree we have many, many squintillion similar questions, but this particular one has a twist because of the additional features noted by Mark Dickinson. So in this one case, I'm not voting to close. – John Y Apr 19 '13 at 20:49
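A minimal sketch of the decimal.Decimal suggestion from the first comment (Python 3, not part of the original thread). The key assumption is that the values are built from strings: a Decimal constructed from a string holds those decimal digits exactly, whereas Decimal(0.6) would capture the float's binary value instead.

```python
from decimal import Decimal

a = Decimal("0.6")   # built from a string: stored as exactly 6/10
b = Decimal("0.4")
print(a, b)          # 0.6 0.4
print(a + b)         # 1.0

# Contrast with binary floats, where the classic example fails:
print(0.1 + 0.2 == 0.3)                                    # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
```

The trade-off is speed and interoperability: NumPy arrays of Decimal objects fall back to slow object dtype, so this suits bookkeeping-style exactness rather than numerical work.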

2 Answers


There are a few different things going on here.

First, Python has two mechanisms for turning an object into a string, called repr and str. repr is supposed to give 'faithful' output that would (ideally) make it easy to recreate exactly that object, while str aims for more human-readable output. For floats in Python versions up to and including Python 3.1, repr gives enough digits to determine the value of the float completely (so that evaluating the returned string gives back exactly that float), while str rounds to 12 significant digits; this has the effect of hiding inaccuracies, but means that two distinct floats that are very close together can end up with the same str value, something that can't happen with repr. (From Python 3.2 onwards, str and repr of a float are identical.) When you print an object, you get the str of that object. In contrast, when you just evaluate an expression at the interpreter prompt, you get the repr.

For example (here using Python 2.7):

>>> x = 1.0 / 7.0
>>> str(x)
'0.142857142857'
>>> repr(x)
'0.14285714285714285'
>>> print x  # print uses 'str'
0.142857142857
>>> x  # the interpreter read-eval-print loop uses 'repr'
0.14285714285714285
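Since str and repr of a float agree in Python 3.2+, the session above can't be reproduced directly on a modern interpreter. A rough Python 3 sketch: format(x, ".12g") stands in for Python 2.7's str (an approximation; the real implementation differs in small details), and the example checks which string round-trips and shows two distinct floats that collide at 12 significant digits.

```python
x = 1.0 / 7.0
short = format(x, ".12g")   # roughly Python 2.7's str(x): 12 significant digits
full = repr(x)              # enough digits to recover x exactly
print(short)                # 0.142857142857
print(full)                 # 0.14285714285714285
print(float(full) == x)     # True: repr round-trips
print(float(short) == x)    # False: 12 digits lose information

# Two distinct floats that look identical at 12 significant digits:
y = 0.1
z = 0.1 + 1e-15
print(y == z)                                   # False
print(format(y, ".12g") == format(z, ".12g"))   # True
print(repr(y) == repr(z))                       # False
```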

But also, a little bit confusingly from your point of view, we get:

>>> x = 0.4
>>> str(x)
'0.4'
>>> repr(x)
'0.4'

That doesn't seem to tie in too well with what you were seeing above, but we'll come back to this below.

The second thing to bear in mind is that in your first example, you're printing two separate items, while in your second example (with the j removed), you're printing a single item: a tuple of length 2. Somewhat surprisingly, when converting a tuple for printing with str, Python nevertheless uses repr to compute the string representation of the elements of that tuple:

>>> x = 1.0 / 7.0
>>> print x, x  # print x twice;  uses str(x)
0.142857142857 0.142857142857
>>> print(x, x)  # print a single tuple; uses repr(x)
(0.14285714285714285, 0.14285714285714285)

That explains why you're seeing different results in the two cases, even though the underlying floats are the same.
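The container rule still holds in Python 3, even though print is now a function. A small sketch using a made-up class Tag (hypothetical, for illustration only) whose __str__ and __repr__ return different text makes the rule visible. It also shows that enumerate yields (index, value) tuples, which is why the inner loop variable d in the question ends up being a tuple.

```python
class Tag:
    """Toy class whose str and repr are deliberately different."""
    def __str__(self):
        return "str-form"

    def __repr__(self):
        return "repr-form"

t = Tag()
print(str(t))        # str-form
print(str((t, t)))   # (repr-form, repr-form): tuple elements use repr

# enumerate produces (index, value) tuples.
pairs = list(enumerate([0.6, 0.4]))
print(pairs)             # [(0, 0.6), (1, 0.4)]
print(type(pairs[0]))    # <class 'tuple'>
```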

But there's one last piece to the puzzle. In Python >= 2.7, we saw above that for the particular float 0.4, the str and repr of that float were the same. So where does the 0.40000000000000002 come from? Well, you don't have Python floats here: because you're getting these values from a NumPy array, they're actually of type numpy.float64:

>>> from numpy import zeros
>>> A = zeros((2, 2))
>>> A[:] = [[0.6, 0.4], [0.4, 0.6]]
>>> A
array([[ 0.6,  0.4],
       [ 0.4,  0.6]])
>>> type(A[0, 0])
<type 'numpy.float64'>

That type still stores a double-precision float, just like Python's float, but it's got some extra goodies that make it interact nicely with the rest of NumPy. And it turns out that NumPy uses a slightly different algorithm for computing the repr of a numpy.float64 than Python uses for computing the repr of a float. Python (in versions >= 2.7) aims to give the shortest string that still gives an accurate representation of the float, while NumPy simply outputs a string based on rounding the underlying value to 17 significant digits. (That describes the NumPy releases current at the time of writing; since NumPy 1.14, NumPy also uses a shortest-representation algorithm, so newer versions no longer show the extra digits.) Going back to that 0.4 example above, here's what NumPy does:

>>> from numpy import float64
>>> x = float64(1.0 / 7.0)
>>> str(x)
'0.142857142857'
>>> repr(x)
'0.14285714285714285'
>>> x = float64(0.4)
>>> str(x)
'0.4'
>>> repr(x)
'0.40000000000000002'
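To compare the two strategies without NumPy, here is a naive plain-Python 3 sketch. The function name shortest_repr is hypothetical, and this brute-force loop is not the algorithm CPython actually uses; it simply tries increasing precisions until the string converts back to the same float, and sets the result beside a fixed 17-significant-digit format of the kind older NumPy versions produced.

```python
def shortest_repr(x):
    """Return the first '.{p}g' string (p = 1..17) that round-trips to x.

    A brute-force illustration of the "shortest string that still
    identifies the float" idea; 17 significant digits always suffice
    for a finite IEEE 754 double.
    """
    for p in range(1, 18):
        s = format(x, f".{p}g")
        if float(s) == x:
            return s
    raise ValueError("no round-tripping string found (e.g. NaN)")

for value in (0.4, 0.6, 1.0 / 7.0):
    print(shortest_repr(value), format(value, ".17g"))
# 0.4 0.40000000000000002
# 0.6 0.59999999999999998
# 0.14285714285714285 0.14285714285714285
```

Both columns convert back to the same float; they only differ in how many digits they spend saying so.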

So these three things together should explain the results you're seeing. Rest assured that this is all completely cosmetic: the underlying floating-point value is not being changed in any way; it's just being displayed differently by the four different possible combinations of str and repr for the two types: float and numpy.float64.
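One way to check that the difference really is cosmetic (a Python 3 sketch, not from the original answer): parse both spellings of 0.4 and compare the resulting floats, including their exact bit-level values via float.hex and the raw 8-byte encoding from the standard struct module.

```python
import struct

short = float("0.4")
long_ = float("0.40000000000000002")

print(short == long_)              # True: same double
print(short.hex())                 # 0x1.999999999999ap-2
print(short.hex() == long_.hex())  # True: identical binary value
# The raw 64-bit IEEE 754 encodings are identical too.
print(struct.pack("<d", short) == struct.pack("<d", long_))  # True
```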

The Python tutorial gives more details of how Python floats are stored and displayed, together with some of the potential pitfalls. The answers to this SO question have more information on the difference between str and repr.

Mark Dickinson

Edit:

Don't mind me, I failed to realise that the question was about NumPy.


The strange 0.59999999999999998 and friends are Python's best attempt to accurately represent how virtually all computers store floating-point values: as a bunch of bits, according to the IEEE 754 standard. Notably, 0.1 has a non-terminating expansion in binary, and so cannot be stored exactly. (The same goes for 0.6 and 0.4.)
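To see exactly what gets stored, a Python 3 sketch using the standard fractions and decimal modules: converting a float to Fraction or Decimal is exact, so it exposes the binary value hiding behind the literal 0.6.

```python
from decimal import Decimal
from fractions import Fraction

stored = Fraction(0.6)               # exact value of the double nearest to 0.6
print(stored)                        # 5404319552844595/9007199254740992
print(stored.denominator == 2**53)   # True: a binary fraction
print(stored == Fraction(6, 10))     # False: 0.6 itself is not representable
print(Decimal(0.6))                  # full decimal expansion of the stored value
print(Decimal(0.6) < Decimal("0.6")) # True: stored slightly below 0.6
print(Decimal(0.4) > Decimal("0.4")) # True: stored slightly above 0.4
```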

The reason you normally see 0.6 is that most floating-point printing functions round off these imprecisely stored floats to make them more understandable to us humans. That's what your first printing example is doing.

Under some circumstances (that is, when the printing functions aren't trying for human-readable), the full, slightly-off number 0.59999999999999998 will be printed. That's what your second printing example is doing.

tl;dr

This is not Python's fault; it is just how floats are stored.

  • But why does one case print a "humanized" version while the other version prints to full precision? – default.kramer Apr 19 '13 at 13:45
  • Yes, I thought so. What type does enumerate(A) generate? Tuple also? – Pedro Dusso Apr 19 '13 at 13:51
  • @default.kramer I don't know all the criteria for that sort of thing. Probably some parameter to some C function that nobody's touched in decades. – michaelb958--GoFundMonica Apr 19 '13 at 13:57
  • For questions regarding why sometimes the representation is more human-friendly than others, see Mark Dickinson's answer. (Note that he is uniquely qualified to answer, being that he's a core Python developer, and largely if not solely responsible for the way recent versions of Python round and display floats.) – John Y Apr 19 '13 at 20:30
  • @default.kramer: Also note that the representations with lots of digits aren't meaningfully "more" precise than representations with fewer digits. Since the exact number **does not even exist** in any finite binary system, both the long and short representations are approximations of what is really stored. – John Y Apr 19 '13 at 20:37
  • @JohnY: Thanks; now I'm embarrassed. Definitely not solely, though: I had plenty of help. :-) – Mark Dickinson Apr 19 '13 at 20:53