6

Continuing from Difference between Python float and numpy float32:

import numpy as np

a = 58682.7578125
print(type(a), a)
float_32 = np.float32(a)
print(type(float_32), float_32)
print(float_32 == a)

Prints:

<class 'float'> 58682.7578125
<class 'numpy.float32'> 58682.8
True

I fully understand that comparing floats for equality is not a good idea, but shouldn't this still be False (we're talking about a difference in the first decimal digit, not in 0.000000001) ? Is it system dependent ? Is this behavior somewhere documented ?

EDIT: Well it's the third decimal:

print(repr(float_32), repr(a))
# 58682.758 58682.7578125

but can I trust repr ? How are those values ultimately stored internally ?

EDIT2: people insist that printing float_32 with more precision will give me its representation. However, as I already commented, according to NumPy's docs:

the % formatting operator requires its arguments to be converted to standard python types

and:

print(repr(float(float_32)))

prints

58682.7578125

An interesting insight is given by @MarkDickinson here, apparently repr should be faithful (then he says it's not faithful for np.float32).

So let me reiterate my question as follows:

  • How can I get at the exact internal representation of float_32 and a in the example ? If these are the same, then the problem is solved; if not,
  • What are the exact rules for up/downcasting in a comparison between python's float and np.float32 ? I'd guess that it upcasts float_32 to float, although @WillemVanOnsem suggests in the comments that it's the other way round.

My python version:

Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32

Mr_and_Mrs_D
  • it's probably a display setting. It would be surprising that `numpy` rounds the value like this. It just probably stores the float in a FP register, which explains why the comparison is `True` – Jean-François Fabre Jul 04 '17 at 08:35
  • Isn't this more about how things are printed? Furthermore I guess when you check equality, the second operand is converted to a 32-bit float first. – Willem Van Onsem Jul 04 '17 at 08:37
  • @WillemVanOnsem: that could explain some - or is it the other way around ? – Mr_and_Mrs_D Jul 04 '17 at 08:38
  • @Jean-FrançoisFabre: see my edit - but can I trust `repr` ? What is stored in the memory finally ? – Mr_and_Mrs_D Jul 04 '17 at 08:38
  • very closely related: https://stackoverflow.com/questions/16963956/difference-between-python-float-and-numpy-float32 – Jean-François Fabre Jul 04 '17 at 08:41
  • The value is the same otherwise it would evaluate to `False` on the comparison. The `repr` is truncating the number of displayed decimal places, if you did `float_32.tolist()` you'd see that the value is the same – EdChum Jul 04 '17 at 08:42
  • it's not the same value, `float` is 64 bit, `float32` is 32 bit. But equality probably levels the `double` to `float` to be able to compare. – Jean-François Fabre Jul 04 '17 at 08:46
  • @Jean-FrançoisFabre - in the answer by VictorT `print( "%0.8f" % float_32 )` would print `58682.75781250` - but probably because python is transforming the float_32 to float (see ending remarks here: https://docs.scipy.org/doc/numpy/user/basics.types.html -> `For example, the % formatting operator requires its arguments to be converted to standard python types`) – Mr_and_Mrs_D Jul 04 '17 at 08:52
  • @EdChum: as noted in tolist [docs](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tolist.html) it transforms the elements of the array to its closest python type - hence float, hence ` 58682.7578125` - actually tolist's strange behavior was the root of my question - see https://stackoverflow.com/questions/1966207/converting-numpy-array-into-python-list-structure/1966210?noredirect=1#comment76765322_1966210 – Mr_and_Mrs_D Jul 04 '17 at 09:01
  • Interesting, when I pack `struct.pack("f", a)` and `struct.pack("f", float_32)` I get the same bytes: `b'\xc2:eG'`, same when I use the double (8 byte), i.e. `struct.pack("d", a)` and `struct.pack("d", float_32)` both give `b'\x00\x00\x00@X\xa7\xec@'` – juanpa.arrivillaga Jul 04 '17 at 09:01
  • @juanpa.arrivillaga but the value will differ when you pack 58682.758 as plain float or numpy.float64. – Netch Jul 04 '17 at 09:41
  • @Netch yes, that makes sense since `58682.758 != 58682.7578125` – juanpa.arrivillaga Jul 04 '17 at 09:42
  • @juanpa.arrivillaga: does struct convert those to python float ? – Mr_and_Mrs_D Jul 04 '17 at 11:48

5 Answers

11

The numbers compare equal because 58682.7578125 can be exactly represented in both 32 and 64 bit floating point. Let's take a close look at the binary representation:

32 bit:  01000111011001010011101011000010
sign    :  0
exponent:  10001110
fraction:  11001010011101011000010

64 bit:  0100000011101100101001110101100001000000000000000000000000000000
sign    :  0
exponent:  10000001110
fraction:  1100101001110101100001000000000000000000000000000000

They have the same sign, the same exponent, and the same fraction - the extra bits in the 64 bit representation are filled with zeros.

No matter which way they are cast, they will compare equal. If you try a different number such as 58682.7578124, you will see that the representations differ at the binary level; the 32 bit version loses more precision and they won't compare equal.

(It's also easy to see in the binary representation that a float32 can be upcast to a float64 without any loss of information. That is what numpy is supposed to do before comparing both.)
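The claim about 58682.7578124 can be checked with a short sketch. It is only an illustration: the variable names are made up, and the conversions are written out explicitly (round to float32, then widen back to a Python float) so that the result does not depend on how a particular NumPy version promotes mixed operands in `==`.

```python
import numpy as np

exact = 58682.7578125    # 7511393 / 2**7, fits a 24-bit significand
inexact = 58682.7578124  # needs more significand bits than float32 has

# Round to float32, then widen back to a Python float (float64).
exact_roundtrip = float(np.float32(exact))
inexact_roundtrip = float(np.float32(inexact))

print(exact_roundtrip == exact)      # True: nothing was lost
print(inexact_roundtrip == inexact)  # False: float32 had to round
print(inexact_roundtrip)             # 58682.7578125, the nearest float32
```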

import numpy as np

a = 58682.7578125
f32 = np.float32(a)
f64 = np.float64(a)

u32 = np.array(a, dtype=np.float32).view(dtype=np.uint32)
u64 = np.array(a, dtype=np.float64).view(dtype=np.uint64)

b32 = bin(u32)[2:]
b32 = '0' * (32-len(b32)) + b32  # add leading 0s
print('32 bit: ', b32)
print('sign    : ', b32[0])
print('exponent: ', b32[1:9])
print('fraction: ', b32[9:])
print()

b64 = bin(u64)[2:]
b64 = '0' * (64-len(b64)) + b64  # add leading 0s
print('64 bit: ', b64)
print('sign    : ', b64[0])
print('exponent: ', b64[1:12])
print('fraction: ', b64[12:])
MB-F
  • Thanks - I am a bit confused with `unpackbits` approach vs the `bin` approach - shouldn't they yield same results ? Very nice printouts though :) – Mr_and_Mrs_D Jul 05 '17 at 11:52
  • @Mr_and_Mrs_D That's a byte-order thing. The uint8 view starts with the lowest byte first, so the bits are somewhat scrambled. This should give you the same result: `np.unpackbits(np.array([b]).view(np.uint8)[::-1])` – MB-F Jul 05 '17 at 11:58
  • You beat @user2357112 by one min (whose answer was also very helpful) – Mr_and_Mrs_D Jul 12 '17 at 10:36
2

The same value is stored internally; print just doesn't show all of its digits.

Try:

 print("%0.8f" % float_32)

See the related question: Printing numpy.float64 with full precision
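Since `%` formatting goes through a Python float first, here is a hedged alternative sketch that asks NumPy to format the float32 scalar itself. It assumes NumPy 1.14 or later, where `np.format_float_positional` is available; the variable names `shortest` and `fixed` are just for illustration.

```python
import numpy as np

float_32 = np.float32(58682.7578125)

# Shortest digit string that still identifies this float32 value uniquely.
shortest = np.format_float_positional(float_32)

# A fixed 8 digits after the decimal point, without the "shortest" rule.
fixed = np.format_float_positional(float_32, precision=8, unique=False)

print(shortest)
print(fixed)
```

The first string is short because float32 needs few digits to round-trip; the second spells out more digits of the stored value.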

Victor T
  • I am aware of that question but as discussed in its comments `repr` should be exact: https://stackoverflow.com/questions/12956333/printing-numpy-float64-with-full-precision#comment17565191_12956333 – Mr_and_Mrs_D Jul 04 '17 at 08:46
  • See my comment here: https://stackoverflow.com/questions/44900912/numpys-float32-and-float-comparisons?noredirect=1#comment76779840_44900912 – Mr_and_Mrs_D Jul 04 '17 at 08:52
  • @Mr_and_Mrs_D: `repr` isn't _exact_; it's _faithful_, in that it provides enough decimal digits that the resulting string rounds back to the original value (e.g., under `eval` or `float`). If `repr` were to show the exact value of (for example) the float entered as `58682.8`, it would show `58682.800000000002910383045673370361328125`. – Mark Dickinson Jul 04 '17 at 18:31
  • Thanks @MarkDickinson - even if it's a float32 or 16 in numpy ? Then how am I to inspect what I have in memory ? Would you care to attempt an answer to clear up the confusion ? (the suggestion in the answer here, using `print "%0.8f" % float_32`, internally converts to a float IIRC so it's not the float_32 that's printed _anyway_) – Mr_and_Mrs_D Jul 04 '17 at 18:36
  • Yes, even for `float32` or `float16`. For example, the smallest exactly representable positive `float16` value is 2**-24, whose exact value in decimal form is `5.9604644775390625e-8`. Its `repr` is `5.9605e-08`. – Mark Dickinson Jul 04 '17 at 18:54
  • Hmm. There's something rather odd here. It turns out that repr is _not_ faithful for NumPy `float32`. – Mark Dickinson Jul 04 '17 at 19:12
  • @MarkDickinson indeed, I'd rather find 5.9604645e-8 – aka.nice Jul 05 '17 at 07:17
2

The decimal 58682.7578125 is the exact fraction (7511393/128).

The denominator is a power of 2 (2**7), and the numerator spans 23 bits. So this decimal value can be represented exactly both in float32 (which has a 24-bit significand) and in float64.
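The arithmetic above can be double-checked with the standard library. This is a minimal sketch using `fractions.Fraction`, which converts a Python float to its exact rational value; the name `value` is arbitrary.

```python
from fractions import Fraction

# Fraction(float) is exact: it recovers the rational number the float stores.
value = Fraction(58682.7578125)

print(value)                         # 7511393/128
print(value.denominator == 2**7)     # True
print(value.numerator.bit_length())  # 23, which fits in 24 significand bits
```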

Thus the answer of Victor T is correct: in internal representation, it's the same value.

The fact that equality returns True for the same value, even across different types, is a good thing IMO. What would you expect of (2 == 2.0)?

aka.nice
  • The answer of Victor T is not correct as I already commented: https://stackoverflow.com/questions/44900912/numpys-float32-and-float-comparisons/44918864#comment76779840_44900912 - `%f` _implicitly converts to float_. So any way I can get at the internal representation without converting to float therefore answering another question ? – Mr_and_Mrs_D Jul 05 '17 at 08:32
  • @Mr_and_Mrs_D the first sentence is correct, but the proof is effectively a bit short. Though, with the implicit knowledge that float32 -> float64 conversion is lossless, it could be considered correct. – aka.nice Jul 05 '17 at 12:10
  • " with the implicit knowledge that float32 -> float64 conversion is lossless " -> ok that was the missing piece :) – Mr_and_Mrs_D Jul 05 '17 at 14:27
2

They're equal. They're just not printing the same because they use different printing logic.

How can I get at the exact internal representation of float_32 and a in the example ?

Well, that depends on what you mean by "exact internal representation". You can get an array of bit values, if you really want one:

>>> import numpy
>>> b = numpy.float32(a)
>>> numpy.unpackbits(numpy.array([b]).view(numpy.uint8))
array([1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0,
       1, 0, 1, 0, 0, 0, 1, 1, 1], dtype=uint8)

which is as close as you'll get to the "exact internal representation", but it's not exactly the most useful thing to work with. (Also, the results will be endianness-dependent, because it really is based on the raw internal representation.)
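If the endianness dependence is a nuisance, one possible workaround (a sketch, not the only way) is to store the value in an explicitly big-endian array before viewing the bytes, so that the sign bit comes first on any machine. The `">f4"` dtype string and the names below are my own choices for illustration.

```python
import numpy as np

b = np.float32(58682.7578125)

# Store as big-endian float32 so the byte order is fixed, then view raw bytes.
big_endian_bytes = np.array([b], dtype=">f4").view(np.uint8)

# unpackbits yields each byte's bits most-significant first.
bits = "".join(str(bit) for bit in np.unpackbits(big_endian_bytes))

print(bits)       # 01000111011001010011101011000010
print(bits[0])    # sign
print(bits[1:9])  # exponent
print(bits[9:])   # fraction
```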

If you want a C-level float, which is how NumPy represents float32 values at C level... well, that's C. Unless you want to write your own C extension module, you can't work with C-level values directly. The closest you can get is some sort of wrapper around a C float, and hey! You already have one! You don't seem happy with it, though, so this isn't really what you want.

If you want the exact value represented in human-readable decimal, printing it with extra precision using str.format or by converting it to a regular float and then a decimal.Decimal would do that.

>>> b
58682.758
>>> import decimal
>>> decimal.Decimal(float(b))
Decimal('58682.7578125')

The 58682.7578125 value you picked happens to be exactly representable as a float, so the decimal representation coming out happens to be exactly the one you put in, but that won't usually be the case. The exact decimal representation you typed in is discarded and unrecoverable.
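To illustrate the "that won't usually be the case" part, here is a small sketch with 58682.8 (a value mentioned in the comments), which is not exactly representable in binary. The variable names are hypothetical.

```python
import decimal
import numpy as np

typed = 58682.8  # what was typed; neither float64 nor float32 stores it exactly

# Exact decimal expansion of the Python float (float64).
as_float64 = decimal.Decimal(typed)

# Exact decimal expansion of the nearest float32, widened losslessly first.
as_float32 = decimal.Decimal(float(np.float32(typed)))

print(as_float64)
print(as_float32)  # 58682.80078125
```

Neither output is the literal `58682.8`, and the two differ from each other because float32 has fewer significand bits.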

What are the exact rules for up/downcasting in a comparison between python's float and np.float32 ?

The float32 gets converted to a float64, losslessly.
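That the widening itself is lossless can be checked explicitly. The sketch below only demonstrates the float32 -> float64 -> float32 round trip and NumPy's "safe" casting table; it does not show which conversion NumPy applies inside `==`, which may depend on the NumPy version.

```python
import numpy as np

f32 = np.float32(58682.7578124)  # rounded to the nearest float32
widened = np.float64(f32)        # explicit float32 -> float64 conversion

# NumPy treats widening as a safe cast, narrowing as not safe.
print(np.can_cast(np.float32, np.float64, casting="safe"))  # True
print(np.can_cast(np.float64, np.float32, casting="safe"))  # False

# Narrowing the widened value again gives back the identical bytes.
print(np.float32(widened).tobytes() == f32.tobytes())       # True
```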

user2357112
0

58682.8

My machine shows 58682.758 for this line.

I fully understand that comparing floats for equality is not a good idea

It is "not a good idea" if they were calculated independently. On the other hand, it is a good idea if you take the same number and want to check its conversion.

Is it system dependent ? Is this behavior somewhere documented ?

It depends entirely on the conversion to text. According to the comments, float32 is what matters here. If so, the accuracy of float32 is about 7 significant decimal digits, unlike Python's built-in float, which is float64 (at least on x86). That is why the value is truncated when printed. The recommended way to print float values in decimal is to stop at the shortest output that converts back to the same internal value. So it reduces 58682.7578125 to 58682.758: the difference is less than one ULP.

The same value printed as internal "float" or numpy float64 will have more significant digits because their omission will result in another internal value:

>>> 58682.758 == 58682.7578125
False
>>> numpy.float32(58682.758) == numpy.float32(58682.7578125)
True
>>> print(repr(numpy.float32(58682.758).data[0:4]))
'\xc2:eG'
>>> print(repr(numpy.float32(58682.7578125).data[0:4]))
'\xc2:eG'
>>> numpy.float64(58682.758) == numpy.float64(58682.7578125)
False
>>> print(numpy.float64(58682.758).hex(), numpy.float64(58682.7578125).hex())
('0x1.ca7584189374cp+15', '0x1.ca75840000000p+15')

You are lucky that the float32 and float values are equal for this particular number (was this intentional?), but that might not hold for another one.
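The ULP argument can be checked numerically. This is a rough sketch that uses `np.spacing` to get the gap between adjacent float32 values near this number; the names `x` and `ulp` are arbitrary.

```python
import numpy as np

x = np.float32(58682.7578125)
ulp = float(np.spacing(x))  # distance from x to the next float32

print(ulp)  # 0.00390625, i.e. 2**-8

# The shortened text 58682.758 is well within half an ULP of the stored value,
# so parsing it as float32 lands on the same value again.
print(abs(58682.758 - 58682.7578125) < ulp / 2)       # True
print(float(np.float32(58682.758)) == 58682.7578125)  # True
```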

Netch
  • I am not asking for float64.... I am asking for float32 knowing of course the conversion is lossy. And formating with precision converts everything to _python's float_ so it's not answering my question – Mr_and_Mrs_D Jul 04 '17 at 09:02
  • If so, the only factor that matters for you is that str() provides the text form with reduced accuracy (6 significant digits, by default). To get the fully printed value, use explicit precision settings. – Netch Jul 04 '17 at 09:05
  • Precision settings have nothing to do with it - they convert types: https://stackoverflow.com/questions/44900912/numpys-float32-and-float-comparisons#comment76779840_44900912 - and I used `repr` in my edit – Mr_and_Mrs_D Jul 04 '17 at 09:06
  • Yep -> `58682.758 58682.7578125` – Mr_and_Mrs_D Jul 04 '17 at 09:07
  • Yes, 58682.758 is really the best accuracy you can get with float32. It provides 7 guaranteed decimal digits. 8th is a gift for some values. That's why it is printed this way. The recommended way to print float values in decimal is to stop when output form is that converts back to the same internal value. So it reduces 58682.7578125 to 58682.758: the difference is less than ULP. – Netch Jul 04 '17 at 09:08
  • Thanks for editing out the off topic float64 - please try a bit to explain the `print(repr(float_32), repr(a))` results that should be exact – Mr_and_Mrs_D Jul 04 '17 at 09:23
  • Please specify your Python version. My one is 2.7.13. Likely yours is older so its text conversions could be someway buggy. – Netch Jul 04 '17 at 09:28
  • @Netch: It's Python 3, judging by the tags and the output of `print(type(a), a)` (which would have printed a tuple in Python 2). – Mark Dickinson Jul 05 '17 at 09:51
  • @MarkDickinson OK. It's a bit of a pity that they went back to printing with str() at the default 6 significant digits, as recommended in the C standards. I would treat this as unfriendly. – Netch Jul 06 '17 at 10:32