5

I realize the np.isclose() function can be used to safely check floating point numbers for equality. What's tripping me up at the moment, though, is that I get varied results from using the standard <= operator. For example:

import numpy as np

add_to = 0.05
value64 = np.float64(0.3) + add_to*4
value32 = np.float32(0.3) + add_to*4
threshold = 0.5
print('is close?')
print(np.isclose(value64, threshold))
print(np.isclose(value32, threshold))
print('is less than or equals to?')
print(value64 <= threshold)
print(value32 <= threshold)

Gives me

is close?
True
True
is less than or equals to?
True
False

Does anyone have a sensible workaround for this? I thought one option might be to overload the Python comparison operators for NumPy floating-point types and, within that function, round both floats to, say, their 8th decimal place. But this is in a context where speed is somewhat important, and that feels a bit cumbersome.

Thanks in advance for any help!

Chris J Harris
  • 1
    Part of the problem is that `value32 = np.float32(0.3) + add_to*4` upcasts the 32-bit float to a 64-bit float and the resulting value is slightly larger than expected. You can see that by directly casting it like this `print(np.float64(np.float32(0.3)))`, which outputs `0.30000001192092896`. One possible solution is to cast everything back to 32-bit when you do the comparisons, but I don't know if this would meet your performance requirements. – Craig May 02 '19 at 02:50
  • @Craig: No, the upcasted value is exactly equal to the original value. It's just displayed to higher precision, because there are closer 64-bit floats to the exact value of decimal 0.3, and NumPy wants to distinguish the value from other 64-bit floats. Printing `np.float32(0.3)`, NumPy only prints enough precision to distinguish it from other 32-bit floats. – user2357112 May 02 '19 at 02:58
  • 2
    "I realize the np.isclose() function can be used to safely check floating point numbers for equality" - no, it cannot be used that way. There is no safe, general way of comparing floating-point numbers for equality. `np.isclose` makes a different tradeoff from the `==` operator, and fails in different ways, but you cannot just replace `==` with `np.isclose` and assume you're safe now. – user2357112 May 02 '19 at 03:00
  • @Craig - thanks, this seems sensible but unfortunately I just gave it a go and it hasn't solved the problem in this case. – Chris J Harris May 02 '19 at 03:04
  • What is the actual situation in which you want to compare numbers? What are you using floating-point for? What computations are performed on the floating-point numbers? – Eric Postpischil May 02 '19 at 12:01
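
To illustrate the point made in the comments that `np.isclose` fails in different ways from `==`, here is a minimal sketch. The variable names are illustrative only. It relies on the default absolute tolerance of `np.isclose` (`atol=1e-08`), which makes any two sufficiently small numbers count as "close" even when they differ by a factor of ten.

```python
import numpy as np

# Two values that differ by a factor of ten, but are both tiny.
tiny_a, tiny_b = 1e-10, 1e-9

# With the default atol=1e-08, the absolute tolerance swamps the difference.
close_default = np.isclose(tiny_a, tiny_b)

# Dropping the absolute tolerance leaves only the relative check (rtol=1e-05).
close_relative_only = np.isclose(tiny_a, tiny_b, atol=0.0)

print(close_default)        # True
print(close_relative_only)  # False
```

Whether the default tolerances are appropriate depends on the magnitude of the numbers being compared, so they are worth setting explicitly.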

2 Answers

15

You can define functions that combine < and > with isclose.

import numpy as np

def approx_lte(x, y):
    # x <= y, or x within np.isclose's tolerance of y (scalars only)
    return x <= y or np.isclose(x, y)
def approx_gte(x, y):
    return x >= y or np.isclose(x, y)

These are analogous to <= and >=, except they also use np.isclose() to test for equality.
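
If the inputs may be arrays rather than scalars, Python's `or` will not work elementwise. A minimal sketch of an array-friendly variant follows. The name `approx_lte_array` is hypothetical, and this is only one possible way to write it, using the elementwise `|` operator on the two boolean results.

```python
import numpy as np

def approx_lte_array(x, y):
    # Elementwise: True where x <= y, or where x is within
    # np.isclose's default tolerance of y. Works for scalars too.
    return np.less_equal(x, y) | np.isclose(x, y)

a = np.array([1, 1, 2])
b = np.array([1, 8, 1.99999999999])

result = approx_lte_array(a, b)
print(result)  # every element is True; the last one only via np.isclose
```

The result is a boolean array, so reduce it with `.all()` or `.any()` if a single truth value is needed.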

Barmar
  • thanks, yes, that does seem like a potential solution. The issue though is that (I have just discovered) the numpy float data type is built-in and so un-over-load-able, and that the comparison operator is intended to be used by the user in the API. So I could create the function above, but I'd have to then find some way to make sure the user always stuck to using it rather the conventional comparison operator. – Chris J Harris May 02 '19 at 02:44
  • 2
    Note that you **have to use scalars** for those functions, so perform this elementwise. Otherwise, for `a = np.array([1, 1, 2])` and `b = np.array([1, 8, 1.99999999999])`, `approx_lte(a, b)` fails: `a <= b` is an array, and using it with `or` raises a `ValueError` because the truth value of an array with more than one element is ambiguous. – miile7 Jun 30 '20 at 10:40
3

According to the question "Difference between Python float and numpy float32", there is a difference between how Python sees np.float32 and np.float64. If you check the intermediate values of value64 and value32, you'll see:

value32 = 0.5000000119209289
value64 = 0.5

which explains why print(value32 <= threshold) evaluates to False. Because of binary representation error, I doubt rounding to the eighth decimal place would be safe: value32 would round to 0.50000001, which is still greater than the threshold.
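
As a rough sketch of where the extra digits come from: the snippet below widens the float32 value to float64 explicitly, so it does not depend on how a particular NumPy version promotes mixed scalar arithmetic. The names `stored32`, `stored64`, `sum32` and `sum64` are illustrative only.

```python
import numpy as np

# float32 stores the nearest 32-bit value to 0.3; widening it to float64
# is exact and simply exposes the digits that were already stored.
stored32 = np.float64(np.float32(0.3))
stored64 = np.float64(0.3)
threshold = 0.5

print(float(stored32))  # slightly above 0.3
print(float(stored64))

sum32 = stored32 + 0.2
sum64 = stored64 + 0.2

print(sum32 <= threshold)  # False
print(sum64 <= threshold)  # True
```

The float32 input is already slightly above 0.3 before any arithmetic happens, so the comparison fails because of the input's precision, not because of the `<=` operator.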

You should also consider that the time it takes to round a number is tiny, and that rounding is still needed even in pure float64 arithmetic, in a case like

np.float64(0.1) + np.float64(0.2)

as this evaluates to 0.30000000000000004, so comparing it against 0.3 with >= or <= gives the wrong answer. The same error appears with the decimal library when Decimal values are constructed from floats rather than from strings. In short, for some numbers you can't avoid having some kind of error. The only way I know of to circumvent this is to round.
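
A minimal sketch of the rounding approach, assuming `np.round` is used and that the number of decimals is chosen to suit the least precise input. The variable names are illustrative, and the float32 value is widened explicitly so the example does not depend on NumPy's scalar promotion rules.

```python
import numpy as np

# float64 case: 0.1 + 0.2 lands just above 0.3.
total = np.float64(0.1) + np.float64(0.2)
rounded_total = np.round(total, 8)
print(total <= 0.3)          # False
print(rounded_total <= 0.3)  # True

# float32-derived case: the error is about 1.2e-8, so 8 decimals keep it.
value32 = np.float64(np.float32(0.3)) + 0.2
rounded_8 = np.round(value32, 8)
rounded_6 = np.round(value32, 6)
print(rounded_8 <= 0.5)  # False
print(rounded_6 <= 0.5)  # True
```

Rounding only helps when the chosen precision is coarser than the error carried by the inputs, which for float32 data means fewer decimal places than for float64 data.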

I tried testing a bunch of combinations with rounding and without, and it was so quick I wasn't able to record the time differences over 10000 iterations, so unless you're trading stocks or training a neural network for weeks I don't think rounding a number is something you need to be worried about.
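
For anyone who wants to measure this on their own machine, here is a rough sketch using the standard-library `timeit` module. The iteration count matches the 10000 mentioned above, and everything else (names, the 8-decimal rounding) is an assumption for illustration. Actual timings will vary by machine and NumPy version.

```python
import timeit

import numpy as np

value = np.float64(0.1) + np.float64(0.2)
threshold = 0.3
n = 10_000  # number of comparisons to time

# Time the plain comparison and the round-then-compare variant.
plain = timeit.timeit(lambda: value <= threshold, number=n)
rounded = timeit.timeit(lambda: np.round(value, 8) <= threshold, number=n)

print(f"plain:   {plain:.6f} s for {n} comparisons")
print(f"rounded: {rounded:.6f} s for {n} comparisons")
```

For large arrays, rounding the whole array once with `np.round` is likely to be cheaper per element than rounding scalars one at a time.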

If you're worried about the arbitrary nature of where to round a number, I would search for a run of more than four 0s following a significant figure, then cut it off there.

kmario23
Recessive
  • 1
    thanks very much for this - so the message is essentially that there's no magic bullet solution, but that rounding is so fast that it doesn't really matter. – Chris J Harris May 02 '19 at 03:08
  • 1
    FYI, people nowadays are training deep NNs on half-precision or even at bit precision level. So, these things are not at all of a concern, for any practical implementations. – kmario23 May 02 '19 at 03:12