I have a machine learning model that outputs probabilities rounded to one decimal place (i.e. 0.0, 0.1, 0.2, ..., 1.0).

I wrote a loop that compares a probability p against each of these values:

import numpy as np

for i in np.arange(0, 1.1, 0.1):
    if p >= i:
        print('yes for i = %f' % i)
    else:
        print('no for i = %f' % i)

The problem is that np.arange returns three of the values with a slight numerical error:

0.3 became 0.30000000000000004

0.6 became 0.6000000000000001

0.7 became 0.7000000000000001

With p = 0.6, this results in the output:

no for i = 0.6000000000000001

How could I circumvent this issue?

Wouter Vandenputte
  • Does this answer your question? [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) – Keldorn Apr 21 '20 at 14:03
  • @Keldorn: OP's question was how to fix it, not why it happens, but some answers on your link contain the solution. The [numpy floating point comparison](https://stackoverflow.com/questions/42736514/numpy-floating-point-comparison) question is perhaps a closer match, though its answer is less useful in certain cases than your link. – Andris Apr 21 '20 at 15:23
  • @Andris OP's question is based on a false premise. 0.3 did not become 0.30000000000000004, there is no 0.3 to begin with. This is due to a misunderstanding of floating point arithmetic. It looks like the moderators agreed, since the question was closed as duplicate. – Keldorn Apr 21 '20 at 17:22
  • I'm aware of floating point representations, but the scikit-learn model returns these values. If some of them are not representable, how could it return them in the first place? – Wouter Vandenputte Apr 22 '20 at 20:25

1 Answer

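That "slight numerical error" is not a bug but a known property of floating point numbers: they are stored in binary as sums of powers of two, so they cannot represent every decimal fraction exactly. Values such as 0.1 and 0.3 have no exact binary representation, and np.arange builds each element by arithmetic on the step, so the rounding shows up in some of the results.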
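
To make this concrete, here is a minimal sketch (the variable names are only illustrative) that shows the value actually stored for 0.1 and how two of the np.arange elements compare with the literals 0.3 and 0.6:

```python
from decimal import Decimal

import numpy as np

# Decimal(float) exposes the exact binary value stored for the literal 0.1.
exact_tenth = Decimal(0.1)
print(exact_tenth)  # slightly larger than 0.1

# Same call as in the question.
thresholds = np.arange(0, 1.1, 0.1)

# repr shows enough digits to tell neighbouring floats apart.
print(repr(float(thresholds[3])))
print(repr(float(thresholds[6])))

# These elements are not the floats closest to 0.3 and 0.6.
print(thresholds[3] == 0.3)
print(thresholds[6] > 0.6)
```

The last comparison is the one that makes `p >= i` fail for p = 0.6 in the question.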

If you only want to print the numbers with a fixed number of digits, use a format string such as %.1f instead of %f. Note that this changes only what is displayed, not the result of the comparison.
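
As a small illustration of the formatting point (the variable names are hypothetical), the same float can be displayed in several ways while its stored value stays the same:

```python
value = 0.6000000000000001

six_digits = '%f' % value   # default precision of 6 decimals
one_digit = '%.1f' % value  # 1 decimal
full = repr(value)          # shortest string that round-trips the float

print(six_digits)
print(one_digit)
print(full)

# Formatting does not change the stored value, so the comparison still fails.
print(value == 0.6)
```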

To fix the comparison, you can either change how you store the numbers (though given the field you are working in, I doubt that's a good idea) or change how you compare them: allow a small epsilon difference between the two numbers. If you do not want to implement this yourself, you can use np.isclose (or math.isclose) to check for equality, like this:

import numpy as np

p = 0.6

for i in np.arange(0, 1.1, 0.1):
    # treat p as >= i if it is larger, or equal up to floating point noise
    if p > i or np.isclose(p, i):
        print('yes for i = %f' % i)
    else:
        print('no for i = %f' % i)
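
If you prefer to wrap the tolerant comparison in a function, a sketch could look like the following. The helper name `ge_with_tolerance` and the tolerance of 1e-9 are assumptions made for illustration. Pick a tolerance well below the spacing of your values, which is 0.1 here.

```python
import math

import numpy as np


def ge_with_tolerance(p, threshold, abs_tol=1e-9):
    """Hypothetical helper: p >= threshold, ignoring tiny rounding noise."""
    return p > threshold or math.isclose(p, threshold, abs_tol=abs_tol)


noisy = 6 * 0.1  # evaluates to 0.6000000000000001

print(0.6 >= noisy)                   # plain comparison
print(ge_with_tolerance(0.6, noisy))  # tolerant comparison
print(ge_with_tolerance(0.5, noisy))  # a genuinely smaller value
print(np.isclose(0.6, noisy))         # NumPy equivalent of the equality part
```

Note that math.isclose and np.isclose have different default tolerances, so passing the tolerance explicitly keeps the behaviour predictable.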

Depending on what you actually want to do beyond this minimal example, performing all the comparisons in one vectorized step might be more efficient:

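thresholds = np.arange(0, 1.1, 0.1)
# True where p is greater than the threshold or approximately equal to it
compres = np.logical_or(p > thresholds, np.isclose(p, thresholds))

for i, ge in zip(thresholds, compres):
    if ge:
        print('yes for i = %.1f' % i)
    else:
        print('no for i = %.1f' % i)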

EDIT: I removed the suggestion to multiply by 10000 and round before comparing, since it can give wrong results near a rounding boundary (e.g. 0.649999 vs. 0.650001). It can be useful, but use it with caution.
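
For the specific case in the question, where every value is assumed to lie on a one-decimal grid, a possible sketch is to compare whole tenths as integers. This is a variant of the rounding idea, so the caution above still applies to values that are not already on the grid. The names `p_tenths` and `passed` are only illustrative.

```python
p = 0.6

# Assumption: p is one of 0.0, 0.1, ..., 1.0 up to floating point noise,
# so rounding p * 10 recovers the intended integer number of tenths.
p_tenths = int(round(p * 10))

passed = []
for tenths in range(11):  # integer thresholds 0..10, i.e. 0.0..1.0
    if p_tenths >= tenths:
        passed.append(tenths)
        print('yes for i = %.1f' % (tenths / 10))
    else:
        print('no for i = %.1f' % (tenths / 10))
```

Integer comparisons are exact, so no tolerance is needed once the values have been converted.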

Andris