24

I have a Bayesian classifier programmed in Python. The problem is that when I multiply the feature probabilities I get VERY small float values like 2.5e-320 or something like that, and suddenly it turns into 0.0. The 0.0 is obviously of no use to me since I must find the "best" class based on which class returns the MAX value (greatest value).

What would be the best way to deal with this? I thought about extracting the exponent of the number (-320) and, if it gets too low, multiplying the value by 1e20 or some value like that. But maybe there is a better way?

Brad Koch
Pravel
  • This is not math. In math, positive numbers can be arbitrarily small. This is floating point. – recursive Sep 13 '10 at 21:41
  • @S. Lott This is definitely not a math question by any stretch. This has everything to do with floating point numbers and the way they work in Python as well as other programming languages. – Justin L. Sep 13 '10 at 21:45
  • I believe that 2.5e-320 is the exact probability that a whale will suddenly turn into a bowl of petunias. – Seth Sep 13 '10 at 21:49
  • To be fair to S.Lott, this underflow problem is a standard one in the machine learning literature about Bayes classifiers, so the OP would have been much better served by asking in the relevant community. And by the way, [numerical analysis](http://en.wikipedia.org/wiki/Numerical_analysis) is the part of math that studies problems like this. – Muhammad Alkarouri Sep 13 '10 at 21:59
  • @Seth: that should get you far using the infinite probability drive :) – Muhammad Alkarouri Sep 13 '10 at 22:09
  • "This has everything to do with floating point numbers and the way they work in ... other programming languages". My point. The Python part of this is irrelevant. "this underflow problem is a standard one in the machine learning literature". Thanks. – S.Lott Sep 13 '10 at 22:36

4 Answers

24

What you describe is a standard problem with the naive Bayes classifier. You can search for "naive Bayes underflow" to find the answer, or see here.

The short answer is that it is standard to express all of this in terms of logarithms: rather than multiplying probabilities, you sum their logarithms. Since the logarithm is monotonic, the class with the largest log-probability is also the class with the largest probability.
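A minimal sketch of what that looks like (the class priors and per-feature likelihoods below are made-up placeholders, purely to illustrate the log-sum scoring):

import math

# Hypothetical priors and per-feature likelihoods for two classes (illustrative only).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [3.14e-05, 1.2e-07, 4.5e-06],
    "ham": [2.0e-06, 8.9e-05, 1.1e-08],
}

def log_score(cls):
    # Sum of logs instead of a product of probabilities: no underflow,
    # and the argmax over classes is unchanged because log is monotonic.
    return math.log(priors[cls]) + sum(math.log(p) for p in likelihoods[cls])

best = max(priors, key=log_score)
print(best)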

You might also want to look at other classification algorithms.

recursive
Muhammad Alkarouri
  • Hey! Thanks a lot for the answer, I will look into that, as it addresses my problem exactly. I figured it must be a common issue, since I am multiplying probabilities like 3.14e-05 many times, so they reach e-300 levels (for example) pretty fast, even more so when I have a lot of features in my classifier. – Pravel Sep 14 '10 at 02:03
  • Yeah, as recursive also mentioned, this is tackled by taking logarithms and adding those instead of multiplying the probabilities. It's all explained in the link provided by Muhammad. Thanks everyone for your answers! – Pravel Sep 14 '10 at 02:31
20

Would it be possible to do your work in a logarithmic space? (For example, instead of storing 1e-320, just store -320, and use addition instead of multiplication)
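A rough sketch of that idea, using base-10 logs and placeholder probability values:

import math

# The product 1e-120 * 3.5e-150 * 2e-80 would underflow to 0.0 as a float,
# but the sum of base-10 logs stays comfortably representable.
probs = [1e-120, 3.5e-150, 2e-80]
log_total = sum(math.log10(p) for p in probs)
print(log_total)  # roughly -349.15, i.e. the product is about 10**-349.15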

recursive
  • Hey! Your solution seems great. It's very straightforward and seems quite easy to try. Thanks! I will try it. – Pravel Sep 14 '10 at 02:11
7

Floating point numbers have a limited range as well as limited precision, which is why you saw the numbers turn to 0: anything smaller than the smallest representable positive double underflows to 0.0. Could you multiply all the probabilities by a large scalar, so that your numbers stay in a higher range? If you're only worried about the max and not the magnitude, you don't even need to bother dividing through at the end. Alternatively you could use an arbitrary-precision decimal, like ikanobori suggests.
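A rough sketch of the rescaling idea (the scale factor and the likelihood values are arbitrary placeholders; every class is scaled identically, so the argmax is preserved):

SCALE = 1e10  # arbitrary constant applied to every feature probability

def scaled_score(feature_probs):
    # Multiplying each factor by SCALE keeps the running product away from
    # the underflow threshold without changing which class scores highest,
    # assuming every class sees the same number of features (as in naive Bayes).
    score = 1.0
    for p in feature_probs:
        score *= p * SCALE
    return score

per_class = {
    "spam": [3.14e-05, 1.2e-07, 4.5e-06],
    "ham": [2.0e-06, 8.9e-05, 1.1e-08],
}
print(max(per_class, key=lambda c: scaled_score(per_class[c])))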

I82Much
5

Take a look at Decimal from the stdlib.

from decimal import Decimal, getcontext

# Raise the working precision to 320 significant digits.
getcontext().prec = 320

print(Decimal(1) / Decimal(7))

I am not posting the result here as it is quite long.

supakeen