Accuracy with very small probabilities

Question

I am writing a program in Java which requires me to compute some probabilities, and for larger inputs the probabilities can become very small. To prevent underflow, I would like to work with log probabilities instead.

I am, however, having trouble implementing this. At each stage of the computation there can be a different number of options, each of which needs to be assigned a probability, and at each stage the probabilities need to add up to 1. The probabilities are based on a number of different variables. I take a sum over all options i of the following term:

Math.pow(d[i], a) * Math.pow(1/c[i], b)

This gives me a variable, total. The probability p_i is then:

p_i = (Math.pow(d[i], a) * Math.pow(1/c[i], b)) / total

My question is: how can I implement this using log probabilities, so that I do not get 'Infinity' and 'NaN' values? These are what I have been getting so far.

So, is that to say that implementing log probabilities here is not possible? — swingballchamp42, Aug 02 '17 at 17:48
You might need to switch to use BigDecimal, since it allows arbitrary precision. — azurefrog, Aug 02 '17 at 17:48
My inclination is that you will have to use `BigDecimal`, but unfortunately, I think you'll have to [roll your own](https://stackoverflow.com/questions/739532/logarithm-of-a-bigdecimal) logarithm function with `BigDecimal`. Amazing that doesn't come built-in... — juanpa.arrivillaga, Aug 02 '17 at 17:48
Well, maybe it's not what you need; maybe consider adding a large multiplicative factor to overcome underflow — meowgoesthedog, Aug 02 '17 at 17:48
Thanks for the responses. Regarding the BigDecimal suggestions, would using BigDecimal and specifying a certain precision essentially stop underflow errors? I was also considering adding a large multiplicative factor... I'll give that a go before switching to BigDecimal — swingballchamp42, Aug 02 '17 at 17:53

score 0 · Accepted Answer · answered Aug 03 '17 at 01:00

What I think you should try is Kahan summation. It keeps a running compensation term for the rounding error of each addition, so you can sum many terms without losing precision.

In some C-like pseudo-code (sorry, my Java is rusty, code is untested)

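double total(int N, double[] d, double[] c, double a, double b) {

    double sum           = 0.0;
    double running_error = 0.0; // Kahan compensation: low-order bits lost so far

    for (int i = 0; i != N; ++i) {
        if (d[i] == 0.0)
            continue; // term contributes nothing (assuming a > 0)

        if (c[i] == 0.0)
            throw new IllegalArgumentException("c[" + i + "] is zero");

        double v = 0.0;
        if (d[i] > 0.0 && c[i] > 0.0) {
            // using log trick, if you want:
            // d^a * (1/c)^b == exp(a*log(d) - b*log(c))
            double lpi = a*Math.log(d[i]) - b*Math.log(c[i]);
            v = Math.exp(lpi);
        }
        else {
            // Math.log returns NaN for negative inputs, so fall back to pow
            v = Math.pow(d[i], a) * Math.pow(1.0/c[i], b);
        }

        // Kahan summation step: add v while carrying the rounding error forward
        double difference = v - running_error;
        double temp = sum + difference;
        running_error = (temp - sum) - difference;
        sum = temp;
    }
    return sum;
}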
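
If you want to stay in log space all the way through, as the question asks, one common approach is the log-sum-exp trick: compute log w_i = a*log(d[i]) - b*log(c[i]), subtract the largest log w_i before exponentiating, and get log p_i = log w_i - log(total) without ever forming the tiny weights themselves. Below is a minimal sketch of that idea, not a tested drop-in. It assumes every d[i] and c[i] is strictly positive, and the class and method names (LogProbabilities, logProbabilities) are made up for illustration.

```java
final class LogProbabilities {

    /**
     * Returns log(p_i) for each i, where p_i is proportional to
     * d[i]^a * (1/c[i])^b and the p_i sum to 1.
     * Assumption: every d[i] and c[i] is strictly positive.
     */
    static double[] logProbabilities(double[] d, double[] c, double a, double b) {
        int n = d.length;
        double[] logW = new double[n];
        double max = Double.NEGATIVE_INFINITY;

        // Log of each unnormalised weight; no pow(), so nothing underflows here.
        for (int i = 0; i < n; i++) {
            logW[i] = a * Math.log(d[i]) - b * Math.log(c[i]);
            max = Math.max(max, logW[i]);
        }

        // log(total) = max + log(sum_i exp(logW[i] - max)).
        // The largest term becomes exp(0) = 1, so the sum is at least 1.
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += Math.exp(logW[i] - max);
        }
        double logTotal = max + Math.log(sum);

        // Normalise in log space: log p_i = log w_i - log(total).
        double[] logP = new double[n];
        for (int i = 0; i < n; i++) {
            logP[i] = logW[i] - logTotal;
        }
        return logP;
    }
}
```

For example, with d = {1e-200, 2e-200}, c = {1.0, 1.0}, a = 2 and b = 1, Math.pow(d[i], a) underflows to 0 for both entries, so the direct formula gives 0/0 = NaN. The log version gives log probabilities whose exponentials are about 0.2 and 0.8. Only call Math.exp on the result at the very end, if you need the actual probabilities at all.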
