0

Apache Math 3.4 (and 3.3), java 1.8.0_25

import org.apache.commons.math3.distribution.ChiSquaredDistribution;
ChiSquaredDistribution chisq = new ChiSquaredDistribution(23)
System.out.println(1.0 - chisq.cumulativeProbability(130) //  1.1102230246251565E-16
System.out.println(1.0 - chisq.cumulativeProbability(131) //  0.0

Why does Apache Math return 0.0 in the second call? Some stat libraries (Excel, but not R) do return values that are much smaller than 1E-16 for the tail probabilities.

Additional Edit: In the comments below, Robert provides a direct way of calculating chi square tail probabilities using another function from Apache math library (regularizedGammaQ) that does not have this precision problem.

ChiSquaredDistribution Javadoc

N. Sriram
  • 3
  • 2

1 Answers1

0

Note that the smallest value which can be subtracted from 1.0 to yield something less than 1.0 is approximately 1e-16; you can verify this directly. Maybe you should print out chisq.cumulativeProbability(131) itself. I don't know if it's correct but in any case let's not confuse the issue by subtracting it from 1.0.

Robert Dodier
  • 16,905
  • 2
  • 31
  • 48
  • Thanks. You are right about the subtraction part for doubles in that 1 - 1e-16 is 0.999 etc but 1 - 1e-17 is 1 due to digits of precision in doubles. But chisq.cumulativeProbability(131) is exactly 1.0. I've got inputs from the apache math folks that this relates to computation of cdfs; stock functions in R and Octave also behave similarly (but Excel provides a small non-zero value, maybe it computes the tail directly and/or uses higher precision). Will update on this thread when I understand things better. – N. Sriram Jan 06 '15 at 15:48
  • @clarity Source code from commons-math 3.4 shows that ChiSquaredDistribution.cumulativeProbability is calculated by Gamma.regularizedGammaP. If I'm not mistaken (you'll want to verify this) it takes the branch x >= a + 1 at line 319, so it calculates the result as 1.0 - regularizedGammaQ(...). Because of the subtraction from 1.0, the result cannot be between 0.0 and the floating point epsilon 1e-16. – Robert Dodier Jan 06 '15 at 18:32
  • @clarity That's unfortunate, although I have to wonder if it really matters. If you are doing any kind of practical work, the errors involved in having the wrong model (which is almost certainly the case) are much, much greater. Is it really worth the trouble to figure out very small tail probabilities? That depends entirely on your application. – Robert Dodier Jan 06 '15 at 18:33
  • You are right. I was curious. One useful property is that it provides a 1-1 function from p value to statistic (so can recover statistic from p value for a given distribution). There was a fortran lib called dcdflib that did the tail (there is a C version with gotos) and a successor cleaner library called prob does not appear to have the tail function. Excel provides it. See http://people.sc.fsu.edu/~jburkardt/c_src/cdflib/cdflib.html and http://people.sc.fsu.edu/~jburkardt/c_src/prob/prob.html As a new member, I can't close this thread or upvote your answer. – N. Sriram Jan 07 '15 at 19:57
  • @clarity Well, if you really need the tail probability, you can calculate it yourself from `regularizedGammaQ`. If I am not mistaken, the tail value is `regularizedGammaQ(n/2, x/2)` where `n` is the degrees of freedom (23 in the example) and `x` is the left-hand boundary of the tail (131 in the example). Of course, you'll want to verify that formula. I haven't checked to see if `regularizedGammaQ` has any floating point evaluation problem. – Robert Dodier Jan 07 '15 at 22:52
  • Excellent. This works as follows. Gamma.regularizedGammaQ(11.5,65.5) Double = 4.199075982358542E-17 Could you recommend a book or online resource that explains how/why this function computes the tail in this case? – N. Sriram Jan 09 '15 at 00:33
  • This doesn't quite match the tail function in the case of 130. That is, Gamma.regularizedGammaQ(11.5,65) = 6.396904313600597E-17 which is distinct from chisq.cumulativeProbability(130) = 1.1102230246251565E-16 – N. Sriram Jan 09 '15 at 00:41
  • The regularized and the 1 - chisquare functions do agree exactly on bigger values. So maybe the regularized gammaQ is more accurate for these small values? They agree exactly till critical value of 100 and diverge after that. – N. Sriram Jan 09 '15 at 00:50
  • (1) about a reference, take a look at Abramowitz & Stegun, Handbook of Mathematical Functions. See also the Digital Library of Mathematical Functions, which is an on-line resource. (2) About the different values for 130, it is suspicious that chisq.cumulativeProbability(130) seems to be exactly equal to the floating point epsilon. (3) The reason I suggested going directly to regularizedGammaQ is that you can avoid a 1 - something subtraction which reduces the precision of the result when "something" is nearly 1. When "something" is not close to 1, the subtraction doesn't cause a problem. – Robert Dodier Jan 09 '15 at 01:08