Math overflow error in scipy Anderson-Darling test for k-samples

Question

I would like to compare pairs of samples with both Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests. I implemented this with scipy.stats.ks_2samp and scipy.stats.anderson_ksamp respectively. I would expect a low statistic for similar samples (0 for identical samples) and a higher statistic for more different samples.

In the cases of identical samples and very different samples (no overlap), ks_2samp gives the expected results, while anderson_ksamp returns a negative statistic for identical samples and, more importantly, throws an error for very different samples (possibly due to the sample size: 200 in the example below).

Here is the code illustrating these findings:

```python
import scipy.stats as stats
import numpy as np

normal1 = np.random.normal(loc=0.0, scale=1.0, size=200)
normal2 = np.random.normal(loc=100, scale=1.0, size=200)
```

Using KS and AD on identical samples:

```python
stats.ks_2samp(normal1, normal1)
stats.anderson_ksamp([normal1, normal1])
```

Returns respectively:

```
# Expected
Ks_2sampResult(statistic=0.0, pvalue=1.0)
# Not expected
Anderson_ksampResult(statistic=-1.3196852620954158, critical_values=array([ 0.325,  1.226,  1.961,  2.718,  3.752]), significance_level=1.4357209285296726)
```

And on the different samples:

```python
stats.ks_2samp(normal1, normal2)
stats.anderson_ksamp([normal1, normal2])
```

Returns respectively:

```
# Expected
Ks_2sampResult(statistic=1.0, pvalue=1.4175052453413253e-89)
# Not expected
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-757-e3914aaf909c> in <module>()
----> 1 stats.anderson_ksamp([normal1, normal2])

/usr/lib/python3.5/site-packages/scipy/stats/morestats.py in anderson_ksamp(samples, midrank)
   1694         warnings.warn("approximate p-value will be computed by extrapolation")
   1695 
-> 1696     p = math.exp(np.polyval(pf, A2))
   1697     return Anderson_ksampResult(A2, critical, p)
   1698 

OverflowError: math range error
```

ely · Accepted Answer · 2018-03-06T18:22:56.723

I think both of these behaviours actually make some sense. The significance level (p-value) in the Anderson-Darling k-sample test is interpolated from where the test statistic falls within the range of critical values, and extrapolated when it falls outside that range. The further to the right the test statistic falls, the more significantly you can reject the null hypothesis that the samples come from the same distribution.

Note that with, say, 80-90 samples using your example distribution parameters, the test statistic (for normal1 vs. normal2) is already far larger than the largest critical value (3.752), which means the extrapolated significance is free to grow towards infinity (and it grows hugely, since it is the exponential of a convex-up quadratic fitted with polyfit). So yes, for a large sample size you end up computing the exponential of some huge number, and `math.exp` overflows. In other words, your data is so obviously not from the same distribution that the significance extrapolation overflows. In such a case, you might bootstrap a smaller data set from your actual data, just to avoid overflow (or bootstrap several times and average the statistic).
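A minimal sketch of the bootstrap idea, assuming numpy >= 1.17 (for `default_rng`) and a scipy version whose `anderson_ksamp` result exposes `statistic` and `critical_values`. The name `bootstrap_anderson_ksamp` and its `size`/`reps`/`seed` parameters are hypothetical, not scipy API. It draws `size` points with replacement from each sample, runs the test, and averages the statistic (and the critical values) over `reps` repeats. scipy's own p-value is ignored, since that is the part that overflows.

```python
import warnings

import numpy as np
from scipy import stats


def bootstrap_anderson_ksamp(x, y, size=50, reps=100, seed=None):
    """Hypothetical helper: mean AD statistic and mean critical values
    over `reps` bootstrap draws of `size` points from each sample."""
    rng = np.random.default_rng(seed)
    statistics = np.empty(reps)
    criticals = []
    for r in range(reps):
        # resample with replacement (a bootstrap), but with a smaller size
        xs = rng.choice(x, size=size, replace=True)
        ys = rng.choice(y, size=size, replace=True)
        with warnings.catch_warnings():
            # scipy warns when the p-value is extrapolated or capped;
            # the p-value is not used here, so silence those warnings
            warnings.simplefilter("ignore")
            res = stats.anderson_ksamp([xs, ys])
        statistics[r] = res.statistic
        criticals.append(res.critical_values)
    # aggregate by simple averaging (an assumption; an order statistic may suit better)
    return statistics.mean(), np.mean(criticals, axis=0)
```

With `size=50`, the statistic for `normal1` vs. `normal2` should still land far beyond the largest critical value, so the conclusion (reject) is unchanged, while staying well below the range where the p-value extrapolation blows up. As far as I can tell, scipy computes the critical values from the number of samples k only, so for a fixed k averaging them changes nothing; it is kept only to make the aggregation explicit.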

On the other end of the spectrum, when the sorted data sets are identical, it looks like some steps of the formula admit the possibility of negative values. Essentially this means the statistic is far to the left of the critical values, indicating a perfect match.

Once again, the significance is calculated by extrapolation, but this time it extrapolates from the test statistic towards the smallest critical value, rather than from the largest critical value towards the test statistic as in the mismatching case. Because the statistic (I'm seeing around -1.3 when passing the same sample twice) sits only a short distance to the left of the smallest critical value (0.325), the extrapolated significance is "merely" about 1.44 (144%) instead of an exponentially exploding number. Still, a significance level above 1 is a signal that the data falls outside the range where the test's p-value approximation is relevant.
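To make the extrapolation concrete, here is a small re-creation of the p-value step. It assumes the coefficient tables below match the `anderson_ksamp` source of that era (they reproduce the critical values printed in the question, but check them against your installed scipy). `critical_values` and `approx_significance` are hypothetical helper names. The sketch fits a quadratic to log(significance) against the critical values with `np.polyfit` and exponentiates the fitted value at the statistic, which is the line that fails in the traceback.

```python
import math

import numpy as np

# Assumed coefficient tables (copied from scipy's anderson_ksamp of that era)
B0 = np.array([0.675, 1.281, 1.645, 1.96, 2.326])
B1 = np.array([-0.245, 0.25, 0.678, 1.149, 1.822])
B2 = np.array([-0.105, -0.305, -0.362, -0.391, -0.396])
SIG = np.array([0.25, 0.1, 0.05, 0.025, 0.01])  # significance levels


def critical_values(k):
    """Critical values for k samples (they depend only on m = k - 1)."""
    m = k - 1
    return B0 + B1 / math.sqrt(m) + B2 / m


def approx_significance(a2, k=2):
    """Quadratic fit of log(significance) vs. critical values, evaluated at a2."""
    pf = np.polyfit(critical_values(k), np.log(SIG), 2)
    # math.exp raises OverflowError once the fitted value exceeds ~709.78
    return math.exp(np.polyval(pf, a2))


print(critical_values(2))                         # the five values from the question
print(approx_significance(-1.3196852620954158))   # roughly 1.436, as in the question
try:
    approx_significance(202.74793118968645)       # statistic quoted in the comments
except OverflowError as exc:
    print("OverflowError:", exc)
```

The fitted quadratic has a positive leading coefficient, so once the statistic is far enough to the right the "p-value" turns back upward and eventually exceeds what `math.exp` can represent.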

Most likely this is because of the step in `anderson_ksamp` where k - 1 "degrees of freedom" (the expected value of the raw statistic under the null hypothesis) are subtracted from the calculated test statistic, which is then divided by its standard deviation. In the two-sample case, this means subtracting 1. For perfectly identical data the raw statistic is 0, so after subtracting 1 and dividing by a standard deviation somewhat below 1 you land at about -1.3, to the left of even the lowest critical value of 0.325 (which is what you would expect for perfectly identical data, meaning you cannot reject the null hypothesis at even the weakest significance level). So it's probably the degree-of-freedom adjustment that puts the statistic at the negative end of the spectrum, and then it gets magnified by the hacky quadratic-based p-value extrapolation.
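A pure-Python sketch of the raw (unstandardized) midrank statistic of Scholz & Stephens, written from my reading of scipy's `_anderson_ksamp_midrank` helper; `ad_ksamp_midrank_raw` is a hypothetical name, not scipy API. It shows that for identical samples every term is exactly zero, so the value scipy reports is (0 - (k - 1)) / sigma, which is negative for any standard deviation sigma.

```python
import bisect


def ad_ksamp_midrank_raw(samples):
    """Raw midrank k-sample Anderson-Darling statistic, before scipy
    subtracts k - 1 and divides by the standard deviation (sketch)."""
    k = len(samples)
    n = [len(s) for s in samples]
    N = sum(n)
    pooled = sorted(x for s in samples for x in s)
    distinct = sorted(set(pooled))
    if len(distinct) < 2:
        raise ValueError("need more than one distinct observation")
    total = 0.0
    for i in range(k):
        s = sorted(samples[i])
        inner = 0.0
        for z in distinct:
            below = bisect.bisect_left(pooled, z)
            l = bisect.bisect_right(pooled, z) - below  # ties of z in pooled data
            b = below + l / 2.0                          # midrank pooled count
            m_right = bisect.bisect_right(s, z)
            f = m_right - bisect.bisect_left(s, z)      # ties of z in sample i
            m = m_right - f / 2.0                        # midrank count in sample i
            inner += (l / N) * (N * m - b * n[i]) ** 2 / (b * (N - b) - N * l / 4.0)
        total += inner / n[i]
    return total * (N - 1.0) / N


print(ad_ksamp_midrank_raw([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]))  # 0.0
```

With k = 2 the reported value is then -1/sigma, so the -1.3197 in the question corresponds to sigma of about 0.76.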

This is a beautiful answer @ely! Would you have any suggestion on what would be the most efficient/elegant way to solve this issue? You proposed a bootstrap analysis, but to make it robust I believe I would need to use several bootstraps, which would be quite a hindrance. What about the negative value? Just set it to 0 if negative? Maybe it's best to create a GitHub issue? — michael, Mar 07 '18 at 11:30
For the negative value, you could add `k - 1` back to the value (`k` will be the number of separate samples passed to the test), and perform the extrapolation w.r.t. the critical values yourself. You would need to use several bootstraps of smaller sample sizes, but this is a relatively inexpensive computation, so I think even doing 50 or 100 repeats of the calculation on smaller data is fine. Be sure you are also aggregating the critical values themselves across bootstraps too (averaging will *probably* be fine, but you should check if you need some type of order statistic instead). — ely, Mar 07 '18 at 13:20
I don't know what version of `scipy` the OP was using, but in version `1.1.0` the AD test does not crash; instead it returns: `Anderson_ksampResult(statistic=202.74793118968645, critical_values=array([0.325, 1.226, 1.961, 2.718, 3.752]), significance_level=inf)`. In version `1.2.0` this [seems to have been modified](https://stats.stackexchange.com/questions/225588/why-does-scipy-stats-anderson-ksamp-give-a-p-value-of-over-a-million-for-these-d#comment693894_225588) to return a maximum p-value of 1. — Gabriel, Nov 06 '18 at 18:52
