I would like to compare pairs of samples with both Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests. I implemented this with scipy.stats.ks_2samp and scipy.stats.anderson_ksamp respectively. I would expect a low statistic for similar samples (0 for identical samples) and a higher statistic for more different samples.
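For the KS test, this expectation follows from the definition of the two-sample statistic as the largest absolute gap between the two empirical CDFs. Below is a minimal sketch of that definition using only NumPy; `ks_statistic` is a hypothetical helper name chosen for illustration (not a SciPy function), and the fixed seed is an assumption added to make the run repeatable.

```python
import numpy as np


def ks_statistic(x, y):
    """Largest absolute gap between the empirical CDFs of x and y."""
    x = np.sort(np.asarray(x))
    y = np.sort(np.asarray(y))
    # Both ECDFs are step functions, so the maximum gap occurs at a data point.
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))


rng = np.random.RandomState(0)  # seed is an assumption, for repeatability
a = rng.normal(loc=0.0, scale=1.0, size=200)
b = rng.normal(loc=100.0, scale=1.0, size=200)

same = ks_statistic(a, a)   # identical samples: the ECDFs coincide
apart = ks_statistic(a, b)  # no overlap: one ECDF reaches 1 before the other leaves 0
print(same, apart)  # prints 0.0 1.0
```

This only covers the KS side; it says nothing about how the AD statistic is normalized.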
In the case of identical samples, and very different samples (no overlap), ks_2samp provides results as expected, while anderson_ksamp returns a negative statistic for identical samples and, more importantly, raises an OverflowError for very different samples (possibly due to the sample size: 200 in the example below).
Here is the code illustrating these findings:
import scipy.stats as stats
import numpy as np
normal1 = np.random.normal(loc=0.0, scale=1.0, size=200)
normal2 = np.random.normal(loc=100, scale=1.0, size=200)
Using KS and AD on identical samples:
stats.ks_2samp(normal1, normal1)
stats.anderson_ksamp([normal1, normal1])
Returns respectively:
# Expected
Ks_2sampResult(statistic=0.0, pvalue=1.0)
# Not expected
Anderson_ksampResult(statistic=-1.3196852620954158, critical_values=array([ 0.325, 1.226, 1.961, 2.718, 3.752]), significance_level=1.4357209285296726)
And on the different samples:
stats.ks_2samp(normal1, normal2)
stats.anderson_ksamp([normal1, normal2])
Returns respectively:
# Expected
Ks_2sampResult(statistic=1.0, pvalue=1.4175052453413253e-89)
# Not expected
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-757-e3914aaf909c> in <module>()
----> 1 stats.anderson_ksamp([normal1, normal2])
/usr/lib/python3.5/site-packages/scipy/stats/morestats.py in anderson_ksamp(samples, midrank)
1694 warnings.warn("approximate p-value will be computed by extrapolation")
1695
-> 1696 p = math.exp(np.polyval(pf, A2))
1697 return Anderson_ksampResult(A2, critical, p)
1698
OverflowError: math range error
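While the cause is unclear, one way to keep a batch of pairwise comparisons running is to catch the exception around the AD call. This is only a sketch of a guard, not a fix for the underlying behaviour: `safe_anderson_ksamp` is a hypothetical wrapper name, returning `None` on overflow is an arbitrary choice, and the fixed seed is an assumption. Whether the error is raised at all may depend on the SciPy version.

```python
import warnings

import numpy as np
import scipy.stats as stats


def safe_anderson_ksamp(samples):
    """Return (statistic, p_value) from anderson_ksamp, or None on overflow."""
    try:
        with warnings.catch_warnings():
            # Silence warnings about the approximate p-value.
            warnings.simplefilter("ignore")
            result = stats.anderson_ksamp(samples)
    except OverflowError:
        # Raised by math.exp in the traceback above for very different samples.
        return None
    # Index by position: statistic is field 0, the p-value is field 2.
    return float(result[0]), float(result[2])


rng = np.random.RandomState(0)  # seed is an assumption, for repeatability
normal1 = rng.normal(loc=0.0, scale=1.0, size=200)
normal2 = rng.normal(loc=100.0, scale=1.0, size=200)

same = safe_anderson_ksamp([normal1, normal1])
apart = safe_anderson_ksamp([normal1, normal2])
print(same)
print(apart)
```

With this guard, a `None` result flags a pair for which the p-value could not be computed, instead of stopping the whole comparison.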