From the scipy documentation on scipy.stats.kstest, it seems that the function only allows a comparison between a sample and a pre-defined probability distribution. Can it also compare a sample against a self-defined probability distribution?

I could use the two-sample Kolmogorov-Smirnov test, scipy.stats.ks_2samp, to compare a theoretical sample generated from the self-defined function to the actual sample.

I tried the following code:

from scipy.stats import kstest
sample = [1 for i in range(10)]
ks_stat, p_value = kstest(sample, lambda x: 1)
print(ks_stat, p_value)
>> 1.0, 0.0

I expected the p-value to be 1, since the sample matches the distribution exactly.

Links for convenience

One sample KS test: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.kstest.html

Two sample KS test: https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.ks_2samp.html

Charles
  • 1
    The cdf in kstest can be a callable, so you should be able to provide your own cdf function or method. – Josef Mar 25 '17 at 20:29
  • 1
    This [answer](http://stackoverflow.com/a/17903318/1922650) might help in understanding how to define the callable that can be passed to kstest. I am not sure about your example, though: the distribution for kstest must be _continuous_. You can, e.g., run `rv = scipy.stats.uniform(loc=1, scale=0.5); sample = rv.rvs(10); kstest(sample, rv.cdf)`, which will fail if you set the `scale` parameter to `0`. – Niklas Mar 25 '17 at 23:03

1 Answer

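The KS test is only applicable to continuous distributions. A sample drawn from a continuous distribution almost surely contains no repeated values, so a sample of ten identical values cannot be tested this way. Note also that a constant function equal to 1 is not the CDF of a continuous distribution.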

With a sounder example the test works as expected:

import numpy as np
from scipy.stats import kstest

sample = list(range(10))
# CDF of the uniform distribution on [-0.5, 9.5], clipped to [0, 1]
ks_stat, p_value = kstest(sample, lambda x: np.clip(0.1 * (x + 0.5), 0, 1))
print(ks_stat, p_value)

Prints:

0.05 1.0
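
The same idea works with a named function instead of a lambda. Below is a minimal sketch, assuming a made-up distribution with CDF F(x) = x**2 on [0, 1]; the name my_cdf, the sample size and the seed are chosen only for illustration. The sample is drawn by inverse-transform sampling so that it really follows that CDF.

```python
import numpy as np
from scipy.stats import kstest


def my_cdf(x):
    """Hypothetical self-defined CDF: F(x) = x**2 on [0, 1], 0 below, 1 above."""
    x = np.asarray(x, dtype=float)
    return np.clip(x, 0.0, 1.0) ** 2


rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
# Inverse-transform sampling: if U ~ Uniform(0, 1), then sqrt(U) has CDF x**2.
sample = np.sqrt(rng.uniform(size=200))

# kstest accepts any callable that maps an array of values to CDF values.
ks_stat, p_value = kstest(sample, my_cdf)
print(ks_stat, p_value)
```

The callable has to accept a NumPy array and return values in [0, 1], which is why the sketch clips its input rather than branching on a scalar.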

Oops, @Niklas sorry only just now read your comment properly.

Paul Panzer