How to conduct hypothesis testing in Python?

Question

Ideally, I would like to find the p-value. I come from more of a statistics background and am fairly new to Python. Are there any packages that will allow me to do this? I am following the "Data Science from Scratch" book and am stuck on the Hypothesis Testing and Inference chapter.

Maybe the [SciPy package](http://docs.scipy.org/doc/) can do it; there's a page on the [chi-square test](http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html#scipy-stats-chisquare) – chickity china chinese chicken, May 26 '17 at 17:08
[How to calculate p-value for two lists of floats?](https://stackoverflow.com/questions/29561360/how-to-calculate-p-value-for-two-lists-of-floats), [Python p-value from t-statistic](https://stackoverflow.com/questions/17559897/python-p-value-from-t-statistic) may also be helpful/related — chickity china chinese chicken, May 26 '17 at 17:25

score 3 · Accepted Answer · answered May 28 '17 at 13:45

The SciPy package has a whole module of statistical tools, including hypothesis tests and built-in distribution functions: scipy.stats

For example, this is how you can test if a random sample is normally distributed using the Kolmogorov-Smirnov test:

from scipy.stats import norm, pareto, kstest

n = 1000  # sample size
sample_norm = norm.rvs(size=n)  # random sample from the standard normal distribution
sample_pareto = pareto.rvs(1.0, size=n)  # sample from a different distribution, for comparison

# Null hypothesis in both calls: the sample comes from the standard normal distribution
d_norm, p_norm = kstest(sample_norm, norm.cdf)  # null hypothesis is true here
d_pareto, p_pareto = kstest(sample_pareto, norm.cdf)  # null hypothesis is false here

print('Statistic values: %.4f, %.4f' % (d_norm, d_pareto))
print('P-values: %.4f, %.4f' % (p_norm, p_pareto))

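As you can see, kstest returns the value of the test statistic and the p-value. norm.cdf is the cumulative distribution function of the standard normal distribution (mean 0, standard deviation 1), so the null hypothesis in each call is that the sample was drawn from that specific distribution.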
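To turn the p-value into a decision, it is usually compared with a significance level chosen in advance. Below is a minimal sketch of that step; the 0.05 threshold, the fixed seeds, and the helper name `decide` are assumptions made for illustration and are not part of SciPy or of the example above.

```python
from scipy.stats import norm, pareto, kstest

ALPHA = 0.05  # assumed significance level; choose it before looking at the data


def decide(p_value, alpha=ALPHA):
    """Hypothetical helper: turn a p-value into a textual decision."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


n = 1000
# Integer seeds are used only to make the run repeatable.
sample_norm = norm.rvs(size=n, random_state=0)
sample_pareto = pareto.rvs(1.0, size=n, random_state=1)

# H0: the sample comes from the standard normal distribution.
# H1: the sample does not come from that distribution.
d_norm, p_norm = kstest(sample_norm, norm.cdf)
d_pareto, p_pareto = kstest(sample_pareto, norm.cdf)

print("normal sample: D=%.4f, p=%.4f -> %s" % (d_norm, p_norm, decide(p_norm)))
print("Pareto sample: D=%.4f, p=%.4f -> %s" % (d_pareto, p_pareto, decide(p_pareto)))
```

The Pareto sample should be rejected on every run, because all of its values are at least 1 and the statistic is therefore large. The normal sample will still be rejected in roughly a fraction ALPHA of runs with different seeds, which is what the significance level means.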

Slippy


This is more of what I was looking for thanks! Just a couple follow up questions. First, in the code you provided, where are you describing the null and alternative hypothesis? And are you giving a value to those values? Are the values 'sample_norm' and 'sample_pareto' just random values? – rmahesh May 29 '17 at 22:24
`sample_norm` and `sample_pareto` are just arrays of numbers sampled from the normal distribution and from the Pareto distribution, respectively. In the example I test the null hypothesis "sample_norm is distributed normally" against the alternative "sample_norm is NOT distributed normally" by calling the `kstest` function with the two given arguments, then I do the same thing for `sample_pareto`. So, as you can see, the hypotheses themselves are not defined anywhere in the code; they are implied by the code instead :) – Slippy May 30 '17 at 13:38
Perfect thank you so much! I have been looking for a way to do this and get the P-value, and this seems to be it! – rmahesh May 30 '17 at 23:09
