I'd like to run a chi-squared test in Python. I've created code to do this, but I don't know if what I'm doing is right, because the scipy docs are quite sparse.
Background first: I have two groups of users. My null hypothesis is that the two groups do not differ in how their users are split across desktop, mobile, and tablet.
These are the observed frequencies in the two groups:
[[u'desktop', 14452], [u'mobile', 4073], [u'tablet', 4287]]
[[u'desktop', 30864], [u'mobile', 11439], [u'tablet', 9887]]
Here is my code using scipy.stats.chi2_contingency:
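import numpy as np
from scipy import stats
# rows: the two groups; columns: desktop, mobile, tablet
obs = np.array([[14452, 4073, 4287], [30864, 11439, 9887]])
chi2, p, dof, expected = stats.chi2_contingency(obs)
print(p)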
This gives me a p-value of 2.02258737401e-38, which clearly is significant.
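As a sanity check on that number, here is a minimal sketch of how I understand the test of independence to be computed by hand. It assumes the expected counts are (row total × column total) / grand total and that the degrees of freedom are (rows - 1) × (columns - 1). The names manual_expected, manual_chi2, manual_dof and manual_p are my own, not part of scipy. As far as I can tell from the docs, the continuity correction only applies when there is 1 degree of freedom, so it should not affect this 2×3 table.

```python
import numpy as np
from scipy import stats

obs = np.array([[14452, 4073, 4287], [30864, 11439, 9887]])

# Expected counts under independence: row total * column total / grand total
row_totals = obs.sum(axis=1)
col_totals = obs.sum(axis=0)
manual_expected = np.outer(row_totals, col_totals) / float(obs.sum())

# Pearson chi-squared statistic and its degrees of freedom
manual_chi2 = ((obs - manual_expected) ** 2 / manual_expected).sum()
manual_dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)

# Upper-tail probability of the chi-squared distribution
manual_p = stats.chi2.sf(manual_chi2, manual_dof)

# Compare against what scipy reports for the same table
chi2, p, dof, expected = stats.chi2_contingency(obs)
print(manual_chi2, chi2)
print(manual_p, p)
```

If the hand computation and chi2_contingency agree, that at least suggests I am calling the function the way it is meant to be called.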
My question is: does this code look valid? In particular, I'm not sure whether I should be using scipy.stats.chi2_contingency or scipy.stats.chisquare, given the data I have.
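To make the comparison concrete, here is a rough sketch of how I think the two functions relate. It assumes scipy.stats.chisquare is a goodness-of-fit test that needs the expected frequencies passed in via f_exp, and that its degrees of freedom are k - 1 - ddof, where k is the number of cells. I'm reusing the expected table from chi2_contingency here, and the names ddof, chi2_gof and p_gof are just mine.

```python
import numpy as np
from scipy import stats

obs = np.array([[14452, 4073, 4287], [30864, 11439, 9887]])
chi2, p, dof, expected = stats.chi2_contingency(obs)

# chisquare uses k - 1 - ddof degrees of freedom (k = number of cells),
# so choose ddof to match the contingency-table degrees of freedom.
ddof = obs.size - 1 - dof

# axis=None treats all six cells as a single set of frequencies
chi2_gof, p_gof = stats.chisquare(obs, f_exp=expected, ddof=ddof, axis=None)

print(chi2, chi2_gof)
print(p, p_gof)
```

If that is right, chisquare only matches once I hand it the expected counts and the degrees-of-freedom adjustment myself, which makes me think chi2_contingency is the more natural fit for a table of two groups by three device types.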