Python: Chi 2 test produces wrong results (chi2_contingency)

Question

I am trying to calculate the chi-square value in Python, using a contingency table. Here is an example.

+--------+------+------+
|        | Cat1 | Cat2 |
+--------+------+------+
| Group1 |   80 |  120 |
| Group2 |  420 |  380 |
+--------+------+------+

The expected values are:

+--------+------+------+
|        | Cat1 | Cat2 |
+--------+------+------+
| Group1 |  100 |  100 |
| Group2 |  400 |  400 |
+--------+------+------+
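
For reference, the expected counts above follow from the usual independence rule: row total times column total, divided by the grand total. A minimal sketch in plain Python, with illustrative variable names that are not part of the original post:

```python
# Observed counts from the first table (rows: Group1, Group2; columns: Cat1, Cat2).
observed = [[80, 120],
            [420, 380]]

row_totals = [sum(row) for row in observed]        # per-group totals
col_totals = [sum(col) for col in zip(*observed)]  # per-category totals
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
print(expected)  # [[100.0, 100.0], [400.0, 400.0]]
```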

If I calculate the chi-square value by hand I get 10. With Python, however, I get 9.506. I use the following code:

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
import scipy

# Some fake data.
n = 5  # Number of samples.
d = 3  # Dimensionality.
c = 2  # Number of categories.
data = np.random.randint(c, size=(n, d))
data = pd.DataFrame(data, columns=['CAT1', 'CAT2', 'CAT3'])

# Contingency table.
contingency = pd.crosstab(data['CAT1'], data['CAT2'])

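# Overwrite the cells with the example counts (assumes the crosstab is 2x2).
# Single-step .iloc[row, col] avoids chained assignment.
contingency.iloc[0, 0] = 80
contingency.iloc[0, 1] = 120
contingency.iloc[1, 0] = 420
contingency.iloc[1, 1] = 380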

# Chi-square test of independence.
chi, p, dof, expected = chi2_contingency(contingency)

It is weird that the function gives me the correct expected values, however the Chi square and p-value are off. What am I doing wrong here?

Thanks

p.s.

I am aware that the way I create the initial table in pandas is pretty lame, but I am not an expert on how to create these nested tables in pandas.
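
As a side note on the table construction: a small table like this can probably be built directly from the counts, without generating random data first. A minimal sketch, assuming the four counts are known up front (the name `table` is only illustrative):

```python
import pandas as pd

# Build the 2x2 contingency table directly from the known counts.
table = pd.DataFrame(
    [[80, 120],
     [420, 380]],
    index=["Group1", "Group2"],  # row labels
    columns=["Cat1", "Cat2"],    # column labels
)
print(table)
```

A DataFrame like this can be passed to `chi2_contingency` in the same way as the result of `pd.crosstab`.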

score 8 · Accepted Answer · answered Aug 03 '17 at 14:24

From the documentation:

correction : bool, optional
If True, and the degrees of freedom is 1, apply Yates’ correction for continuity.
The effect of the correction is to adjust each observed value by 0.5 towards
the corresponding expected value.

And the degrees of freedom here is 1, so the correction is applied by default. If you set correction to False, you'll get 10.

>>> chi2_contingency(contingency, correction=False)
(10.0, 0.001565402258002549, 1, array([[ 100.,  100.],
       [ 400.,  400.]]))
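
To see where both numbers come from, the two statistics can be reproduced by hand. The sketch below uses the observed and expected counts from the question and applies the correction as the documentation describes it (each observed value moved 0.5 towards its expected value). It is an illustration of the arithmetic, not SciPy's actual implementation, and it only matches when every |O - E| is at least 0.5, as is the case here.

```python
import numpy as np

observed = np.array([[80, 120], [420, 380]], dtype=float)
expected = np.array([[100, 100], [400, 400]], dtype=float)

# Plain Pearson statistic: sum of (O - E)^2 / E.
chi2_plain = ((observed - expected) ** 2 / expected).sum()

# Yates' continuity correction: shrink each |O - E| by 0.5 before squaring.
chi2_yates = ((np.abs(observed - expected) - 0.5) ** 2 / expected).sum()

print(chi2_plain)            # 10.0
print(round(chi2_yates, 3))  # 9.506
```

So 9.506 is the Yates-corrected value, and 10 is the uncorrected one from the hand calculation.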

Thank you for the quick help. Will mark correct in 6 min! – valenzio, Aug 03 '17 at 14:28
