
I am testing for an association between two variables in my data, using the chi-square test from the SciPy package in Python.

Here is the crosstab result of the two variables:

pd.crosstab(data['loan_default'],data['id_proofs'])

Result:

id_proofs          2      3    4  5
loan_default
0             167035  15232  273  3
1              46354   4202   54  1

When I apply the chi-square test to the same data, I get this error: ValueError: The internally computed table of expected frequencies has a zero element at (0,).

Code:

from scipy.stats import chi2_contingency
stat,p,dof,expec = chi2_contingency(data['loan_default'],data['id_proofs'])
print(stat,p,dof,expec)

Error Report:

    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-154-63c6f49aec48> in <module>()
      1 from scipy.stats import chi2_contingency
----> 2 stat,p,dof,expec = chi2_contingency(data['loan_default'],data['id_proofs'])
      3 print(stat,p,dof,expec)

~/anaconda3/lib/python3.6/site-packages/scipy/stats/contingency.py in chi2_contingency(observed, correction, lambda_)
    251         zeropos = list(zip(*np.where(expected == 0)))[0]
    252         raise ValueError("The internally computed table of expected "
--> 253                          "frequencies has a zero element at %s." % (zeropos,))
    254 
    255     # The degrees of freedom

ValueError: The internally computed table of expected frequencies has a zero element at (0,).

What could be causing this error, and how can I fix it?

Jack Daniel

1 Answer


Take another look at the docstring for chi2_contingency. The first argument, observed, must be the contingency table itself, not the raw data columns. Compute the contingency table first (as you did with pd.crosstab(data['loan_default'],data['id_proofs'])) and pass that result to chi2_contingency.
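As a minimal sketch of that fix: the signature in the traceback is chi2_contingency(observed, correction, lambda_), so in the original call data['id_proofs'] most likely ended up in the correction slot, while the raw 0/1 column was treated as the observed table. The example below rebuilds the table from the counts shown in the question, since the original DataFrame is not available. The variable name table is my own choice.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Counts copied from the crosstab output in the question.
# With the original DataFrame, this would instead be:
#   table = pd.crosstab(data['loan_default'], data['id_proofs'])
table = pd.DataFrame(
    [[167035, 15232, 273, 3],
     [46354, 4202, 54, 1]],
    index=pd.Index([0, 1], name="loan_default"),
    columns=pd.Index([2, 3, 4, 5], name="id_proofs"),
)

# Pass the contingency table as the single positional argument.
stat, p, dof, expected = chi2_contingency(table)

print(stat, p, dof)
print(expected)  # expected frequencies, same shape as the table
```

For a 2 x 4 table like this one, dof should come out as (2 - 1) * (4 - 1) = 3, and every expected frequency is positive, so the ValueError does not occur.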

Warren Weckesser