
I tried to do a univariate analysis (binary logistic regression, one feature at a time) in Python with statsmodels to calculate the p-value for each feature.

import pandas as pd
import statsmodels.api as sm

features, pvals = [], []
for f_col in f_cols:
    # regress y on the single feature
    model = sm.Logit(y, df[f_col].astype(float))
    result = model.fit()
    # result.pvalues is a pandas Series, so index it directly
    # instead of parsing its printed representation
    features.append(f_col)
    pvals.append(result.pvalues[f_col])

df_pvals = pd.DataFrame(list(zip(features, pvals)),
                        columns=['features', 'pvals'])
df_pvals

However, the results from SPSS are different. The p-value of NYHA from the sm.Logit method is 0, and all of the other p-values differ as well. [screenshot of SPSS output]

  1. Is it right to use sm.Logit in statsmodels to do binary logistic regression?
  2. Why is there a difference between the results? Does sm.Logit perhaps use L1 regularization?
  3. How can I get the same results?

Many thanks!

Jo_
  • You may want to look at this answer https://stackoverflow.com/questions/27928275/find-p-value-significance-in-scikit-learn-linearregression#42677750 – Paula Thomas Jan 11 '20 at 22:23
  • `add_constant`: you are missing the constant, which statsmodels doesn't add automatically when formulas are not used. – Josef Jan 12 '20 at 00:41

1 Answer


SPSS regression modeling procedures include a constant (intercept) term automatically unless they're told not to. As Josef mentions in the comments, statsmodels requires you to add the intercept explicitly when the formula interface is not used.
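As a minimal sketch (assuming `df`, `y`, and `f_cols` are defined as in the question), the loop could be adjusted with `sm.add_constant` so each univariate model gets an intercept:

import pandas as pd
import statsmodels.api as sm

pvals = []
for f_col in f_cols:
    # add_constant prepends a column of ones so the model has an intercept
    X = sm.add_constant(df[f_col].astype(float))
    result = sm.Logit(y, X).fit(disp=0)
    # take the p-value of the feature itself, not of the intercept
    pvals.append(result.pvalues[f_col])

df_pvals = pd.DataFrame({'features': f_cols, 'pvals': pvals})

Note that `sm.Logit.fit()` uses plain maximum likelihood with no penalty; L1 regularization only comes into play if you call `fit_regularized()` instead, so the missing intercept, not regularization, explains the discrepancy.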

David Nichols