
I tried to do a univariate analysis (binary logistic regression, one feature at a time) in Python with statsmodels to calculate the p-value for each feature.

import pandas as pd
import statsmodels.api as sm

features, pvals = [], []
for f_col in f_cols:
    # regress y on the single feature
    model = sm.Logit(y, df[f_col].astype(float))
    result = model.fit()
    # result.pvalues is a pandas Series, so index it directly
    # instead of parsing its printed representation
    features.append(f_col)
    pvals.append(result.pvalues[f_col])

df_pvals = pd.DataFrame(list(zip(features, pvals)),
                        columns=['features', 'pvals'])
df_pvals

However, the results from SPSS are different. The p-value of NYHA from the sm.Logit method is 0, and all of the other p-values differ as well. [screenshot of SPSS output]

  1. Is it right to use sm.Logit in statsmodels to do binary logistic regression?
  2. Why is there a difference between the results? Does sm.Logit perhaps use L1 regularization?
  3. How can I get the same results?

Many thanks!

Jo_
  • You may want to look at this answer https://stackoverflow.com/questions/27928275/find-p-value-significance-in-scikit-learn-linearregression#42677750 – Paula Thomas Jan 11 '20 at 22:23
  • `add_constant`: you are missing the constant, which statsmodels doesn't add automatically when formulas are not used. – Josef Jan 12 '20 at 00:41

1 Answer


SPSS regression modeling procedures include a constant (intercept) term automatically unless they're told not to. As Josef mentions in the comments, statsmodels requires you to add the intercept explicitly when the formula interface is not used.
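As a minimal sketch (assuming `df`, `y`, and `f_cols` are defined as in the question), the loop could be adjusted with `sm.add_constant` so each univariate model gets an intercept:

import pandas as pd
import statsmodels.api as sm

pvals = []
for f_col in f_cols:
    # add_constant prepends a column of ones so the model has an intercept
    X = sm.add_constant(df[f_col].astype(float))
    result = sm.Logit(y, X).fit(disp=0)
    # take the p-value of the feature itself, not of the intercept
    pvals.append(result.pvalues[f_col])

df_pvals = pd.DataFrame({'features': f_cols, 'pvals': pvals})

Note that `sm.Logit.fit()` uses plain maximum likelihood with no penalty; L1 regularization only comes into play if you call `fit_regularized()` instead, so the missing intercept, not regularization, explains the discrepancy.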

David Nichols