I tried to do a univariate analysis (binary logistic regression, one feature at a time) in Python with statsmodels to calculate the p-value for each feature.
import pandas as pd
import statsmodels.api as sm

features, pvals = [], []
for f_col in f_cols:
    # fit a single-feature logistic regression (no intercept term is added here)
    model = sm.Logit(y, df[f_col].astype(float))
    result = model.fit()
    features.append(f_col)
    pvals.append(result.pvalues[f_col])

df_pvals = pd.DataFrame(list(zip(features, pvals)),
                        columns=['features', 'pvals'])
df_pvals
However, the results in SPSS are different. The p-value of NYHA with the sm.Logit method is 0, and all of the other p-values differ as well.
- Is it right to use sm.Logit in statsmodels for binary logistic regression?
- Why is there a difference between the results? Does sm.Logit perhaps use L1 regularization? (One possible source of the discrepancy is sketched below.)
- How can I get the same results?
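For context, here is a minimal sketch of fitting a single feature with an explicit intercept. SPSS's binary logistic regression includes a constant term by default, whereas sm.Logit only fits the columns it is given, so this is one assumption about where the discrepancy might come from (the NYHA column is used purely as an example):

import statsmodels.api as sm

X = sm.add_constant(df['NYHA'].astype(float))  # add the intercept term that SPSS includes by default
result = sm.Logit(y, X).fit()                  # plain maximum-likelihood fit; no L1 penalty unless fit_regularized() is used
print(result.pvalues)                          # p-values for the constant and for NYHA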
Many thanks!