
I'm using ridge regression (`RidgeCV`), imported with:

from sklearn.linear_model import LinearRegression, RidgeCV, LarsCV, Ridge, Lasso, LassoCV

How do I extract the p-values? I checked, but the fitted model has no `summary` method.

I couldn't find any page that discusses this for Python (I found one for R).

alphas = np.linspace(.00001, 2, 1)
rr_scaled = RidgeCV(alphas = alphas, cv =5, normalize = True)
rr_scaled.fit(X_train, Y_train)

1 Answer


You can use the `regressors` package to output p-values:

from regressors import stats    
stats.coef_pval(rr_scaled, X_train, Y_train)

You can also print a regression summary (standard errors, t-values, p-values, and R^2) using:

stats.summary(rr_scaled, X_train, Y_train)

Example:

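import numpy as np
import pandas as pd
from sklearn.linear_model import RidgeCV
from regressors import stats

# Small random data set: one response and two predictors
df = pd.DataFrame({'y':np.random.randn(10), 'x1':np.random.randn(10), 'x2':np.random.randn(10)})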
#           y        x1        x2
# 0 -0.228546  0.133703  0.624039
# 1 -1.005794  1.064283  1.527229
# 2 -2.180160 -1.485611 -0.471199
# 3 -0.683695 -0.213433 -0.692055
# 4 -0.451981 -0.133173  0.995683
# 5 -0.166878 -0.384913  0.255065
# 6  0.816602 -0.380910  0.381321
# 7 -0.408240  1.116328  1.163418
# 8 -0.899570 -1.055483 -0.470597
# 9  0.926600 -1.497506 -0.523385
X_train = df[['x1','x2']]
Y_train = df.y

alphas = np.linspace(.00001, 2, 1)
# Note: the `normalize` parameter was removed in scikit-learn 1.2, so this call needs an older release
rr_scaled = RidgeCV(alphas=alphas, cv=5, normalize=True)
rr_scaled.fit(X_train, Y_train)
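
A side note on the alpha grid used here: `np.linspace(.00001, 2, 1)` asks for a single sample, so the grid contains only the start value and the cross-validation has nothing to choose between. A minimal sketch of a wider grid follows; the 50 points and the log spacing are arbitrary choices for illustration, not something the question specifies.

```python
import numpy as np

# With num=1, linspace returns only the start value.
single = np.linspace(0.00001, 2, 1)
print(single)  # a one-element array

# A wider candidate grid from 1e-5 to 2.
# The size (50) and the log spacing are assumptions made for this sketch.
alphas = np.logspace(-5, np.log10(2), 50)
print(alphas.min(), alphas.max())
```

After fitting with a grid like this, `rr_scaled.alpha_` reports which value was selected, and `coef_` and `intercept_` belong to the model refit with that value.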

Calling stats.coef_pval:

stats.coef_pval(rr_scaled, X_train, Y_train)
# array([0.17324576, 0.77225007, 0.74614808])

Now, calling stats.summary:

stats.summary(rr_scaled, X_train, Y_train)
# Residuals:
# Min      1Q  Median      3Q     Max
# -1.3347 -0.2368  0.0038  0.3636  1.7804


# Coefficients:
#             Estimate  Std. Error  t value   p value
# _intercept -0.522607    0.353333  -1.4791  0.173246
# x1         -0.143694    0.481720  -0.2983  0.772250
# x2          0.192431    0.576419   0.3338  0.746148
# ---
# R-squared:  0.00822,    Adjusted R-squared:  -0.27515
# F-statistic: 0.03 on 2 features
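
To see where numbers like these can come from, here is a minimal sketch of the classical OLS-style t-test applied to a set of fitted coefficients. It is a generic textbook calculation and not necessarily what `regressors` does internally. The function name `ols_style_pvalues`, the synthetic data, and the choice of `n - p - 1` degrees of freedom are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats as sps


def ols_style_pvalues(X, y, intercept, coef):
    """Two-sided t-test p-values for [intercept, coef...] using OLS formulas."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    n, p = X.shape

    # Design matrix with a leading column of ones for the intercept.
    X1 = np.column_stack([np.ones(n), X])
    beta = np.concatenate([[intercept], np.ravel(coef)])

    # Residual variance with n - (p + 1) degrees of freedom (one common choice).
    resid = y - X1 @ beta
    dof = n - (p + 1)
    sigma2 = resid @ resid / dof

    # Standard errors from the diagonal of sigma^2 * (X1'X1)^-1.
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X1.T @ X1)))
    t_vals = beta / se
    p_vals = 2 * sps.t.sf(np.abs(t_vals), dof)
    return se, t_vals, p_vals


# Synthetic data, made up for this sketch: y depends on the first column only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=50)

# Plain least squares stands in for a fitted model's intercept_ and coef_.
X1 = np.column_stack([np.ones(len(X)), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
se, t_vals, p_vals = ols_style_pvalues(X, y, beta_hat[0], beta_hat[1:])
print(p_vals.shape)  # one entry per feature plus one for the intercept
```

With a fitted ridge model you would pass `rr_scaled.intercept_` and `rr_scaled.coef_` instead. Because ridge shrinks the coefficients, p-values computed this way are only approximate. The result has one more entry than there are features, since the intercept gets its own entry.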
Joe Patten
  • When I run the second bit, the error "summary() missing 1 required positional argument: 'y'" pops up. When I include the y variable, I get the following error: "ValueError: all the input array dimensions except for the concatenation axis must match exactly" – Nimish Vaddiparti Jan 23 '19 at 10:04
  • @NimishVaddiparti I forgot to put `Y_train` in `stats.summary`, I have fixed it and added an example. – Joe Patten Jan 23 '19 at 15:54
  • The number of independent variables I have in the model is 32, but the p-value output shows 33 values. Any idea why this is happening? I've set fit_intercept to False. – Nimish Vaddiparti Jan 25 '19 at 06:19
  • @NimishVaddiparti The extra value might be representing the intercept term in your linear regression formula. – pseudomonas Sep 04 '19 at 12:34
  • @JoePatten Do these p-values and coefficients represent the values from the best model? – pseudomonas Sep 04 '19 at 12:35
  • @JoePatten I keep getting "ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1 and the array at index 1 has size 10". What's happening? Can you help me? – Dumb ML Jun 05 '20 at 11:47