When estimating a lasso model via the glmnet package, I am wondering whether it is better to (a) pull coefficients / predictions / deviance straight from the cvfit object returned by cv.glmnet, or (b) take the minimum lambda from cv.glmnet, re-run glmnet at that value, and pull these objects from the glmnet fit. (Please be patient -- I have a feeling this is documented somewhere, but I'm seeing examples/tutorials of both approaches online, and no solid logic for going one way or the other.)
That is, for coefficients, I can run (a):
cvfit = cv.glmnet(x=xtrain, y=ytrain, alpha=1, type.measure = "mse", nfolds = 20)
coef.cv <- coef(cvfit, s = "lambda.min")
Or I can afterwards run (b):
fit = glmnet(x=xtrain, y=ytrain, alpha=1, lambda=cvfit$lambda.min)
coef.fit <- coef(fit)  # fit has a single lambda, so no s argument is needed here
While these two processes select the same model variables, they do not produce identical coefficients. Similarly, I could predict via either of the following two processes:
prdct <- predict(fit, newx=xtest)
prdct.cv <- predict(cvfit, newx=xtest, s = "lambda.min")
Again, these give similar but NOT identical prediction vectors.
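For reference, this is roughly how I'm comparing the two sets of results (using the objects created above):

max(abs(as.matrix(coef.cv) - as.matrix(coef.fit)))  # small, but not zero
max(abs(prdct - prdct.cv))                          # likewise small, but not zero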
Last, I would have THOUGHT I could pull % deviance explained via either of the two methods:
percdev <- fit$dev.ratio
percdev.cv <- cvfit$glmnet.fit$dev.ratio[cvfit$cvm == min(cvfit$cvm)]
But in fact, it is not possible to pull percdev.cv this way: if the lambda sequence used by cv.glmnet has fewer than 100 elements, the lengths of cvfit$glmnet.fit$dev.ratio and the logical vector cvfit$cvm == min(cvfit$cvm) don't match. So I'm not quite sure how to pull the minimum-lambda dev.ratio from cvfit$glmnet.fit.
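The only workaround I can think of is to index dev.ratio by the lambda value itself rather than by cvm, along these lines -- though I'm not sure matching on lambda.min like this is the intended approach:

# assumes lambda.min appears verbatim in the full-data lambda sequence
idx <- match(cvfit$lambda.min, cvfit$glmnet.fit$lambda)
percdev.cv <- cvfit$glmnet.fit$dev.ratio[idx]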
So I guess I'm wondering which process is best, why, and how people normally pull the appropriate dev.ratio statistic. Thanks!