When estimating a lasso model via the glmnet package, I am wondering whether it is better to (a) pull coefficients / predictions / deviance straight from the cvfit object returned by cv.glmnet, or (b) take the minimum lambda from cv.glmnet, re-run glmnet at that value, and pull these objects from the glmnet fit. (Please be patient -- I have a feeling this is documented somewhere, but I'm seeing examples/tutorials of both approaches online, and no solid logic for going one way or the other.)
That is, for coefficients, I can run (a):
cvfit = cv.glmnet(x=xtrain, y=ytrain, alpha=1, type.measure = "mse", nfolds = 20)
coef.cv <- coef(cvfit, s = "lambda.min")
Or I can afterwards run (b):
fit = glmnet(x=xtrain, y=ytrain, alpha=1, lambda=cvfit$lambda.min)
coef.fit <- coef(fit)  # fit has a single lambda, so no s argument is needed here
While these two processes select the same model variables, they do not produce identical coefficients. Similarly, I could predict via either of the following two processes:
prdct <- predict(fit, newx=xtest)
prdct.cv <- predict(cvfit, newx=xtest, s = "lambda.min")
Again, these give similar but NOT identical prediction vectors.
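For reference, this is roughly how I'm comparing the two sets of results (using the objects created above):

max(abs(as.matrix(coef.cv) - as.matrix(coef.fit)))  # small, but not zero
max(abs(prdct - prdct.cv))                          # likewise small, but not zero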
Last, I would have THOUGHT I could pull % deviance explained via either of the two methods:
percdev <- fit$dev.ratio
percdev.cv <- cvfit$glmnet.fit$dev.ratio[cvfit$cvm == min(cvfit$cvm)]
But in fact, it is not possible to pull percdev.cv this way: if the lambda sequence used by cv.glmnet has fewer than 100 elements, the lengths of cvfit$glmnet.fit$dev.ratio and the logical vector cvfit$cvm == min(cvfit$cvm) don't match. So I'm not quite sure how to pull the minimum-lambda dev.ratio from cvfit$glmnet.fit.
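The only workaround I can think of is to index dev.ratio by the lambda value itself rather than by cvm, along these lines -- though I'm not sure matching on lambda.min like this is the intended approach:

# assumes lambda.min appears verbatim in the full-data lambda sequence
idx <- match(cvfit$lambda.min, cvfit$glmnet.fit$lambda)
percdev.cv <- cvfit$glmnet.fit$dev.ratio[idx]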
So I guess I'm wondering which process is best, why, and how people normally pull the appropriate dev.ratio statistic. Thanks!