I wrote a function, glmnetPath, to do this. It first appeared in my answer to a more recent Stack Overflow Q & A (sorry, I discovered this Q & A later), and has since been shipped in an R package on GitHub.
## you may need to first install package "remotes" from CRAN
remotes::install_github("ZheyuanLi/solzy")
## Zheyuan Li's R functions on Stack Overflow
library(solzy)
The function works for models fitted by glmnet and cv.glmnet.
It is computationally efficient: it is fully vectorized (no loops), and it processes the sparse coefficient matrix directly, without converting it to a dense matrix.
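For intuition only, here is a minimal sketch of how first-entry indices can be read off a sparse matrix without densifying it. This is not the code inside glmnetPath. The toy matrix beta and the names first_j and never are made up for illustration, and I am assuming a column-compressed dgCMatrix from the Matrix package, with variables in rows and lambda values in columns, as in fit$beta.

```r
library(Matrix)

## Toy coefficient matrix (hypothetical numbers): 4 variables x 5 lambda values.
## Variable 2 enters at column 2, variable 4 at column 3,
## variable 1 at column 4, and variable 3 never enters.
beta <- sparseMatrix(
  i = c(2, 2, 2, 2, 4, 4, 4, 1, 1),
  j = c(2, 3, 4, 5, 3, 4, 5, 4, 5),
  x = c(0.1, 0.2, 0.3, 0.4, -0.1, -0.2, -0.3, 0.05, 0.1),
  dims = c(4, 5)
)

## Read the stored entries straight from the dgCMatrix slots
i <- beta@i + 1L                               # row (variable) index, 1-based
j <- rep(seq_len(ncol(beta)), diff(beta@p))    # column (lambda) index
keep <- beta@x != 0                            # guard against stored zeros

## First column with a nonzero coefficient, per variable
first_j <- tapply(j[keep], i[keep], min)
first_j

## Variables with no nonzero coefficient anywhere
never <- setdiff(seq_len(nrow(beta)), as.integer(names(first_j)))
never
```

Only the stored nonzero entries are touched, so the cost scales with the number of nonzeros rather than with the full size of the matrix.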
It provides a very informative summary of a coefficient path. Here is a reproducible example for demonstration.
library(glmnet)
library(solzy)
set.seed(42)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
fit <- glmnet(x, y)
ans <- glmnetPath(fit)
## Try this if you want. It also works!!
## cvfit <- cv.glmnet(x, y)
## glmnetPath(cvfit)
The resulting ans is a list of 3 data frames: enter, leave and ignored.
Interpretation of enter
As lambda decreases, variables (see i for numeric ID and var for variable names) enter the model in turn (see ord for the order). The corresponding lambda for the event is fit$lambda[j].
ans$enter
#    i  j ord var     lambda
#1   2  2   1  V2 0.16479873
#2   4  2   1  V4 0.16479873
#3  13  4   2 V13 0.13681880
#4   9  8   3  V9 0.09430389
#5  11  9   4 V11 0.08592619
#6  15  9   4 V15 0.08592619
#7   7 11   5  V7 0.07133744
#8  18 11   5 V18 0.07133744
#9  20 12   6 V20 0.06500001
#10  5 13   7  V5 0.05922559
#11 16 13   7 V16 0.05922559
#12 17 14   8 V17 0.05396415
#13  6 16   9  V6 0.04480199
#14  3 17  10  V3 0.04082190
#15 10 18  11 V10 0.03719540
#16 19 19  12 V19 0.03389106
#17  1 22  13  V1 0.02563735
#18 12 26  14 V12 0.01767083
#19 14 26  14 V14 0.01767083
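As a small usage sketch, the enter table can be queried with ordinary data frame operations. To keep the snippet self-contained, I retype the first six rows of the printout above into a data frame called enter; with a real fit you would use ans$enter directly. The threshold 0.09 is an arbitrary value picked for illustration.

```r
## First six rows of ans$enter, copied from the printout above
enter <- data.frame(
  i = c(2, 4, 13, 9, 11, 15),
  j = c(2, 2, 4, 8, 9, 9),
  ord = c(1, 1, 2, 3, 4, 4),
  var = c("V2", "V4", "V13", "V9", "V11", "V15"),
  lambda = c(0.16479873, 0.16479873, 0.13681880,
             0.09430389, 0.08592619, 0.08592619),
  stringsAsFactors = FALSE
)

## Group variables by entry order; ties share the same ord
by_step <- split(enter$var, enter$ord)
by_step

## Variables that have entered by the time lambda has dropped to 0.09
entered <- enter$var[enter$lambda >= 0.09]
entered
```

Note that this only tells you which variables have entered by that point on the path. Whether each of them is still nonzero there is a separate question.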
Interpretation of leave
As lambda increases, variables (see i for numeric ID and var for variable names) leave the model in turn (see ord for the order). The corresponding lambda for the event is fit$lambda[j].
ans$leave
#    i  j ord var     lambda
#1  14 25   1 V14 0.01939371
#2  12 25   1 V12 0.01939371
#3   1 21   2  V1 0.02813695
#4  19 18   3 V19 0.03719540
#5  10 17   4 V10 0.04082190
#6   3 16   5  V3 0.04480199
#7   6 15   6  V6 0.04917013
#8  17 13   7 V17 0.05922559
#9  16 12   8 V16 0.06500001
#10  5 12   8  V5 0.06500001
#11 20 11   9 V20 0.07133744
#12 18 10  10 V18 0.07829275
#13  7 10  10  V7 0.07829275
#14 15  8  11 V15 0.09430389
#15 11  8  11 V11 0.09430389
#16  9  7  12  V9 0.10349840
#17 13  3  13 V13 0.15015846
#18  4  1  14  V4 0.18086640
#19  2  1  14  V2 0.18086640
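The two tables can also be put side by side. The sketch below retypes the j values of four variables from the enter and leave printouts above (with a real fit, use ans$enter and ans$leave) and merges them by variable name. The column name gap is my own.

```r
## j values for four variables, copied from the two printouts above
enter <- data.frame(var = c("V2", "V13", "V9", "V1"),
                    j = c(2, 4, 8, 22),
                    stringsAsFactors = FALSE)
leave <- data.frame(var = c("V1", "V9", "V13", "V2"),
                    j = c(21, 7, 3, 1),
                    stringsAsFactors = FALSE)

## One row per variable, with its entering and leaving index
m <- merge(enter, leave, by = "var", suffixes = c(".enter", ".leave"))
m$gap <- m$j.enter - m$j.leave
m
```

Here every gap is 1: the leaving index sits immediately before the entering index. The same holds for every variable in the full printouts above. My reading, offered tentatively, is that in this example no variable drops out again once it has entered. A larger gap would be worth a closer look at that variable's path.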
Interpretation of ignored
If non-empty, it lists variables that never enter the model. That is, they are effectively ignored. Yes, this can happen!
ans$ignored
#  i var
#1 8  V8
Notes:
fit$lambda is decreasing, so j is in ascending order in enter but in descending order in leave.
Several variables can enter or leave the model at the same time! The ord column is very informative on this. For example, variables 2 and 4 enter the model simultaneously.
Finally, let me attach the path plots produced by glmnet:
plot(fit, label = TRUE)

plot(fit, xvar = "lambda", label = TRUE)
