23

I am using the following code with glmnet:

> library(glmnet)
> fit = glmnet(as.matrix(mtcars[-1]), mtcars[,1])
> plot(fit, xvar='lambda')

[Figure: coefficient paths against log(lambda), produced by plot(fit, xvar='lambda')]

However, I want to print out the coefficients at the best lambda, as is done in ridge regression. I see the following structure of fit:

> str(fit)
List of 12
 $ a0       : Named num [1:79] 20.1 21.6 23.2 24.7 26 ...
  ..- attr(*, "names")= chr [1:79] "s0" "s1" "s2" "s3" ...
 $ beta     :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. ..@ i       : int [1:561] 0 4 0 4 0 4 0 4 0 4 ...
  .. ..@ p       : int [1:80] 0 0 2 4 6 8 10 12 14 16 ...
  .. ..@ Dim     : int [1:2] 10 79
  .. ..@ Dimnames:List of 2
  .. .. ..$ : chr [1:10] "cyl" "disp" "hp" "drat" ...
  .. .. ..$ : chr [1:79] "s0" "s1" "s2" "s3" ...
  .. ..@ x       : num [1:561] -0.0119 -0.4578 -0.1448 -0.7006 -0.2659 ...
  .. ..@ factors : list()
 $ df       : int [1:79] 0 2 2 2 2 2 2 2 2 3 ...
 $ dim      : int [1:2] 10 79
 $ lambda   : num [1:79] 5.15 4.69 4.27 3.89 3.55 ...
 $ dev.ratio: num [1:79] 0 0.129 0.248 0.347 0.429 ...
 $ nulldev  : num 1126
 $ npasses  : int 1226
 $ jerr     : int 0
 $ offset   : logi FALSE
 $ call     : language glmnet(x = as.matrix(mtcars[-1]), y = mtcars[, 1])
 $ nobs     : int 32
 - attr(*, "class")= chr [1:2] "elnet" "glmnet"

But I am not able to get the best lambda and the corresponding coefficients. Thanks for your help.

newacct
rnso
  • @smci, you have a typo in your example. The sign should be on the from parameter, e.g. `lambda = 10^seq(from=-10, to=15, by=1/3)` (a sketch follows these comments). – Faris Sep 17 '17 at 03:55
  • @smci do you have a citation for the advice? I can't find anything stating not to use the default lambda sequence. Although I understand why it might be good to supply a user-specified one, I was hoping for a source. – AW27 Feb 18 '21 at 09:15
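A minimal sketch of supplying your own lambda sequence, using the grid from Faris's correction above (whether so wide a grid is needed depends on your data; the object names are illustrative):

library(glmnet)

x <- as.matrix(mtcars[-1])
y <- mtcars[, 1]

# User-specified lambda grid from the comment above, reversed so it is
# decreasing, as glmnet expects
grid <- rev(10^seq(from = -10, to = 15, by = 1/3))

cvfit <- cv.glmnet(x, y, lambda = grid)
cvfit$lambda.min   # best lambda found on the supplied grid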

3 Answers

22

Try this:

fit = glmnet(as.matrix(mtcars[-1]), mtcars[,1], 
    lambda=cv.glmnet(as.matrix(mtcars[-1]), mtcars[,1])$lambda.1se)
coef(fit)

Or you can specify a lambda value in coef:

fit = glmnet(as.matrix(mtcars[-1]), mtcars[,1])
coef(fit, s = cv.glmnet(as.matrix(mtcars[-1]), mtcars[,1])$lambda.1se)

You need to pick a "best" lambda, and lambda.1se is a reasonable, or justifiable, one to pick. But you could use cv.glmnet(as.matrix(mtcars[-1]), mtcars[,1])$lambda.min or any other value of lambda that you settle upon as "best" for you.
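Both snippets above refit cv.glmnet inline; a single call gives you both candidate lambdas and their coefficients. A minimal sketch (the seed and object names are illustrative, not part of the answer):

library(glmnet)

x <- as.matrix(mtcars[-1])
y <- mtcars[, 1]

set.seed(1)                    # folds are random; fix them for reproducibility
cvfit <- cv.glmnet(x, y)

coef(cvfit, s = "lambda.min")  # coefficients at the minimum-CV-error lambda
coef(cvfit, s = "lambda.1se")  # coefficients at the more regularized lambda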

Jota
  • log(lambda.min) from cv.glmnet comes to -0.5. Is it all right if I mark this point on the x-axis of the plot(fit) from glmnet above? Is the log lambda indicated on the x-axis of that plot from the same vector that lambda.min came from? – rnso Jun 01 '15 at 04:51
  • The log lambda on the x-axis is from the same vector of lambda values that lambda.min came from. Just be aware that due to the nature of cross validation, you can get different values for `lambda.min` if you run `cv.glmnet` again. So your mark on the x-axis would be the `lambda.min` from a particular call of `cv.glmnet`. – Jota Jun 01 '15 at 05:05
  • Will it be correct to say that if (width=lambda.1se - lambda.min), then (lambda.min +/- 1.96*width) will be the 95% confidence intervals for lambda.min or the area where the true regression model is likely to lie? – rnso Jun 01 '15 at 05:28
  • I'm no expert, so call me out if I get something wrong here, but I don't think it would be right to say that there is a "true" regression model. The lambda.min value is the lambda that produces the lowest CV error, and the lambda.1se is the lambda giving a more regularized (i.e. more shrinkage) model that still is within 1 se of the minimum error. Anyway, have a look at `plot(cv.glmnet(as.matrix(mtcars[-1]), mtcars[,1]))` to see that you're not dealing with a normal distribution. – Jota Jun 01 '15 at 06:05
  • One thing to note, as Frank says, there will be some (or a lot of) variation in the minimum lambda if you rerun the cross-validation. `?cv.glmnet` notes: "Note also that the results of cv.glmnet are random, since the folds are selected at random. Users can reduce this randomness by running cv.glmnet many times, and averaging the error curves." I rerun the CV 100 times, average the curves, and then find the minimum of this average curve (or 1se if you prefer). – user2957945 Jun 01 '15 at 16:39
  • Please change it to use lambda.1se instead of lambda.min – smci Jan 31 '17 at 22:42
  • @smci well, since you said "please." Thought it was fine as it was, since I showed using both lambda.min and lambda.1se and mentioned you have to pick what you consider "best." – Jota Feb 01 '17 at 02:12
  • @user2957945 et al: so what's the best practice, say we fit for 10 different random seeds? When getting the 'optimal' coefs, do we use the lambda.1se of the 'majority', or the individual lambda.1se of each fit? (e.g. in order to study variance in coefficients across multiple runs, for a given alpha value?) – smci Feb 02 '17 at 03:52
  • @smci Maybe asking on Cross Validated is a good idea? I would say your idea sounds reasonable, but I would also say "best practice" may depend on your goals. Are you going for parsimony? Predictive power? Feature selection? – Jota Feb 02 '17 at 16:11
  • @smci, what I do is use one random seed, but with an outer loop so that the CV is run multiple (N) times. This produces N lambda-by-MSE curves. Then I average the N MSEs across the curves at each lambda, and find the lambda that minimises this averaged MSE (see the sketch after these comments). – user2957945 Feb 28 '17 at 10:42
  • @Jota: good points. I didn't expect the answer would differ too much for each of those: for predictive power, use each individual lambda.1se, for parsimony, use the lambda.1se nearest to the mean or median (like @user2957945); for feature selection, can you please tell me what you've seen recommended? – smci Feb 28 '17 at 22:55
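A minimal sketch of the repeated-CV averaging described in the comments above (the repeat count and object names are illustrative; the lambda grid is fixed so the error curves line up across runs):

library(glmnet)

x <- as.matrix(mtcars[-1])
y <- mtcars[, 1]

# Fix the lambda sequence so every CV run evaluates the same grid
lams <- glmnet(x, y)$lambda

# Rerun the cross-validation and collect the CV error curve from each run
n_reps <- 100   # illustrative; use as many repeats as you can afford
cvms <- replicate(n_reps, cv.glmnet(x, y, lambda = lams)$cvm)

# Average the curves at each lambda, then take the minimizing lambda
avg_cvm <- rowMeans(cvms)
best_lambda <- lams[which.min(avg_cvm)]
coef(glmnet(x, y, lambda = lams), s = best_lambda)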
4

To extract the optimal lambda, you could type fit$lambda.min

To obtain the coefficients corresponding to the optimal lambda, use coef(fit, s = fit$lambda.min); see p. 6 of the glmnet vignette.

I think the coefficients are produced by a model fitted to the full data, not just the cross-validation subsets, as mentioned on this page.

Jwjw
  • Welcome to SO! We can go beyond just answering the question at times by including a recommendation about how the asker can do even better. In this case, consider pointing the asker toward the glmnet vignette or the `cv.glmnet` function, which would help them find a value of lambda that will generalize better. – rcorty Feb 03 '20 at 21:39
  • Hey, if you do fit = glmnet(as.matrix(mtcars[-1]), mtcars[,1]), there is no fit$lambda.min. You only get that by calling cv.glmnet(as.matrix(mtcars[-1]), mtcars[,1]) (see the sketch after these comments). – StupidWolf Feb 17 '20 at 22:32
  • Also, if you read the accepted answer above, lambda.min can be used, but lambda.1se is commonly used because it chooses a more parsimonious model. https://stats.stackexchange.com/questions/138569/why-is-lambda-within-one-standard-error-from-the-minimum-is-a-recommended-valu – StupidWolf Feb 17 '20 at 22:42
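To see StupidWolf's point directly (object names are illustrative): lambda.min is a component of the object returned by cv.glmnet, not of a plain glmnet fit.

library(glmnet)

x <- as.matrix(mtcars[-1])
y <- mtcars[, 1]

fit   <- glmnet(x, y)
cvfit <- cv.glmnet(x, y)

fit$lambda.min                 # NULL: a plain glmnet fit has no lambda.min
cvfit$lambda.min               # lambda with the lowest cross-validated error
coef(cvfit, s = "lambda.min")  # coefficients at that lambda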
-2

boxcox() from the MASS package provides a maximum-likelihood plot showing which value of lambda gives the best fit in a linear model.

boxcox(lm.fit) draws the maximum-likelihood plot over a wide range of lambdas for the linear model lm.fit; pick the lambda with the highest log-likelihood.

boxcox(lm.fit, lambda = seq(-0.1, 0.1, 0.01)) zooms the plot in around that area if, for example, the best lambda is around 0.04. In this example, the function plots lambda from -0.1 to 0.1 in increments of 0.01.
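For completeness, a runnable sketch of the steps this answer describes, using the model from the question (note this lambda is the Box-Cox transformation parameter, not glmnet's penalty; the zoom range is derived from the first plot rather than hard-coded):

library(MASS)

# Fit the linear model from the question
lm.fit <- lm(mpg ~ ., data = mtcars)

# Maximum-likelihood profile over the default wide range of lambda
bc <- boxcox(lm.fit)

# Pick the lambda with the highest log-likelihood
best_lambda <- bc$x[which.max(bc$y)]

# Zoom in on a narrow window around that lambda
boxcox(lm.fit, lambda = seq(best_lambda - 0.1, best_lambda + 0.1, 0.01))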

Stephen Rauch