
I'm working on a project that would show the potential influence a group of events has on an outcome. I'm using the glmnet package, specifically its Poisson family. Here's my code:

```r
# de <- data imported from SQL connection
x <- model.matrix(~ ., data = de[, 2:7])
y <- de[, 1]
reg  <- cv.glmnet(x, y, family = "poisson", alpha = 1)
reg1 <- glmnet(x, y, family = "poisson", alpha = 1)

Co <- coef(?reg or reg1?, s = ???)   # <-- this is the line I'm unsure about

summ <- summary(Co)
c <- data.frame(Name = rownames(Co)[summ$i],
                Lambda = summ$x)
c2 <- c[with(c, order(-Lambda)), ]
```

The beginning imports a large amount of data from my database in SQL. I then put it in matrix format and separate the response from the predictors.

This is where I'm confused: I can't figure out exactly what the difference is between the glmnet() function and the cv.glmnet() function. I realize that the cv.glmnet() function is a k-fold cross-validation of glmnet(), but what exactly does that mean in practical terms? They provide the same value for lambda, but I want to make sure I'm not missing something important about the difference between the two.

I'm also unclear why it runs fine when I specify alpha = 1 (supposedly the default) but not when I leave it out.

Thanks in advance!

  • Try looking at `plot(reg)`. – Roland Mar 28 '15 at 09:46
  • 3
    **Never rely on glmnet's default lambda sequence!** Notorious issue. Always provide your own sequence. Then get the optimal lambda value afterwards from `fit$lambda.min` and use it with the `s=lambda.min` parameter in all calls to `predict()`, `coef()` etc. – smci Jun 16 '15 at 22:32
  • 2
    @smci why not using lambda.1se? Exactly this one is used by predict() – Alina Mar 27 '16 at 00:45
  • @Tonja: yes, you can use lambda.1se or lambda.min, depending. The important point is that implicitly using the default lambda sequence (as the OP did) is total garbage and produces garbage results. – smci Mar 28 '16 at 16:47
  • 2
    Could you please tell some details why not to use the predefined lambda and how to choose better sequence? – pikachu Jul 24 '20 at 13:05
  • 3
    @smci Could you substantiate your claims about the default lambda sequence being garbage? Apart from my belief, that the authors of glmnet knew what they were doing, the sequence goes from a max lambda, for which all coefficients are guaranteed to be zero, to a very small one where usually all coefficients enter the model (depending of course on the shape of your matrix), which makes a lot of sense IMO. And in my cases it worked perfectly. Is there some class of models where it does not? – Elmar Zander May 26 '21 at 12:55
  • @smci, Your statement lacks scientific rigor. In the context of regularization, finding the sequence of lambdas is a well-defined problem. The original authors of this package even have a published paper on this topic. See [Regularization paths for generalized linear models via coordinate descent](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=ZpG_cJwAAAAJ&citation_for_view=ZpG_cJwAAAAJ:MSzX15-gZgkC). – Fox Fairy Mar 09 '23 at 16:56
  • @FoxFairy: that URL doesn't work, please post a working one? – smci Mar 09 '23 at 19:57
  • @smci Jerome Friedman, Trevor Hastie, and Rob Tibshirani, "Regularization Paths for Generalized Linear Models via Coordinate Descent": https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2929880. I can access the paper through my institution; not sure it works for non-subscribers, but you can try (Sci-Hub). – Fox Fairy Mar 09 '23 at 20:12
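To make the comment thread concrete, here is a minimal sketch of supplying an explicit lambda sequence to `cv.glmnet` instead of relying on the automatic one. The simulated `x` and `y` stand in for the OP's SQL data, and the grid bounds are illustrative assumptions, not values from the original post:

```r
library(glmnet)

set.seed(42)
x <- matrix(rnorm(100 * 6), nrow = 100, ncol = 6)  # stand-in for the model.matrix output
y <- rpois(100, lambda = exp(0.5 * x[, 1]))        # stand-in Poisson response

# A log-spaced grid from large (all coefficients forced to zero)
# down to very small (most coefficients enter the model):
lambda_grid <- 10^seq(2, -4, length.out = 100)

cvfit <- cv.glmnet(x, y, family = "poisson", alpha = 1, lambda = lambda_grid)

cvfit$lambda.min                  # lambda with the smallest CV error
coef(cvfit, s = "lambda.min")    # coefficients at that lambda
```

The only change from the default usage is the `lambda =` argument; everything downstream (`lambda.min`, `coef`, `predict`) works the same.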

2 Answers


glmnet is an R package that fits regularized regression models such as ridge and lasso. The alpha argument determines which type of model is fit: alpha = 0 fits a ridge model, and alpha = 1 fits a lasso model.

cv.glmnet() performs cross-validation, 10-fold by default, which can be adjusted using nfolds. A 10-fold CV randomly divides your observations into 10 non-overlapping groups/folds of approximately equal size. Each fold in turn is used as the validation set while the model is fit on the other 9 folds. The bias-variance trade-off is usually the motivation for such model-validation methods. In the case of lasso and ridge models, CV helps choose the value of the tuning parameter lambda.

In your example, you can use plot(reg) or reg$lambda.min to see the value of lambda that gives the smallest CV error, and then estimate the test MSE for that value of lambda. By default, glmnet() performs ridge or lasso regression over an automatically selected range of lambda values, which may not include the one with the lowest test MSE. Hope this helps!
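Applied to the OP's code, this answers the `coef(?reg or reg1?, s = ???)` question: use the cv.glmnet object and pass the chosen lambda via `s=`. A minimal sketch, with simulated data standing in for the SQL import:

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 6), 100, 6)          # stand-in predictors
y <- rpois(100, lambda = exp(0.5 * x[, 1]))  # stand-in Poisson response

reg <- cv.glmnet(x, y, family = "poisson", alpha = 1)

plot(reg)        # CV error curve across the lambda path
reg$lambda.min   # lambda minimizing CV error

# Extract coefficients at the cross-validated lambda:
Co <- coef(reg, s = "lambda.min")
```

`coef(reg1, s = reg$lambda.min)` on the plain glmnet fit would give essentially the same result, but using the cv.glmnet object directly is simpler.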

  • 1
    More explicitly stated: **Never rely on glmnet's default lambda sequence! Always supply your own sequence.** – smci Jun 16 '15 at 22:33
  • 1
    If I understand correctly, both `cv.glmnet` and `glmnet` optimize lambda. `cv.glmnet` uses cross validation whereas `glmnet` simply relies on the cost function. Is that correct? – Jeff Bezos Dec 04 '21 at 16:33
  • @smci You're quite adamant that the default lamda sequence should not be used and that one should supply one's own sequence. Do you have any sources or evidence for the first point? If so, could you point to a rule of thumb or empirical method for determining the sequence of lambdas most appropriate to your data? – Calen Nov 03 '22 at 15:56
  • @Calen, no, "I'm" not quite adamant, every PhD ML professor, teacher and specialist I've dealt with is. And for a very obvious reason, namely that it might miss the optimal lambda value. You could use a two-step strategy: first try a coarse range of lambda (2 values/decade) from very high to very low (logarithmically), then zoom in on the optimal value and do a second run on a closer range with say 3/4/5 points per decade. This is what I used to do. Why a more robust default lambda was never implemented in the glmnet package I don't know. I'd much rather see test cases and patches than debate. – smci Nov 04 '22 at 16:11
  • @smci Thanks, I'll try that. I'm trying to find a good model right now and am not finding a ton of good resources for the actual implementation, so any additional sources or advice are welcome. – Calen Nov 05 '22 at 18:38
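The two-step strategy smci describes in the comments might be sketched as follows. The grid bounds and point densities are illustrative choices, not prescriptions from the thread:

```r
library(glmnet)

set.seed(7)
x <- matrix(rnorm(100 * 6), 100, 6)
y <- rpois(100, lambda = exp(0.5 * x[, 1]))

# Step 1: coarse grid, ~2 points per decade over a wide logarithmic range
coarse <- 10^seq(3, -5, by = -0.5)
fit1 <- cv.glmnet(x, y, family = "poisson", alpha = 1, lambda = coarse)
best <- fit1$lambda.min

# Step 2: zoom in, ~5 points per decade within a decade of the coarse optimum
fine <- 10^seq(log10(best) + 1, log10(best) - 1, by = -0.2)
fit2 <- cv.glmnet(x, y, family = "poisson", alpha = 1, lambda = fine)
fit2$lambda.min   # refined estimate of the optimal lambda
```

Because CV error is random across fold assignments, fixing the seed (or averaging over repeated CV runs) makes the two stages comparable.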

Between reg$lambda.min and reg$lambda.1se: lambda.min will give you the lowest cross-validation error, but depending on how much error you can tolerate, you may want to choose reg$lambda.1se, as this larger value further shrinks the number of predictors. You may also choose the mean of reg$lambda.min and reg$lambda.1se as your lambda value.
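A small sketch of the difference, on simulated data (lambda.1se is the largest lambda whose CV error is within one standard error of the minimum, so it typically keeps fewer predictors):

```r
library(glmnet)

set.seed(3)
x <- matrix(rnorm(200 * 10), 200, 10)
y <- rpois(200, lambda = exp(0.4 * x[, 1] - 0.3 * x[, 2]))

cvfit <- cv.glmnet(x, y, family = "poisson", alpha = 1)

# Count nonzero coefficients at each choice of lambda:
sum(coef(cvfit, s = "lambda.min") != 0)   # usually more predictors retained
sum(coef(cvfit, s = "lambda.1se") != 0)   # usually a sparser model
```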
