I am running a regression with 67 observations and 32 variables. I am doing variable selection using the cv.glmnet function from the glmnet package. There is one variable I want to force into the model. (It is dropped during the normal procedure.) How can I specify this condition in cv.glmnet?

Thank you!

My code looks like the following:

glmntfit <- cv.glmnet(mydata[,-1], mydata[,1])
coef(glmntfit, s=glmntfit$lambda.1se)

The variable I want to force into the model is mydata[,2].

lareven

1 Answer

This can be achieved by providing a penalty.factor vector, as described in ?glmnet. A penalty factor of 0 indicates that the "variable is always included in the model", while 1 is the default.

glmntfit <- cv.glmnet(mydata[,-1], mydata[, 1], 
                      penalty.factor=c(0, rep(1, ncol(mydata) - 2)))
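
To check that the forced variable really stays in, here is a minimal sketch on simulated data. The matrix `x`, the response `y`, and the column name `"x1"` are made up for illustration and are not from the question. It builds the penalty vector by column name rather than by position, then inspects the coefficient at `lambda.1se`. Note that glmnet expects `x` to be a numeric matrix, so a data frame would need `as.matrix()` first.

```r
library(glmnet)

set.seed(1)
n <- 67
p <- 31
# Simulated predictors with hypothetical names x1..x31
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("x", seq_len(p))))
# y depends on x5 only, so x1 would normally be a candidate for dropping
y <- 2 * x[, "x5"] + rnorm(n)

forced <- "x1"                            # variable to keep unpenalized
pf <- as.numeric(colnames(x) != forced)   # 0 for the forced column, 1 elsewhere

fit <- cv.glmnet(x, y, penalty.factor = pf)
b <- coef(fit, s = "lambda.1se")          # sparse matrix, rows named by variable
b[forced, 1]                              # coefficient of the forced variable
```

Because the penalty factor for `x1` is 0, its coefficient is not shrunk toward zero and should come out non-zero here, even though `x1` is unrelated to `y`.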
jbaums
  • `penalty.factor = (names(mydata)[1:...] == 'VAR_TO_PENALIZE')` would be a more elegant way to pick out that variable. – smci Jun 17 '15 at 00:30
  • Is it possible that a variable is still not forced into the model even after its penalty.factor is set to 0? – melwin_jose Jul 08 '17 at 03:06
  • @melwin_jose It's possible. For example, if you centered y and X contains a constant column, the coefficient of that column will be zero. In most cases, though, it won't be zero. – lovetl2002 Apr 26 '23 at 07:03