
I would like to fit a logistic regression with ridge regularization. Here is my code:

library(modeldata)
library(glmnet)

# check the data
data(attrition)
head(attrition)

# split the data into 80% training and 20% test
smp_size <- floor(0.8 * nrow(attrition))

## set the seed to make your partition reproducible
set.seed(123)

# randomly get the index for training data
train_ind <- sample(seq_len(nrow(attrition)), size = smp_size)

# get training and testing data
train <- attrition[train_ind, ]
test <- attrition[-train_ind, ]


# fit the model
X <- model.matrix(Attrition ~ ., train)
lm_ridge <- glmnet(X, train$Attrition, family = 'binomial', alpha = 0)


# get predicted values based on ridge regularization
prob_ridge <- predict(lm_ridge, model.matrix(Attrition ~ ., test), type = 'response')

prob_ridge is a 294 x 100 matrix, but I am expecting a single column, i.e. 294 x 1. Is anything wrong with my code? Why does the predict function return a matrix?

ycenycute
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Aug 07 '21 at 04:21
  • @MrFlick Thanks for the tips. I updated my codes. – ycenycute Aug 07 '21 at 06:07

1 Answer


glmnet fits the model over a whole sequence of lambda values, so you get a set of coefficients, and hence a prediction, for each lambda. As documented in the vignette:

If multiple values of s are supplied, a matrix of predictions is produced. If no value of s is supplied, a matrix of predictions is produced, with columns corresponding to all the lambdas used in the fit.
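Both branches of that rule are easy to verify on synthetic data (a minimal sketch, not the attrition set from the question):

```r
library(glmnet)

# toy binary-classification data
set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- rbinom(100, 1, 0.5)

fit <- glmnet(x, y, family = "binomial", alpha = 0)

# no s supplied: one prediction column per lambda used in the fit
p_all <- predict(fit, x, type = "response")
ncol(p_all) == length(fit$lambda)  # TRUE

# several values of s supplied: one column per value
p3 <- predict(fit, x, type = "response", s = c(5, 1, 0.5))
dim(p3)  # 100 rows, 3 columns
```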

So in your case, your lambda values are:

head(lm_ridge$lambda,50)
 [1] 84.7169444 77.1909245 70.3334955 64.0852617 58.3921036 53.2047101
 [7] 48.4781503 44.1714850 40.2474120 36.6719429 33.4141086 30.4456913
[13] 27.7409800 25.2765478 23.0310489 20.9850340 19.1207814 17.4221439
[19] 15.8744087 14.4641699 13.1792130 12.0084080 10.9416141  9.9695913
[25]  9.0839203  8.2769298  7.5416302  6.8716526  6.2611939  5.7049667
[31]  5.1981532  4.7363636  4.3155981  3.9322122  3.5828853  3.2645917
[37]  2.9745744  2.7103214  2.4695439  2.2501564  2.0502587  1.8681194
[43]  1.7021608  1.5509455  1.4131638  1.2876222  1.1732334  1.0690066
[49]  0.9740390  0.8875081

If you supply a single value, e.g. s = 0.8875081, then you get 1 column:

pred = predict(lm_ridge, model.matrix(Attrition ~ ., test), type = 'response',
               s = 0.8875081)
dim(pred)
[1] 294   1

If you want to know the optimal lambda, you can follow the example in the vignette (mentioned above) and use a cross-validation approach with cv.glmnet, for example:

cvfit = cv.glmnet(X, train$Attrition, family = 'binomial', alpha = 0)
pred = predict(cvfit, model.matrix(Attrition ~ ., test), type = 'response')

dim(pred)
[1] 294   1

By default it chooses:

“lambda.1se”: the largest value of lambda at which the cross-validated error is within one standard error of the minimum (the default).
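If you prefer the lambda that minimizes the cross-validated error instead, pass s = "lambda.min" explicitly (again a sketch on synthetic data, since the question's train/test split isn't reproduced here):

```r
library(glmnet)

# toy binary-classification data
set.seed(1)
x <- matrix(rnorm(200 * 5), 200, 5)
y <- rbinom(200, 1, 0.5)

cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 0)

# s = "lambda.1se" is the default; "lambda.min" picks the error-minimizing fit
p_1se <- predict(cvfit, x, type = "response", s = "lambda.1se")
p_min <- predict(cvfit, x, type = "response", s = "lambda.min")
dim(p_min)  # 200 rows, 1 column: a single lambda yields a single column
```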

StupidWolf