
When I call predict on a cv.glmnet fit, I get the error shown below the code.

mydata <- read.csv("data.csv")
x <- mydata[,1:4]
y <- mydata[,5]
data <- cbind(x,y)
model <- model.matrix(y~., data=data)
ridgedata <- model[,-1]
train <- sample(1:dim(ridgedata)[1], round(0.8*dim(ridgedata)[1]))
test <- setdiff(1:dim(ridgedata)[1],train)
x_train <- data[train, ]
y_train <- data$y[train]
x_test <- data[test, ]
y_test <- data$y[test]
k=5
grid =10^seq(10,-2, length =100)
fit <- cv.glmnet(model,y,k=k,lambda = grid)
lambda_min <- fit$lambda.min
fit_test <- predict(fit, newx=x_test,s=lambda_min)

The error is as follows:

Error in as.matrix(cbind2(1, newx) %*% nbeta) : error in evaluating the argument 'x' in selecting a method for function 'as.matrix': Error in cbind2(1, newx) %*% nbeta : not-yet-implemented method for <data.frame> %*% <dgCMatrix>

I tried debugging, but I am not sure where the

as.matrix(cbind2(1, newx) %*% nbeta)

code is being used and what is causing this error.

Ben Bolker
RDPD
  • try `x_test <- as.matrix(data[test, ])` ? – Ben Bolker Feb 16 '16 at 16:13
  • @BenBolker Getting the same error – RDPD Feb 16 '16 at 16:15
  • OK, then we need a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) please ... – Ben Bolker Feb 16 '16 at 16:16
  • Can you post a version I don't have to request access for, i.e. completely open? Even better, can you create a small self-contained example that generates the same error and can just be posted here? – Ben Bolker Feb 16 '16 at 16:37

1 Answer


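Your original data frame has a factor (categorical) variable among the predictor variables. When you use model.matrix, it does something sensible with this variable: it expands it into numeric dummy columns. If you just pass the raw data frame directly to predict, it doesn't know what to do, because newx has to be a numeric matrix with the same columns as the matrix used for fitting. Build the test matrix with model.matrix as well: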

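# x_test still contains the response y, so "- y" drops it from the predictors;
# model.matrix expands the factor into dummy columns, as it did for the fit
newX <- model.matrix(~ . - y, data = x_test)
fit_test <- predict(fit, newx = newX, s = lambda_min)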
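
As a fuller illustration of the fix above, here is a minimal sketch on made-up data. The data frame `dd`, its column names, and the sample size are all hypothetical and not from the question. It builds the model matrix once for all rows and then subsets rows for training and testing, so both matrices are guaranteed to have the same columns. This is one possible arrangement rather than the only correct one; calling model.matrix separately on a small test subset can fail or produce different columns if some factor levels are missing from it.

```r
library(glmnet)

set.seed(101)
n <- 100
# Made-up data: three numeric predictors and one categorical predictor
dd <- data.frame(y = rnorm(n),
                 a = rnorm(n), b = rnorm(n), c = rnorm(n),
                 d = factor(rep(c("u", "v", "w"), length.out = n)))

# Expand the factor into dummy columns once, for all rows;
# [, -1] drops the intercept column (glmnet fits its own intercept)
X <- model.matrix(y ~ ., data = dd)[, -1]

train <- sample(seq_len(n), size = round(0.8 * n))
test  <- setdiff(seq_len(n), train)

# Fit on the training rows only; nfolds sets the number of CV folds
fit <- cv.glmnet(X[train, ], dd$y[train], nfolds = 5)

# newx is a numeric matrix with the same columns as the training matrix
pred <- predict(fit, newx = X[test, ], s = "lambda.min")
```

Here `pred` is a one-column matrix with one row per test observation.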

By the way, you could have replicated this example with a minimal/made-up example, with just a few lines of data ... for example, this setup gives the same error (I called the data dd rather than "data", because the latter is a built-in function in R):

set.seed(101)
dd <- data.frame(y=rnorm(5),
            a=1:5,b=2:6,c=3:7,d=letters[1:5])
model <- model.matrix(y~., data=dd)
n <- nrow(dd)
train <- sample(1:n, size=round(0.8*n))
test <- setdiff(1:n,train)
Ben Bolker
  • That was very helpful indeed. Thank you! – RDPD Feb 16 '16 at 17:23
  • Thanks. Does this mean that I can't use categorical variables with cv.glmnet? I am running into the same issue; all my variables are numerical except one (gender). Thanks in advance! – Emmanuel Goldstein Jun 16 '21 at 08:55
  • Another question: could you explain the meaning of "~."? [tilde followed by dot] – Emmanuel Goldstein Jun 16 '21 at 09:23
  • It means that you need to convert your categorical variables to dummy variables, which is most easily done using `model.matrix`. The formula `y~.` says to include all the variables in the data frame, except for the response variable, in the model matrix – Ben Bolker Jun 16 '21 at 14:26
  • Thanks, I appreciate it. However, I still don't understand, since x does not contain the response variable. Aren't we converting only the predictors to dummies (and not y)? Another question: it seems that glmnet accepts categorical variables but cv.glmnet does not, which is strange (to be clear, I am talking about the example above with the minus sign, which I also don't understand). – Emmanuel Goldstein Jun 16 '21 at 19:23
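
To illustrate the formula notation discussed in the comments above, here is a small base-R sketch on made-up data (the data frame `dd` and its columns `age` and `gender` are hypothetical). It compares the two-sided formula `y ~ .` with the one-sided formula `~ . - y` and shows how a factor column becomes a dummy column.

```r
# Made-up data: a numeric response, a numeric predictor, and a factor
dd <- data.frame(y = c(1.2, 0.7, 3.1, 2.4),
                 age = c(30, 41, 25, 37),
                 gender = factor(c("f", "m", "m", "f")))

# y ~ . : y is the response; "." stands for every other column in dd
m1 <- model.matrix(y ~ ., data = dd)

# ~ . - y : there is no response, so "." stands for every column in dd,
# and "- y" removes y from the predictors again
m2 <- model.matrix(~ . - y, data = dd)

colnames(m1)  # "(Intercept)" "age" "genderm"
colnames(m2)  # "(Intercept)" "age" "genderm"
```

In both cases only the predictors end up in the matrix, and `gender` is replaced by a single 0/1 column `genderm` (level `f` is the baseline). The `- y` is only needed when the data frame passed to model.matrix still contains the response column, as `x_test` does in the question.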