28

I'm trying to use the glmnet package on a dataset. I'm using cv.glmnet() to get a lambda value for glmnet(). I'm excluding columns 1, 2, 7, and 12: they are, respectively, the id column, the response column, and two columns that contain NAs.

Here's the dataset and error message:

> head(t2)
  X1 X2        X3 X4 X5         X6    X7 X8 X9 X10 X11 X12
1  1  1 0.7661266 45  2 0.80298213  9120 13  0   6   0   2
2  2  0 0.9571510 40  0 0.12187620  2600  4  0   0   0   1
3  3  0 0.6581801 38  1 0.08511338  3042  2  1   0   0   0
4  4  0 0.2338098 30  0 0.03604968  3300  5  0   0   0   0
5  5  0 0.9072394 49  1 0.02492570 63588  7  0   1   0   0
6  6  0 0.2131787 74  0 0.37560697  3500  3  0   1   0   1
> str(t2)
'data.frame':   150000 obs. of  12 variables:
 $ X1 : int  1 2 3 4 5 6 7 8 9 10 ...
 $ X2 : int  1 0 0 0 0 0 0 0 0 0 ...
 $ X3 : num  0.766 0.957 0.658 0.234 0.907 ...
 $ X4 : int  45 40 38 30 49 74 57 39 27 57 ...
 $ X5 : int  2 0 1 0 1 0 0 0 0 0 ...
 $ X6 : num  0.803 0.1219 0.0851 0.036 0.0249 ...
 $ X7 : int  9120 2600 3042 3300 63588 3500 NA 3500 NA 23684 ...
 $ X8 : int  13 4 2 5 7 3 8 8 2 9 ...
 $ X9 : int  0 0 1 0 0 0 0 0 0 0 ...
 $ X10: int  6 0 0 0 1 1 3 0 0 4 ...
 $ X11: int  0 0 0 0 0 0 0 0 0 0 ...
 $ X12: int  2 1 0 0 0 1 0 0 NA 2 ...

> cv1 <- cv.glmnet(as.matrix(t2[,-c(1,2,7,12)]), t2[,2], family="binomial")
Error in as.matrix(cbind2(1, newx) %*% nbeta) : 
  error in evaluating the argument 'x' in selecting a method for function 'as.matrix': Error in t(.Call(Csparse_dense_crossprod, y, t(x))) : 
  error in evaluating the argument 'x' in selecting a method for function 't': Error: invalid class 'NA' to dup_mMatrix_as_dgeMatrix
> cv1 <- cv.glmnet(as.matrix(t2[,-c(1,2,7,12)]), t2[,2], family="multinomial")
Error in t(.Call(Csparse_dense_crossprod, y, t(x))) : 
  error in evaluating the argument 'x' in selecting a method for function 't': Error: invalid class 'NA' to dup_mMatrix_as_dgeMatrix

Any suggestions?

screechOwl
  • Figured it out on my own. Instead of as.matrix() I needed to use data.matrix(). – screechOwl Dec 10 '11 at 17:22
  • I'm not too familiar with this package, but it looks like you're supplying your binomial response on both sides of the equation... x=t[,c(1,2,7,12)] AND y=t[,2] ... if you notice your model looks too good to be true, this is probably why. – Brandon Bertelsen Dec 10 '11 at 17:25
  • Not sure if there's a graphics error, but the input is x=t[,-c(1,2,7,12)]. The '-' in front of the c() means to exclude those columns and keep everything else, so the response should only be on one side of the equation. – screechOwl Dec 10 '11 at 17:47
  • screechOwl, that's a perfectly valid answer, post your own answer, I'll upvote; that error message is really useless. I just hit this issue too with a matrix of categoricals. – smci May 17 '14 at 08:51
  • Plugging my own package here: [glmnetUtils](https://github.com/hong-revo/glmnetUtils) lets you use the formula+data.frame syntax to call glmnet, and should hopefully make problems like this moot. – Hong Ooi Oct 19 '16 at 07:07

3 Answers

40

For some reason, glmnet prefers data.matrix() to as.matrix().

cv1 <- cv.glmnet(data.matrix(t2[,-c(1,2,7,12)]), t2[,2], family="multinomial")

should do the job.
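One way the two functions can differ, shown as a minimal sketch on a made-up data frame (the name `toy` and its columns are illustrative, not taken from the question): when any column is non-numeric, as.matrix() yields a character matrix, while data.matrix() converts every column to numbers. Whether this is what triggered the error in the question is an assumption, since the str() output there shows only int and num columns.

```r
# Hypothetical data frame with one non-numeric (factor) column.
toy <- data.frame(
  age   = c(45L, 40L, 38L),
  ratio = c(0.77, 0.96, 0.66),
  grp   = factor(c("a", "b", "a"))
)

# as.matrix() falls back to a character matrix because of the factor column.
m_as <- as.matrix(toy)
print(typeof(m_as))

# data.matrix() replaces factors with their integer codes, so the result is numeric.
m_data <- data.matrix(toy)
print(typeof(m_data))
print(m_data)
```

Checking typeof() or is.numeric() on the matrix before calling cv.glmnet() is a cheap way to rule this cause in or out.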

screechOwl
8

I got the same error message, and unfortunately it wasn't as easy as using data.matrix() for me.

The error occurs in a cross product of the input matrix and the model coefficients.

In predict.glmnet:

nfit = as.matrix(cbind2(1, newx) %*% nbeta)

What solved the problem for me was forcing x to a dgCMatrix. I seriously don't understand why, but it works for me.

predict(object = lm, newx =  as(x, "dgCMatrix"), type = "response")

Since I haven't seen this as an answer in any of the many questions regarding that error, I thought I'd post it here.
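For readers who want to see what the conversion produces, here is a small sketch using the Matrix package on a made-up predictor matrix (the values and the names `x`, `x_sparse`, and `fit` are assumptions for illustration). It uses Matrix(x, sparse = TRUE) rather than as(x, "dgCMatrix"); as far as I know, newer Matrix releases may warn about coercing a base matrix directly to "dgCMatrix", so treat this as one possible spelling rather than the required one.

```r
library(Matrix)  # recommended package that ships with standard R installs

# Hypothetical 3 x 2 numeric predictor matrix.
x <- matrix(c(0.77, 0.96, 0.66, 45, 40, 38), nrow = 3,
            dimnames = list(NULL, c("ratio", "age")))

# Matrix() with sparse = TRUE returns a compressed sparse column matrix.
x_sparse <- Matrix(x, sparse = TRUE)
print(class(x_sparse))

# The sparse object could then be supplied as newx, for example:
# predict(fit, newx = x_sparse, type = "response")  # `fit` is a hypothetical fitted glmnet model
```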

Fabricio
  • Just to add to this for future people, I ran into the same problem and using as(x, "dgCMatrix") in combination with as.vector(classifications) seemed to do the trick. Example Code: cvfit = cv.glmnet(x= as(data, "dgCMatrix"), y = as.vector(classification), family = "binomial") – Peter Maguire Jan 23 '17 at 22:33
6

I had a similar error message when using cv.glmnet. cv.glmnet would function correctly in RStudio, but it would fail to work when using Rscript. The fix for me was to add to the top of my script:

require(methods)

It seems that the "glmnet" package may be using some functions from the "methods" package, but this package is not loaded at startup when using Rscript. However, the "methods" package is normally loaded by default in R.
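A minimal sketch of what the top of such a script might look like (the variable name `methods_attached` is just for illustration). I believe later R releases, from 3.5.0 onward, have Rscript attach "methods" by default, in which case the explicit call is harmless.

```r
#!/usr/bin/env Rscript
# Attach 'methods' explicitly so S4 generics and classes are available
# regardless of how the script is launched.
library(methods)

# Quick check that the package is now on the search path.
methods_attached <- "package:methods" %in% search()
print(methods_attached)

# ... the rest of the script (e.g. library(glmnet) and the cv.glmnet() call) would follow here.
```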

Here is information about this Rscript functionality:

Rscript does not load methods package, R does -- why, and what are the consequences?

I hope this helps someone that runs into the same issue that I did.