
I am trying to perform feature selection in R using the glmnet package. Here is my code so far:

lasso_model = glmnet(as.matrix(x = lasso, y = lasso_target, 
standardize=TRUE, alpha=1))

lasso is a data frame of numeric and categorical predictors. Its first column was the target variable, which I have dropped.

lasso_target is that dropped target variable, stored as its own data frame.
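Since lasso mixes numeric and categorical columns, note that as.matrix() on such a data frame returns a character matrix rather than the numeric matrix glmnet() expects. A minimal sketch of one common workaround, model.matrix(), which dummy-encodes factors, is below. The toy data frame lasso_demo and its columns are hypothetical stand-ins, since the real columns are not shown.

```r
# Hypothetical toy data standing in for `lasso`: one numeric and one
# categorical (factor) predictor. The real columns are not known.
lasso_demo <- data.frame(
  size  = c(1.2, 3.4, 2.2, 5.1),
  color = factor(c("red", "blue", "red", "green"))
)

# as.matrix() on a mixed data frame coerces every column to character.
m_chr <- as.matrix(lasso_demo)
print(typeof(m_chr))  # [1] "character"

# model.matrix() expands factors into 0/1 dummy columns and keeps numeric
# columns as they are; [, -1] drops the intercept column, because glmnet
# fits its own intercept.
x_num <- model.matrix(~ ., data = lasso_demo)[, -1]
print(typeof(x_num))  # [1] "double"
print(colnames(x_num))
```

The resulting x_num could then be passed as x to glmnet(). By default model.matrix() drops rows containing missing values, so the target would need to be subset to the same rows.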

Error:

Error in drop(y) : argument "y" is missing, with no default

My goal is to remove uninformative features from my data frame before feeding it into my model. Any help would be greatly appreciated!

rmahesh
  • Your statement `as.matrix(x = lasso` seems incorrect. Maybe you need `glmnet(x = as.matrix(lasso), y = as.matrix(lasso_target), ..`, assuming that `lasso` and `lasso_target` are two objects. The error occurs because you have wrapped all the parameters inside `as.matrix`. – akrun Sep 08 '18 at 21:47
  • You can check `?glmnet`, where an example is shown: `x=matrix(rnorm(100*20),100,20); y=rnorm(100); fit1=glmnet(x,y)` – akrun Sep 08 '18 at 21:48
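A runnable, seeded version of the ?glmnet example from akrun's comment makes the expected call shape explicit: x is a numeric matrix with one row per observation, and y is a separate response vector of the same length. This is a sketch on simulated data, not the asker's data.

```r
library(glmnet)

# Reproducible version of the ?glmnet example (simulated data).
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)  # 100 observations, 20 predictors
y <- rnorm(100)                         # numeric response, one per row

# x and y are passed as two separate arguments; alpha = 1 is the lasso.
fit1 <- glmnet(x, y, alpha = 1, standardize = TRUE)

# One row per lambda on the path: Df, %Dev and Lambda.
print(fit1)
```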

1 Answer


You're close! But the input and response variables need to be passed to glmnet separately. What you're doing is putting all of the arguments (x, y, standardize and alpha) inside a single as.matrix() call. as.matrix() silently ignores the extra arguments and returns only the matrix of lasso, which glmnet then receives as its first argument, x. Because y is never supplied, glmnet cannot find the response variable, and you receive an error that tells you so.
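The mechanism can be checked directly with small hypothetical objects (lasso_demo and target_demo stand in for the asker's data): the data-frame method of as.matrix() accepts extra arguments through ... and silently drops them, so the target never reaches glmnet().

```r
# Hypothetical toy objects standing in for `lasso` and `lasso_target`.
lasso_demo  <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
target_demo <- data.frame(t = c(0.5, 1.5, 2.5))

# Original pattern: y, standardize and alpha are handed to as.matrix(),
# whose data-frame method ignores them without any warning.
wrapped <- as.matrix(x = lasso_demo, y = target_demo,
                     standardize = TRUE, alpha = 1)

# The result is only the predictor matrix; the target has been lost,
# so glmnet(wrapped) would receive x but no y.
print(dim(wrapped))                               # [1] 3 2
print(identical(wrapped, as.matrix(lasso_demo)))  # [1] TRUE
```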

This should do the trick:

lasso_model <-  glmnet(x = as.matrix(lasso), 
                       y = as.matrix(lasso_target), 
                       standardize=TRUE, 
                       alpha=1)
Justin
  • This seems to work, thank you! My follow-up question is how to interpret the results to identify which column(s) are the least informative. I see %dev and lambda as the columns. – rmahesh Sep 08 '18 at 22:23
  • `coef(lasso_model, s = cv.glmnet(x=as.matrix(lasso), y=lasso_target)$lambda.min)` will return the coefficients at the lambda that minimizes the model's mean cross-validated error. See https://stackoverflow.com/questions/30565457/getting-glmnet-coefficients-at-best-lambda – Justin Sep 08 '18 at 23:30
  • Thank you so much for the response! I am getting this error when running that: Error in cbind2(1, newx) %*% nbeta : invalid class 'NA' to dup_mMatrix_as_dgeMatrix – rmahesh Sep 08 '18 at 23:35
  • Try this: `coef(lasso_model, s=cv.glmnet(x=as.matrix(lasso), y=as.matrix(lasso_target))$lambda.min)` – Justin Sep 08 '18 at 23:38
  • @rmahesh please avoid asking follow-up questions that have little to do with the OP. If you have a new issue, you can always open a new question. Also, since the answer resolved your issue, kindly accept it. – desertnaut Sep 08 '18 at 23:38
  • If that doesn't work, please mark this answer as complete, read the SO post I mentioned, and perhaps start a new thread as this is evolving far beyond the scope of the original question. Thank you. – Justin Sep 08 '18 at 23:40
  • @Justin Thanks for the response again, but it doesn't seem to be working. I think I am going to research other options. Thank you for all your time nonetheless. If I have any further questions, I will post a new question. – rmahesh Sep 08 '18 at 23:41
  • @desertnaut Sorry about that. – rmahesh Sep 08 '18 at 23:41
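On the follow-up about which features are least informative: the %Dev and Lambda columns printed by glmnet describe the regularization path as a whole, while per-feature information comes from the coefficients. Below is a minimal sketch of the cv.glmnet() plus coef(..., s = "lambda.min") approach from Justin's comment, on simulated data (hypothetical: only V1 to V3 are truly related to y). Features whose lasso coefficient is exactly zero at the chosen lambda are the ones the lasso discarded.

```r
library(glmnet)

# Simulated stand-in data: only the first 3 of 10 predictors affect y.
set.seed(42)
x <- matrix(rnorm(200 * 10), 200, 10)
colnames(x) <- paste0("V", 1:10)
y <- 2 * x[, 1] - 1.5 * x[, 2] + x[, 3] + rnorm(200)

# Cross-validate lambda; cv.glmnet() also fits the full lasso path.
cv_fit <- cv.glmnet(x, y, alpha = 1)

# Coefficients at the lambda with the lowest mean cross-validated error,
# returned as a sparse one-column matrix (intercept + one row per feature).
cf <- coef(cv_fit, s = "lambda.min")

# Convert to a named numeric vector and drop the intercept.
cf_vec <- as.matrix(cf)[, 1]
feat   <- cf_vec[names(cf_vec) != "(Intercept)"]

# Nonzero coefficients: kept by the lasso; exactly zero: discarded.
kept    <- names(feat)[feat != 0]
dropped <- names(feat)[feat == 0]
print(kept)
print(dropped)
```

Using s = "lambda.1se" instead selects the largest lambda whose CV error is within one standard error of the minimum, which usually gives a sparser model; which one to prefer is a judgment call.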