I am trying to run `glmnet(M, R, family="binomial")`, where M is an N-by-k data matrix and R is an N-dimensional vector of binary values. N is the number of samples and k is the number of variables. In my specific case, R is simply a vector of ones because that is the only response value in my dataset.

As noted in some other answers, when R contains all ones or all zeroes, glmnet throws the following error:

"Error in y %*% rep(1, nc) : non-conformable arguments"

Why is this the case, and is there a way to circumvent this error? `glm` does not throw this error but severely overfits on my dataset, so I need `glmnet` for its regularization.

I can provide sample code if needed.
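
A minimal sketch of this setup, assuming simulated data in place of the real dataset. The values of `N` and `k`, the seed, and the names `fit_glm`, `p_glm`, and `glmnet_msg` are placeholders chosen for illustration. The `glmnet` call is wrapped in `tryCatch` so that whatever message the installed version produces is captured rather than assumed:

```r
# Simulated stand-in for the real data; N, k and the seed are arbitrary.
set.seed(1)
N <- 100
k <- 5
M <- matrix(rnorm(N * k), nrow = N, ncol = k)
R <- rep(1, N)  # every response is 1

# glm() runs on an all-ones response (warnings are suppressed here).
fit_glm <- suppressWarnings(glm(R ~ M, family = binomial()))
p_glm <- fitted(fit_glm)

# glmnet() is only called if the package is installed; any error
# message is captured as a string instead of stopping the script.
glmnet_msg <- if (requireNamespace("glmnet", quietly = TRUE)) {
  tryCatch({
    glmnet::glmnet(M, R, family = "binomial")
    "no error"
  }, error = function(e) conditionMessage(e))
} else {
  "glmnet is not installed"
}

print(range(p_glm))  # fitted probabilities from glm()
print(glmnet_msg)    # message from glmnet(), if any
```

Printing `range(p_glm)` shows how close to 1 the `glm` fitted probabilities are, and `glmnet_msg` holds the text of any error raised by `glmnet` on the same inputs.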

Thanks.

wogsland
Shoogiebaba
  • What is the point of building a model with only one possible response? Plus, this is not a programming question. – m-dz Jan 25 '17 at 22:56
  • @m-dz Actually the response _can_ be 0 or 1. This is in the missing data context. So, imagine that there is a true distribution of the data. And then there is another distribution dependent on the "true" data that determines the probability of a given sample being missing. Now, given this other distribution, we can get a new modified dataset with missing datapoints. In this new dataset, we are only looking at datapoints where response was "1", i.e. datapoints that are not missing. – Shoogiebaba Jan 25 '17 at 23:15
  • @m-dz It is a programming question because I just want to allow `glmnet` to do its magic regardless of whether the responses are all 1 or not. I asked this on another part of stack and they said it was strictly a programming question... – Shoogiebaba Jan 25 '17 at 23:19
  • I agree that this is a programming question, but it seems like the modeling issue raised by @m-dz still applies. If all of your training data has an outcome value of 1, the model will predict a 100% probability that the outcome is 1 for any other data you feed it, including data for which the outcome value is 0. – eipi10 Jan 25 '17 at 23:34
  • @Shoogiebaba, your model should then be built on the original data. When you are predicting whether a treatment will cure an illness you are not feeding only those examples where it did. This still looks like an experiment design question to me, no programming involved. – m-dz Jan 26 '17 at 13:06
  • @m-dz This is for understanding and improving inference with missing data. The data I am working with is artificially generated. In the real world there are some contexts where part of the data is missing, e.g. when people drop out of a study. By definition, we do not have access to the missing data. The purpose of my experiments is to better understand how we can perform inference in such situations and come up with algorithms to improve our inference. – Shoogiebaba Jan 26 '17 at 23:19
  • @m-dz Anyway, this is strictly a programming question. Rather than argue about the purpose, I'd just like to know if there is any way to allow `glmnet` to be run in this situation. For instance, `glm` is able to do this but it does not provide a regularization penalty. Apparently, `glmnet` does provide a regularization penalty, and I'd like to make use of that. – Shoogiebaba Jan 26 '17 at 23:21
  • @Shoogiebaba, please follow the guideline here: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – m-dz Jan 27 '17 at 11:10

0 Answers