
I'm building a model with several thousand variables, most of whose values are NA. Logistic regression runs successfully on some variables but fails on others.

Here's my code for building a formula from the large number of variables:

model_vars <- names(dataset[100:4000])
vars <- paste("DP ~", paste(model_vars, collapse = " + "))

This builds a formula string with the dependent variable DP on the left and the independent variables joined by "+" on the right. I then pass it to the glm function:

glm(vars, data = training, family = binomial)

Here is the error I get when certain variables are included:

Error in family$linkfun(mustart) : 
Argument mu must be a nonempty numeric vector

I can't figure out why this is occurring, or why the regression works for some variables and not others. I can't see any pattern in the variables that trigger the error. Could someone explain why this error appears?
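A minimal sketch of the same formula construction using `stats::reformulate()`, which returns a formula object rather than a string, together with a count of the rows `glm()` would actually keep. The data frame below is a hypothetical stand-in for `training`; the predictors `a`, `b` and `c` are invented for illustration. By default `glm()` drops every row with an NA in any model column (via `na.omit`), so mostly-NA predictors whose non-missing values sit in different rows can leave zero usable rows, even though each predictor on its own has some.

```r
# Hypothetical stand-in for `training`: response DP plus three mostly-NA predictors
training <- data.frame(
  DP = c(0, 1, 0, 1, 0, 1),
  a  = c(1.2, NA, 0.4, NA, NA, 2.1),
  b  = c(NA, 0.7, NA, 1.9, NA, NA),
  c  = c(NA, NA, NA, NA, 3.3, 0.8)
)
model_vars <- c("a", "b", "c")

# reformulate() builds DP ~ a + b + c as a real formula object
f <- reformulate(model_vars, response = "DP")

# Rows glm() would keep: no NA in the response or any predictor
n_complete <- sum(complete.cases(training[, c("DP", model_vars)]))

# Complete rows as predictors are added one at a time; the first 0
# points at the variable that empties the data
cum <- sapply(seq_along(model_vars), function(k) {
  sum(complete.cases(training[, c("DP", model_vars[1:k])]))
})
print(cum)
```

Here `DP ~ a` alone keeps three rows, but adding `b` leaves none, which is the empty-data situation in which `glm(f, data = training, family = binomial)` fails with the `mu` error.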

greeny
  • The first parameter to `glm()` should be a formula. What exactly are you passing? A [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) would be helpful in order to diagnose your problem. I'm not sure what you want to do with the NA values, but normally any row with an NA value in a column included in the model will be dropped. – MrFlick Feb 24 '16 at 05:58
  • Original question updated with the code for creating the independent variable list. I think R is handling the NA values correctly, since it has already worked with certain variables that contain NAs. – greeny Feb 24 '16 at 06:18
  • Have you tried cutting down the number of variables to the minimal size where you can replicate this problem? Ideally you could get it to the size where you could post it in the question and make this a reproducible example. On Stack Overflow questions about non-working code are off topic and generally get closed unless they include a reproducible example (which in this case would be code and data that we can use to reproduce your problem). You can read more about reproducible examples for R questions at http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – josliber Feb 24 '16 at 06:37
  • Thanks, will create a reproducible example – greeny Feb 24 '16 at 06:48

3 Answers


For others who hit this cryptic error message: perhaps the data frame is empty.

This reproduces the message:

d = data.frame(x = c(NA), y = c(NA))
d = d[complete.cases(d), ]  # removes the only row, leaving 0 rows
m = glm(y ~ ., d, family = 'binomial')

Error in family$linkfun(mustart) : Argument mu must be a nonempty numeric vector
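One way to get a clearer message than the `mu` error is to count complete cases before fitting. The helper `fit_if_nonempty()` below is hypothetical (not part of any package): it stops with an explicit error when no complete rows remain and otherwise fits a binomial `glm()`. It reuses the empty data frame from the reproduction above.

```r
# Hypothetical helper: fit a logistic regression only if complete rows remain
fit_if_nonempty <- function(formula, data) {
  # Variables named in the formula; "." stands for every column of `data`
  used <- all.vars(formula)
  if ("." %in% used) used <- names(data)
  n_ok <- sum(complete.cases(data[, used, drop = FALSE]))
  if (n_ok == 0) {
    stop("no complete cases for: ", paste(used, collapse = ", "), call. = FALSE)
  }
  glm(formula, data = data, family = binomial)
}

# The empty data frame from the reproduction above
d <- data.frame(x = c(NA), y = c(NA))
d <- d[complete.cases(d), ]
err <- tryCatch(fit_if_nonempty(y ~ ., d), error = conditionMessage)
print(err)

# A small data frame with complete rows fits normally
d2 <- data.frame(x = 1:6, y = c(0, 0, 1, 0, 1, 1))
fit <- fit_if_nonempty(y ~ x, d2)
```

The check costs one pass over the model columns, which is cheap compared with the fit and names the variables involved when it fails.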

Chris

I had the error:

Error in family$linkfun(mustart) : 
  Argument mu must be a nonempty numeric vector

when using logistic regression with glm(), like:

glm(y~x,data=df, family='binomial')

after subsetting and standardizing data frames in a loop.

It turned out that some of the subsetted and standardized data frames contained NA values, which caused the error.
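The answer does not say where the NAs came from. One way they can appear, assumed here purely for illustration, is standardizing a column that is constant within a subset: `scale()` divides by a standard deviation of 0, producing `NaN`, which R treats as missing. The sketch uses a hypothetical subset `sub` and skips the fit when no complete rows are left.

```r
# Hypothetical subset in which the predictor happens to be constant
sub <- data.frame(y = c(0, 1, 0, 1), x = c(5, 5, 5, 5))

# scale() centres to 0, then divides by sd = 0, so every value becomes 0/0 = NaN
sub$x <- as.numeric(scale(sub$x))
print(sub$x)

# NaN counts as missing, so glm()'s default na.omit would drop every row
n_complete <- sum(complete.cases(sub))

# Guard the fit inside the loop instead of letting glm() fail
if (n_complete == 0) {
  message("skipping subset: no complete cases after standardizing")
  fit <- NULL
} else {
  fit <- glm(y ~ x, data = sub, family = binomial)
}
```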

Peter

I also had this error, and the cause was an if_else() call returning a logical vector rather than the numeric variable I expected.
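A quick pre-flight check can make an unexpected column type stand out before fitting. This is a sketch on a hypothetical data frame `df`, where the invented column `flag` plays the role of a variable that came out logical; it lists the class and NA count of each model column and then converts the logical column to 0/1 numeric, assuming a numeric variable was intended.

```r
# Hypothetical data frame: `flag` mimics a column that came out logical
df <- data.frame(
  y    = c(0, 1, 1, 0),
  x    = c(0.5, 1.5, 2.5, 3.5),
  flag = c(TRUE, FALSE, NA, TRUE)
)
model_cols <- c("y", "x", "flag")

# Class and NA count of every column the model will use
check <- data.frame(
  class = vapply(df[model_cols], function(col) class(col)[1], character(1)),
  n_na  = colSums(is.na(df[model_cols])),
  row.names = model_cols
)
print(check)

# Convert the logical column to 0/1 numeric if that was the intent
df$flag <- as.numeric(df$flag)
```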

SophistM