I'm creating a model with several thousand variables, all of which have a majority of values equal to NA. I am able to successfully run logistic regression on some variables but not others.
Here's the code I use to build the model formula from the large number of variables:
model_vars <- names(dataset[100:4000])
vars <- paste("DP ~", paste(model_vars, collapse = " + "))
This builds a formula string with the dependent variable on the left and the independent variables on the right, separated by "+". I then pass this string to the glm function:
glm(vars, data = training, family = binomial)
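For comparison, the same formula could also be built as a formula object instead of a pasted string. This is only a minimal sketch on a toy data frame; `toy`, `x1`, and `x2` are hypothetical stand-ins for my real data, and `reformulate()` is the base R helper from the stats package.

```r
# Toy stand-in for the real data; the column names are hypothetical.
toy <- data.frame(
  DP = c(0, 1, 0, 1),
  x1 = c(1, 2, 3, 4),
  x2 = c(2, 1, 4, 3)
)

# Predictor names, selected by column position as in the original code.
model_vars <- names(toy)[2:3]

# reformulate() assembles response ~ term1 + term2 + ... as a formula object.
model_formula <- reformulate(model_vars, response = "DP")
print(model_formula)
```

The resulting object can be passed to glm in place of the string, which avoids relying on the string being coerced to a formula.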
Here is the error I get when certain variables are included:
Error in family$linkfun(mustart) :
Argument mu must be a nonempty numeric vector
I cannot figure out why this is occurring, or why the regression works for some variables and not others. I can't see any pattern in the variables that cause the error. Could someone explain why this error shows up?
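One thing that may be worth checking, given how many NAs there are: by default glm drops every row that has an NA in any variable used in the model (the usual `na.action` setting is `na.omit`), so some combinations of variables might leave no complete rows at all. I have not confirmed that this is the cause. Below is a rough sketch of the check on a toy data frame; `toy`, `x1`, and `x2` are hypothetical stand-ins for `training` and its columns.

```r
# Toy stand-in for the training data; the column names are hypothetical.
toy <- data.frame(
  DP = c(0, 1, 0, 1),
  x1 = c(1, NA, 3, NA),
  x2 = c(NA, 2, NA, 4)
)
model_vars <- c("x1", "x2")

# Number of rows with no NA in the response or in any predictor.
# These are the only rows that would survive na.omit.
n_complete <- sum(complete.cases(toy[, c("DP", model_vars)]))
print(n_complete)

# Share of missing values per predictor, to spot the sparsest columns.
na_share <- colMeans(is.na(toy[, model_vars]))
print(na_share)
```

In this toy case each predictor is only half missing, yet no row is complete across both, so a model using both would have nothing left to fit.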