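Your problem is similar to the one reported here on the randomForest classifier.
Apparently glm checks through the variables in your data and throws an error because X contains only NA values. With the default NA handling (na.omit), every row that has an NA in any model variable is dropped, so an all-NA column leaves no rows to fit.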
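To see that for yourself, here is a minimal sketch on a toy data frame. The values and the radius_mean column are made up as a stand-in for your Cancer data; only the all-NA X column matters here.

```r
# Toy stand-in for the Cancer data (hypothetical values, not the real dataset)
Cancer <- data.frame(
  id          = 1:6,
  diagnosis   = c("M", "B", "M", "B", "B", "M"),
  radius_mean = c(17.9, 11.2, 13.0, 12.5, 16.1, 12.9),
  X           = NA   # recycled to all rows, like the empty column in your data
)

# Names of the columns that consist solely of NA
all_na <- names(Cancer)[colSums(!is.na(Cancer)) == 0]
print(all_na)

# Number of rows that survive the default NA handling (na.omit)
rows_left <- nrow(na.omit(Cancer))
print(rows_left)
```

If rows_left is 0 on your real data as well, the NA column is what blocks the fit.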
You can fix that error in one of two ways:
- either drop X completely from your dataset by setting Cancer$X <- NULL before handing it to glm, and leave X out of your formula (glm(diagnosis ~ . - id, data = Cancer, family = binomial));
- or add na.action = na.pass to the glm call (which essentially tells glm to pass NA values through instead of dropping the affected rows), while still excluding X in the formula itself (glm(diagnosis ~ . - id - X, data = Cancer, family = binomial, na.action = na.pass)).
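Both options can be sketched side by side on simulated data. Everything below apart from the two glm calls is an assumption for illustration: the simulated values, the radius_mean and texture_mean columns, and the Cancer_noX copy are not from your dataset.

```r
set.seed(1)
n <- 40

# Simulated stand-in for the Cancer data (hypothetical columns and values)
Cancer <- data.frame(
  id           = seq_len(n),
  diagnosis    = factor(rep(c("B", "M"), each = n / 2)),
  radius_mean  = c(rnorm(n / 2, mean = 12, sd = 2),
                   rnorm(n / 2, mean = 14, sd = 2)),
  texture_mean = rnorm(n, mean = 19, sd = 4),
  X            = NA   # the all-NA column
)

# Option 1: drop X from the data (done on a copy so Option 2 can still use Cancer)
Cancer_noX <- Cancer
Cancer_noX$X <- NULL
fit_a <- glm(diagnosis ~ . - id, data = Cancer_noX, family = binomial)

# Option 2: keep X in the data, exclude it in the formula, and pass NAs through
fit_b <- glm(diagnosis ~ . - id - X, data = Cancer, family = binomial,
             na.action = na.pass)

# Both fits use the same predictors, so the coefficients should agree
print(all.equal(coef(fit_a), coef(fit_b)))
```

Option 1 is probably the safer choice, since na.pass would also let NAs in the remaining predictors through to the fitting routine.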
However, please note that you still have to provide the diagnosis variable in a form digestible by glm, meaning a numeric vector with values 0 and 1, a logical vector, or a factor:
"For binomial and quasibinomial families the response can also be specified as a factor (when the first level denotes failure and all others success)" - from the glm documentation.
Just define Cancer$diagnosis <- as.factor(Cancer$diagnosis).
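Since the first factor level counts as failure, it is worth checking the level order after the conversion. A small sketch, assuming the labels are "M" and "B" (adjust to whatever your column actually contains):

```r
# Hypothetical raw labels standing in for Cancer$diagnosis
diagnosis_raw <- c("M", "B", "B", "M")

# as.factor() sorts the levels alphabetically, so "B" comes first
diagnosis <- as.factor(diagnosis_raw)
print(levels(diagnosis))

# Set the reference (failure) level explicitly rather than relying on sorting
diagnosis <- relevel(diagnosis, ref = "B")

# Equivalent 0/1 coding, with 1 marking the non-reference level "M"
y01 <- as.integer(diagnosis == "M")
print(y01)
```

With "B" as the first level, the fitted model describes the probability of "M".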
On my end, this still leaves some warnings, but I think those are coming from the data or your feature selection. It clears the blocking errors :)