I am attempting to run a classification algorithm on a dataset with no missing values. Here is the structure of the dataset (output of str()):
'data.frame': 59977 obs. of 6 variables:
$ gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 1 1 2 2 ...
$ age : num 35.7 35.7 35.7 35.7 35.7 ...
$ code : Factor w/ 492 levels "ADN105","AXN16B",..: 128 128 128 363 363 363 104 104 221 221 ...
$ totalflags : num 4 4 4 4 4 4 3 3 2 2 ...
$ measure2 : num 30 30 30 1 1 1 23 23 22 22 ...
$ outcome : num 1 1 1 0 0 0 1 1 1 1 ...
- attr(*, "na.action")=Class 'omit' Named int [1:138] 3718 3719 5493 5494 5495 5496 7302 7303 8415 8416 ...
.. ..- attr(*, "names")= chr [1:138] "4929" "4930" "7384" "7385" ...
When I run the following commands:
x <- Mydataset[,1:5]
y <- Mydataset[,6]
fit <- glmnet(x, y, family="binomial", alpha=0.5, lambda=0.001)
I get
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning message:
In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
NAs introduced by coercion
Before fitting the glmnet model, I did this:
Mydataset <- na.omit(Mydataset)
And I checked to make sure no NAs remain:
sapply(Mydataset, function(col) sum(is.na(col)))
and I got:
gender age code totalflags measure2 outcome
0 0 0 0 0 0
I looked at other questions but couldn't find anything relevant. I would appreciate any thoughts or help with this.
EDIT: ANSWER
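I did a little digging and converted the data frame to a numeric matrix, after which the model ran without complaining. As the "NAs introduced by coercion" warning suggests, the NAs were not missing values in the data: they appeared when the factor columns were coerced to numeric, since glmnet expects x to be a numeric matrix. This is the code that helped me: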
x <- data.matrix(Mydataset[,1:5])
y <- data.matrix(Mydataset[,6])
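One caveat worth illustrating: data.matrix() replaces each factor with its integer level codes, so a column like code (492 levels) would enter the model as a single numeric variable. If indicator (dummy) columns are wanted instead, base R's model.matrix() is one option. The sketch below uses a small made-up data frame named toy (hypothetical, only mirroring the column types of Mydataset) to show the difference; it is not fitted here because the toy data is too small for a binomial model.

```r
# Toy data frame with the same column types as Mydataset (values are made up).
toy <- data.frame(
  gender     = factor(c("M", "F", "M", "F")),
  age        = c(35.7, 41.2, 28.9, 52.0),
  code       = factor(c("ADN105", "AXN16B", "ADN105", "AXN16B")),
  totalflags = c(4, 3, 2, 4),
  measure2   = c(30, 1, 23, 22),
  outcome    = c(1, 0, 1, 1)
)

# as.matrix() on mixed factor/numeric columns yields a character matrix,
# so a later numeric coercion turns strings such as "M" into NA.
chr <- as.matrix(toy[, 1:5])

# data.matrix() replaces each factor with its integer level codes.
x_codes <- data.matrix(toy[, 1:5])

# model.matrix() expands each factor into 0/1 indicator columns instead;
# [, -1] drops the intercept column.
x_dummy <- model.matrix(outcome ~ ., data = toy)[, -1]
y <- toy$outcome
```

On the real data, the same pattern would be x <- model.matrix(outcome ~ ., data = Mydataset)[, -1] followed by the original glmnet(x, y, family="binomial", alpha=0.5, lambda=0.001) call. Whether integer codes or indicator columns are appropriate depends on how code should be interpreted.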