I'm trying to use the knn function (from the class package) on my dataset. It has 5 feature columns, and the 6th column is the one I want to predict. I'm doing a 70/30 train/test split.
Here's my code:
> ind <- createDataPartition(CSD$Caesarian, p=0.70, list=FALSE)
> csd_train <- CSD[ ind,]
> csd_test <- CSD[-ind,]
> c1 <- CSD[1:6,-c(1,2,3,4,5)]
> knn(train, test, c1, k=2, prob=TRUE)
But I'm getting this error:
Error in knn(train, test, c1, k = 2, prob = TRUE) :
'train' and 'class' have different lengths
I looked at other threads and tried their suggested solutions (KNN in R: 'train and class have different lengths'?), including the following, but I'm still getting errors:
> c1 = as.factor(c1)
> dim(csd_train)
[1] 57 6
> dim(csd_test)
[1] 23 6
> length(c1)
[1] 6
> knn(train, test, c1, k=2, prob=TRUE)
Error in knn(train, test, c1, k = 2, prob = TRUE) :
'train' and 'class' have different lengths
I also tried this, and I'm still getting an error:
> c1 = as.factor(CSD[['Caesarian']])
> knn(train, test, c1, k=2, prob=TRUE)
Error in knn(train, test, c1, k = 2, prob = TRUE) :
'train' and 'class' have different lengths
I'm lost as to how to fix this.
Here's a sample of my data if that helps:
> dput(head(CSD))
structure(list(Age = c(22L, 26L, 26L, 28L, 22L, 26L), Delivery.NO = c(1L,
2L, 2L, 1L, 2L, 1L), Delivery.NO.1 = c(1L, 1L, 0L, 1L, 1L, 0L
), BP = c(2L, 1L, 1L, 2L, 1L, 0L), Heart.Problem = c(1L, 1L,
1L, 1L, 1L, 1L), Caesarian = structure(c(1L, 2L, 1L, 1L, 2L,
1L), .Label = c("N", "Y"), class = "factor")), .Names = c("Age",
"Delivery.NO", "Delivery.NO.1", "BP", "Heart.Problem", "Caesarian"
), row.names = c(NA, 6L), class = "data.frame")
EDIT: I did
c1 <- csd_train[, 6]
and length(c1) is now 57, which is good. However, when I run the knn line, I'm now getting this new error:
Error in knn(csd_train, csd_test, c1, k = 2, prob = TRUE) :
NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(csd_train, csd_test, c1, k = 2, prob = TRUE) :
NAs introduced by coercion
2: In knn(csd_train, csd_test, c1, k = 2, prob = TRUE) :
NAs introduced by coercion
All of my predictor variables are numeric, and there are no missing values.
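For reference, here is a minimal sketch of the setup the errors seem to point toward. It is not a confirmed fix. It assumes the coercion warnings come from passing csd_train and csd_test whole, so that the factor column Caesarian ends up in the matrix knn converts to numeric, and that the length error comes from the class vector not having one entry per training row. The data below is made up (only the column names match the dput output), base sample() stands in for createDataPartition, and train_x, test_x and train_y are hypothetical names.

```r
library(class)  # provides knn()

# Toy stand-in for CSD: invented values, same column names as the dput above
set.seed(1)
n <- 80
CSD <- data.frame(
  Age           = sample(20:40, n, replace = TRUE),
  Delivery.NO   = sample(1:4, n, replace = TRUE),
  Delivery.NO.1 = sample(0:2, n, replace = TRUE),
  BP            = sample(0:2, n, replace = TRUE),
  Heart.Problem = sample(0:1, n, replace = TRUE),
  Caesarian     = factor(sample(c("N", "Y"), n, replace = TRUE),
                         levels = c("N", "Y"))
)

# 70/30 split using base R (assumption: stands in for createDataPartition)
ind <- sample(seq_len(n), size = round(0.7 * n))
csd_train <- CSD[ind, ]
csd_test  <- CSD[-ind, ]

# Predictors only: drop the label column so knn() receives numeric data
train_x <- csd_train[, names(CSD) != "Caesarian"]
test_x  <- csd_test[,  names(CSD) != "Caesarian"]

# Class labels: exactly one per training row
train_y <- csd_train$Caesarian

pred <- knn(train_x, test_x, cl = train_y, k = 2, prob = TRUE)

# Compare predictions with the held-out labels
table(predicted = pred, actual = csd_test$Caesarian)
```

The two things to check before calling knn are that length(train_y) equals nrow(train_x) and that neither predictor table contains the factor column.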